RE: string substitution command question
Hi,

What about this solution:

use warnings;
use strict;

my $str = '
chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";';

# Collect the first NM_ ID found on each row.
my @patterns = map { /(NM_\d+)"/; $1 } grep(/NM_\d+"/, split(/\n+/, $str));
my $additional = 12345;
foreach (@patterns) {
    # Move to the next suffix only when a substitution actually happened.
    $str =~ s/($_)\"/$1:$additional\"/g and $additional++;
}
print "$str\n";

Regards,
Katya

-Original Message-
From: Richard Green [mailto:gree...@uw.edu]
Sent: Saturday, February 26, 2011 10:07 PM
To: beginners@perl.org
Subject: string substitution command question

Hi Perl users,

Quick question: I have one long string with tab-delimited values, separated by newline characters (in rows). Here is a snippet of the string:

chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";

I am trying to perform a substitution on some values at the end of each row. For example, I'm trying to replace the above string with the following:

chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";

Here is the substitution command I am trying to use:

$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;

$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;

I don't know why I am not able to substitute at the end of each row in the string. Any suggestions folks have are much appreciated.

Thanks
-Rich

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
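
For comparison, the same result can probably be reached in a single pass over the string. The sketch below is not from the thread: the helper name tag_ids and the %suffix_for hash are hypothetical, the starting value 12345 is taken from the question, and the //= operator assumes Perl 5.10 or later. Each distinct ID gets the next free suffix the first time it is seen, and later occurrences reuse it.

```perl
use strict;
use warnings;

# Sample rows copied from the thread (whitespace as it appears there).
my $str = join "\n",
    'chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";',
    'chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";',
    'chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";',
    'chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";';

# Hypothetical helper: append ":<suffix>" to every quoted NM_ ID,
# numbering distinct IDs in order of first appearance.
sub tag_ids {
    my ($text, $start) = @_;
    my %suffix_for;        # ID => suffix already assigned to it
    my $next = $start;     # next unused suffix
    # /e evaluates the replacement as code; /g repeats over the whole string.
    $text =~ s{"(NM_\d+)"}{'"' . $1 . ':' . ($suffix_for{$1} //= $next++) . '"'}ge;
    return $text;
}

my $tagged = tag_ids($str, 12345);
print "$tagged\n";
```

Keeping the mapping in a hash means the suffix depends only on the order in which IDs first appear, so rows with the same ID do not have to be adjacent.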
Re: string substitution command question
Ok JD thanks

On Feb 26, 2011, at 3:46 PM, John Delacour wrote:

> At 12:57 -0800 26/02/2011, Richard Green wrote:
>
>>> What is $gene_id?
>>> Are you by any chance using '$' at the beginning of your search pattern
>>> instead of the end?
>> I have $ to designate the end of the row
>> $gene_id
>
> $gene_id designates $gene_id period.
>
>>> Why are you escaping the quote marks?
>> I thought it would be easier to perform substitution without them
>
> What made you think that?
>
>>> Why is there no space after 'gene_id'?
>> I guess there should be
>
> You can guess as much as you like but Perl Regular Expressions don't care
> what you think or what you guess. Read perlvar and perlretut.
>
> JD
Re: string substitution command question
At 12:57 -0800 26/02/2011, Richard Green wrote:

>> What is $gene_id?
>> Are you by any chance using '$' at the beginning of your search pattern
>> instead of the end?
> I have $ to designate the end of the row
> $gene_id

$gene_id designates $gene_id period.

>> Why are you escaping the quote marks?
> I thought it would be easier to perform substitution without them

What made you think that?

>> Why is there no space after 'gene_id'?
> I guess there should be

You can guess as much as you like but Perl Regular Expressions don't care what you think or what you guess. Read perlvar and perlretut.

JD
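
The points JD raises can be illustrated with a small sketch. It is an illustration under assumptions, not code from the thread: the value assigned to $gene_id is made up, and the variable names are hypothetical. It shows that $gene_id inside a pattern is interpolated as a variable, that the literal text with a space and unescaped quotes matches, and that '$' works as an end-of-row anchor when it stands at the end of the pattern with the /m modifier.

```perl
use strict;
use warnings;

my $row = 'gene_id "NM_173083"; transcript_id "NM_173083";';

# Inside a pattern, $gene_id is interpolated like any other variable.
# Here it holds unrelated (made-up) text, so the pattern cannot match.
my $gene_id = 'something_else';
my $with_variable = ($row =~ /$gene_id "NM_173083"/) ? 1 : 0;

# The literal word gene_id, a space, and unescaped quotes match as expected.
my $with_literal = ($row =~ /gene_id "NM_173083"/) ? 1 : 0;

# At the end of the pattern, '$' is an anchor; with /m it matches before
# every newline, so it can count the rows that end in a semicolon.
my $text = "$row\n$row\n";
my $rows_ending_in_semicolon = () = $text =~ /;$/mg;

print "variable: $with_variable, literal: $with_literal, ",
      "row ends: $rows_ending_in_semicolon\n";
```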
Re: string substitution command question
On Sat, Feb 26, 2011 at 12:56 PM, Uri Guttman wrote:

> > "PK" == Parag Kalra writes:
>
> >> why are you doing s/// against $_? by default it does that.
>
> you didn't rectify this one.

Oops. Missed that.

> PK> Sorry. Hope this reply is better and so as the following code:
>
> much better.

Thanks.

> PK> use strict;
> PK> use warnings;
> PK> while(<DATA>){
> PK> $_ =~ s/NM_(\d+)/$1:12345/g;
>
> i didn't follow the request carefully. that is dropping the NM_ part.

Good catch.

use strict;
use warnings;
while(<DATA>){
    s/NM_(\d+)/NM_$1:12345/g;
    print;
}
__DATA__
chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";

> uri

Thanks once again.
~Parag

> --
> Uri Guttman -- u...@stemsystems.com http://www.sysarch.com --
> - Perl Code Review , Architecture, Development, Training, Support --
> - Gourmet Hot Cocoa Mix http://bestfriendscocoa.com -
Re: string substitution command question
> What is $gene_id?
> Are you by any chance using '$' at the beginning of your search pattern
> instead of the end?

I have $ to designate the end of the row
$gene_id

> Why are you escaping the quote marks?

I thought it would be easier to perform substitution without them

> Why is there no space after 'gene_id'?

I guess there should be

On Feb 26, 2011, at 12:30 PM, John Delacour wrote:

> At 12:06 -0800 26/02/2011, Richard Green wrote:
>
>> chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
>> chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
>> chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>> chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>>
>> Here is the substitution command I am trying to use:
>>
>> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>>
>> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>>
>> I don't know why I am not able to substitute at the end of each row in the
>> string.
>
> What is $gene_id? Are you by any chance using '$' at the beginning of your
> search pattern instead of the end?
>
> Why are you escaping the quote marks?
>
> Why is there no space after 'gene_id'?
>
> JD
Re: string substitution command question
> "PK" == Parag Kalra writes:

>> why are you doing s/// against $_? by default it does that.

you didn't rectify this one.

PK> Sorry. Hope this reply is better and so as the following code:

much better.

PK> use strict;
PK> use warnings;
PK> while(<DATA>){
PK> $_ =~ s/NM_(\d+)/$1:12345/g;

i didn't follow the request carefully. that is dropping the NM_ part.

uri
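
Why the prefix gets dropped: only the digits sit inside the capture group, so $1 never contains "NM_". A minimal sketch of the difference, using a made-up one-field sample and hypothetical variable names:

```perl
use strict;
use warnings;

my $field = 'gene_id "NM_173083";';

# Only the digits are captured, so the prefix is lost in the replacement.
(my $dropped = $field) =~ s/NM_(\d+)/$1:12345/;

# Capturing the whole ID keeps the prefix without retyping it.
(my $kept = $field) =~ s/(NM_\d+)/$1:12345/;

print "$dropped\n$kept\n";
```

Either widening the capture group or writing the prefix back into the replacement gives the requested output; which one reads better is a matter of taste.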
Re: string substitution command question
On Sat, Feb 26, 2011 at 12:34 PM, Uri Guttman wrote:

> > "PK" == Parag Kalra writes:
>
> PK> use strict;
> PK> use warnings;
> PK> while(<DATA>){
> PK> chomp;
>
> why are you chomping here when you add in the \n later?

Agreed and corrected in the example at the bottom.

> PK> if ($_ =~ /NM_(\d+)/){
> PK> my $found = $1;
> PK> $_ =~ s/$found/$found:12345/g;
>
> many issues there. why do you test the match before making the s///? you
> can ALWAYS do an s/// as it will just fail if it doesn't match.

Rectified in the example at the bottom.

> why are you doing s/// against $_? by default it does that.
>
> PK> print "$_\n";
> PK> } else {
> PK> print "$_\n";
> PK> }
>
> why are you printing the same thing in each clause? just print AFTER the
> change is made?

Big mistake. I accept it. Modified in the example at the bottom.

> why do you top post when you have been told to bottom post and edit the
> quoted email?

Sorry. Hope this reply is better and so as the following code:

use strict;
use warnings;
while(<DATA>){
    $_ =~ s/NM_(\d+)/$1:12345/g;
    print;
}
__DATA__
chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";

> uri

Thanks for the review
~Parag
Re: string substitution command question
> "PK" == Parag Kalra writes:

PK> use strict;
PK> use warnings;
PK> while(<DATA>){
PK> chomp;

why are you chomping here when you add in the \n later?

PK> if ($_ =~ /NM_(\d+)/){
PK> my $found = $1;
PK> $_ =~ s/$found/$found:12345/g;

many issues there. why do you test the match before making the s///? you can ALWAYS do an s/// as it will just fail if it doesn't match.

why are you doing s/// against $_? by default it does that.

PK> print "$_\n";
PK> } else {
PK> print "$_\n";
PK> }

why are you printing the same thing in each clause? just print AFTER the change is made?

why do you top post when you have been told to bottom post and edit the quoted email?

uri
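
Two of the review points can be sketched as follows: s/// works on $_ when no =~ target is given, and it is safe to run on rows that do not match. The sample rows and the @changed bookkeeping array are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

my @rows = (
    'gene_id "NM_173083"; transcript_id "NM_173083";',
    'a row without any matching ID',
);

my @changed;   # replacements made per row (hypothetical bookkeeping)
for (@rows) {
    # No "$_ =~" needed: s/// acts on $_ by default. On a row that does
    # not match it returns false and leaves the row untouched, so no
    # separate match test is required beforehand.
    my $count = s/(NM_\d+)/$1:12345/g;
    push @changed, $count || 0;
}

# Print once, after the change, whether or not anything was replaced.
print "$_\n" for @rows;
```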
Re: string substitution command question
At 12:06 -0800 26/02/2011, Richard Green wrote:

> chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
> chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
> Here is the substitution command I am trying to use:
>
> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
> I don't know why I am not able to substitute at the end of each row in the
> string.

What is $gene_id? Are you by any chance using '$' at the beginning of your search pattern instead of the end?

Why are you escaping the quote marks?

Why is there no space after 'gene_id'?

JD
Re: string substitution command question
use strict;
use warnings;
while(<DATA>){
    chomp;
    if ($_ =~ /NM_(\d+)/){
        my $found = $1;
        $_ =~ s/$found/$found:12345/g;
        print "$_\n";
    } else {
        print "$_\n";
    }
}
__DATA__
chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";

~Parag

On Sat, Feb 26, 2011 at 12:06 PM, Richard Green wrote:

> Hi Perl users,
>
> Quick question: I have one long string with tab-delimited values, separated
> by newline characters (in rows). Here is a snippet of the string:
>
> chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
> chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
> chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
> chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
>
> I am trying to perform a substitution on some values at the end of each row.
> For example, I'm trying to replace the above string with the following:
>
> chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
> chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
> Here is the substitution command I am trying to use:
>
> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
> I don't know why I am not able to substitute at the end of each row in the
> string. Any suggestions folks have are much appreciated.
>
> Thanks
> -Rich
string substitution command question
Hi Perl users,

Quick question: I have one long string with tab-delimited values, separated by newline characters (in rows). Here is a snippet of the string:

chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581"; transcript_id "NM_001033581";

I am trying to perform a substitution on some values at the end of each row. For example, I'm trying to replace the above string with the following:

chr1ucscexon226488874 226488906 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon226496810 226497198 0.00 - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon2005086 2005368 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
chr1ucscexon2066701 2066786 0.00+ . gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";

Here is the substitution command I am trying to use:

$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;

$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;

I don't know why I am not able to substitute at the end of each row in the string. Any suggestions folks have are much appreciated.

Thanks
-Rich