RE: string substitution command question

2011-02-26 Thread Katya Gorodinsky
Hi,

What about this solution:

use warnings;
use strict;

my $str = ' chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";';

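# Collect the NM_ IDs line by line. Duplicates are harmless: once an ID is
# tagged it is no longer followed directly by a quote, so the s/// fails
# and $additional is not incremented a second time.
my @patterns = map {/(NM_\d+)"/; $1} grep(/NM_\d+"/, split(/\n+/, $str));
my $additional = 12345;
foreach (@patterns) {
    $str =~ s/($_)"/$1:$additional"/g and $additional++;
}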
print "$str\n";


Regards,
Katya

-----Original Message-----
From: Richard Green [mailto:gree...@uw.edu] 
Sent: Saturday, February 26, 2011 10:07 PM
To: beginners@perl.org
Subject: string substitution command question

Hi Perl users, Quick question, I have one long string with tab delimited
values separated by a newline character (in rows)
Here is a snippet of the string:

chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";

I am trying to perform substitution on some values at the end of each row,
for example, I'm trying to replace the above string with the following:

chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";

Here is the substitution command I am trying to use:

$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
\"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;

$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
\"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;

I don't know why I am not able to substitute at the end of each row in the
string.
Any suggestions folks have are muchly appreciated. Thanks -Rich

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: string substitution command question

2011-02-26 Thread Richard Green
Ok JD thanks 




On Feb 26, 2011, at 3:46 PM, John Delacour  wrote:

> At 12:57 -0800 26/02/2011, Richard Green wrote:
> 
> 
>>> What is $gene_id?
>>> Are you by any chance using '$' at the beginning of your search pattern
>>> instead of the end?
>> I have $ to designate the end of the row
>> $gene_id
>
> $gene_id designates $gene_id period.
>
>>> Why are you escaping the quote marks?
>> I thought it would be easier to perform substitution without them
>
> What made you think that?
>
>>> Why is there no space after 'gene_id'?
>> I guess there should be
>
> You can guess as much as you like but Perl Regular Expressions don't care
> what you think or what you guess.  Read perlvar and perlretut.
> 
> JD
> 
> 
> 
> 
> 





Re: string substitution command question

2011-02-26 Thread John Delacour

At 12:57 -0800 26/02/2011, Richard Green wrote:


>> What is $gene_id?
>> Are you by any chance using '$' at the beginning of your search
>> pattern instead of the end?
>
> I have $ to designate the end of the row
> $gene_id


$gene_id designates $gene_id period.


>> Why are you escaping the quote marks?
> I thought it would be easier to perform substitution without them


What made you think that?


>> Why is there no space after 'gene_id'?
> I guess there should be


You can guess as much as you like but Perl Regular Expressions don't
care what you think or what you guess.  Read perlvar and perlretut.


JD
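
To make the points above concrete, here is a minimal sketch (not code from the thread). Perl interpolates $gene_id inside a pattern as a variable, so the literal text gene_id has to be written without the sigil; the quote marks need no escaping; and a $ meant as an end-of-row anchor belongs at the end of the pattern, with /m when the string holds many rows. The tab-separated sample rows and the name $fixed are assumptions based on the snippet in the question.

```perl
use strict;
use warnings;

# Two sample rows; the tab-separated layout is an assumption.
my $data_string = join "\n",
    qq{chr1\tucsc\texon\t226488874\t226488906\t0.00\t-\t.\tgene_id "NM_173083"; transcript_id "NM_173083";},
    qq{chr1\tucsc\texon\t2005086\t2005368\t0.00\t+\t.\tgene_id "NM_001033581"; transcript_id "NM_001033581";};

# Literal "gene_id" (no $ in front), unescaped quotes, and $ as the final anchor.
# /m lets $ match at the end of every row; /g replaces every matching row.
(my $fixed = $data_string) =~
    s/gene_id "NM_173083"; transcript_id "NM_173083";$/gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";/mg;

print "$fixed\n";
```

Only the NM_173083 row is rewritten here; the second row is left untouched because the pattern names one specific ID.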







Re: string substitution command question

2011-02-26 Thread Parag Kalra
On Sat, Feb 26, 2011 at 12:56 PM, Uri Guttman  wrote:

> > "PK" == Parag Kalra  writes:
>
>   >> why are you doing s/// against $_? by default it does that.
>
> you didn't rectify this one.
>

Oops. Missed that.


>
>
>  PK> Sorry. Hope this reply is better and so as the following code:
>
> much better.
>

Thanks.


>
>  PK> use strict;
>  PK> use warnings;
>  PK> while(<DATA>){
>   PK> $_ =~ s/NM_(\d+)/$1:12345/g;
>
> i didn't follow the request carefully. that is dropping the NM_ part.
>

Good catch.

use strict;
use warnings;
while(<DATA>){
    s/NM_(\d+)/NM_$1:12345/g;
    print;
}

__DATA__
chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";



>
> uri
>
>
Thanks once again.

~Parag
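
The loop above appends the same :12345 to every ID, whereas the example output in the original question gives each distinct ID its own number (12345, then 12346). A minimal sketch of one way to do that, assuming suffixes are handed out in order of first appearance. The names %suffix_for, $next_suffix, suffix_for_id and tag_ids, and the shortened sample rows, are illustrative and not from the thread.

```perl
use strict;
use warnings;

my $next_suffix = 12345;   # first suffix in the question's example output
my %suffix_for;            # ID => suffix, assigned the first time an ID is seen

# Look up the suffix for an ID, assigning the next free one if it is new.
sub suffix_for_id {
    my ($id) = @_;
    $suffix_for{$id} = $next_suffix++ unless exists $suffix_for{$id};
    return $suffix_for{$id};
}

# Return a copy of $line with ":<suffix>" appended to every NM_ ID.
sub tag_ids {
    my ($line) = @_;
    $line =~ s/(NM_\d+)/$1 . ':' . suffix_for_id($1)/ge;
    return $line;
}

# Shortened sample rows (only the attribute column is shown).
my @rows = (
    'gene_id "NM_173083"; transcript_id "NM_173083";',
    'gene_id "NM_173083"; transcript_id "NM_173083";',
    'gene_id "NM_001033581"; transcript_id "NM_001033581";',
);
my @tagged = map { tag_ids($_) } @rows;
print "$_\n" for @tagged;
```

Inside the while loop from the post, the s/// and print lines would become a single print tag_ids($_);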


>  --
> Uri Guttman  --  u...@stemsystems.com    http://www.sysarch.com--
> -  Perl Code Review , Architecture, Development, Training, Support
> --
> -  Gourmet Hot Cocoa Mix    http://bestfriendscocoa.com-
>


Re: string substitution command question

2011-02-26 Thread Richard Green

> What is $gene_id?  
> Are you by any chance using '$' at the beginning of your search pattern 
> instead of the end?
I have $ to designate the end of the row
$gene_id 
> 
> Why are you escaping the quote marks?
I thought it would be easier to perform substitution without them
> 
> Why is there no space after 'gene_id'?
I guess there should be





On Feb 26, 2011, at 12:30 PM, John Delacour  wrote:

> At 12:06 -0800 26/02/2011, Richard Green wrote:
> 
>> chr1ucscexon226488874   226488906   0.00
>> -   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
>> chr1ucscexon226496810   226497198   0.00
>> -   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
>> chr1ucscexon2005086 2005368 0.00+   .
>> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>> chr1ucscexon2066701 2066786 0.00+   .
>> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>> 
>> Here is the substitution command I am trying to use:
>> 
>> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
>> \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>> 
>> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
>> \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>> 
>> I don't know why I am not able to substitute at the end of each row in the
>> string.
> 
> What is $gene_id?  Are you by any chance using '$' at the beginning of your 
> search pattern instead of the end?
> 
> Why are you escaping the quote marks?
> 
> Why is there no space after 'gene_id'?
> 
> JD
> 
> 
> 





Re: string substitution command question

2011-02-26 Thread Uri Guttman
> "PK" == Parag Kalra  writes:

  >> why are you doing s/// against $_? by default it does that.

you didn't rectify this one.


  PK> Sorry. Hope this reply is better and so as the following code:

much better.

  PK> use strict;
  PK> use warnings;
  PK> while(<DATA>){
  PK> $_ =~ s/NM_(\d+)/$1:12345/g;

i didn't follow the request carefully. that is dropping the NM_ part.

uri





Re: string substitution command question

2011-02-26 Thread Uri Guttman





Re: string substitution command question

2011-02-26 Thread Parag Kalra
On Sat, Feb 26, 2011 at 12:34 PM, Uri Guttman  wrote:

> > "PK" == Parag Kalra  writes:
>
>  PK> use strict;
>  PK> use warnings;
>  PK> while(<DATA>){
>  PK> chomp;
>
> why are you chomping here when you add in the \n later?
>

 Agreed and corrected in the example at the bottom.


>  PK> if ($_ =~ /NM_(\d+)/){
>  PK> my $found = $1;
>  PK> $_ =~ s/$found/$found:12345/g;
>
> many issues there. why do you test the match before making the s///? you
> can ALWAYS do an s/// as it will just fail if it doesn't match.
>

 Rectified in the example at the bottom.


>
> why are you doing s/// against $_? by default it does that.
>
>  PK> print "$_\n";
>  PK> } else {
>  PK> print "$_\n";
>  PK> }
>
> why are you printing the same thing in each clause? just print AFTER the
> change is made?
>

Big mistake. I accept it. Modified in the example at the bottom.


>
>
> why do you top post when you have been told to bottom post and edit the
> quoted email?
>

Sorry. I hope this reply is better, and the following code as well:

use strict;
use warnings;
while(<DATA>){
    $_ =~ s/NM_(\d+)/$1:12345/g;
    print;
}

__DATA__
chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";



>
> uri
>


Thanks for the review

~Parag


>


Re: string substitution command question

2011-02-26 Thread Uri Guttman
> "PK" == Parag Kalra  writes:

  PK> use strict;
  PK> use warnings;
  PK> while(<DATA>){
  PK> chomp;

why are you chomping here when you add in the \n later?

  PK> if ($_ =~ /NM_(\d+)/){
  PK> my $found = $1;
  PK> $_ =~ s/$found/$found:12345/g;

many issues there. why do you test the match before making the s///? you
can ALWAYS do an s/// as it will just fail if it doesn't match.

why are you doing s/// against $_? by default it does that.

  PK> print "$_\n";
  PK> } else {
  PK> print "$_\n";
  PK> }

why are you printing the same thing in each clause? just print AFTER the
change is made?


why do you top post when you have been told to bottom post and edit the
quoted email?

uri
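
A small sketch of the two points about s/// made above, using made-up sample lines: it operates on $_ when no =~ is given, and when nothing matches it simply returns false and leaves the string alone, so no prior match test is needed. The names @lines, @counts and $changed are illustrative.

```perl
use strict;
use warnings;

my @lines = ('gene_id "NM_173083"; transcript_id "NM_173083";', 'no ID on this line');
my @counts;

for (@lines) {
    # No "$_ =~" needed: s/// works on $_ by default. It returns the
    # number of substitutions made, or false if there were none.
    my $changed = s/(NM_\d+)/$1:12345/g;
    push @counts, $changed || 0;
    print "$_\n";    # print once, after the (possible) change
}

print "substitutions per line: @counts\n";
```

The line without an ID passes through unchanged, which is why the if/else around the substitution in the reviewed code is unnecessary.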





Re: string substitution command question

2011-02-26 Thread John Delacour

At 12:06 -0800 26/02/2011, Richard Green wrote:


> chr1ucscexon226488874   226488906   0.00
> -   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon226496810   226497198   0.00
> -   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon2005086 2005368 0.00+   .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
> chr1ucscexon2066701 2066786 0.00+   .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
> Here is the substitution command I am trying to use:
>
> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
> \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
> \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
> I don't know why I am not able to substitute at the end of each row in the
> string.


What is $gene_id?  Are you by any chance using '$' at the beginning 
of your search pattern instead of the end?


Why are you escaping the quote marks?

Why is there no space after 'gene_id'?

JD





Re: string substitution command question

2011-02-26 Thread Parag Kalra
use strict;
use warnings;
while(<DATA>){
    chomp;
    if ($_ =~ /NM_(\d+)/){
        my $found = $1;
        $_ =~ s/$found/$found:12345/g;
        print "$_\n";
    } else {
        print "$_\n";
    }
}

__DATA__
chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";

~Parag
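
One thing to watch in the code above: $found holds only the digits after NM_, so s/$found/.../g rewrites those digits wherever they occur in the row, not just inside the ID. The sample rows in this thread do not trigger this, so the row below is hypothetical, with a start coordinate chosen to collide with the ID's digits. Matching the whole NM_ ID avoids the problem.

```perl
use strict;
use warnings;

# Hypothetical row: the start coordinate equals the ID's digits.
my $row = qq{chr1\tucsc\texon\t173083\t173500\tgene_id "NM_173083";};

# Same approach as the posted code: capture the digits, then substitute them.
my ($found) = $row =~ /NM_(\d+)/;
(my $digits_only = $row) =~ s/$found/${found}:12345/g;

# Matching the whole ID leaves the coordinate untouched.
(my $whole_id = $row) =~ s/(NM_\d+)/$1:12345/g;

print "$digits_only\n$whole_id\n";
```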



On Sat, Feb 26, 2011 at 12:06 PM, Richard Green  wrote:

> Hi Perl users, Quick question, I have one long string with tab delimited
> values separated by a newline character (in rows)
> Here is a snippet of the string:
>
> chr1ucscexon226488874   226488906   0.00
> -   .   gene_id "NM_173083"; transcript_id "NM_173083";
> chr1ucscexon226496810   226497198   0.00
> -   .   gene_id "NM_173083"; transcript_id "NM_173083";
> chr1ucscexon2005086 2005368 0.00+   .
> gene_id "NM_001033581"; transcript_id "NM_001033581";
> chr1ucscexon2066701 2066786 0.00+   .
> gene_id "NM_001033581"; transcript_id "NM_001033581";
>
> I am trying to perform substitution on some values at the end of each row,
> for example, I'm trying to replace the above string with the following:
>
> chr1ucscexon226488874   226488906   0.00
> -   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon226496810   226497198   0.00
> -   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1ucscexon2005086 2005368 0.00+   .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
> chr1ucscexon2066701 2066786 0.00+   .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
> Here is the substitution command I am trying to use:
>
> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
> \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
> \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
> I don't know why I am not able to substitute at the end of each row in the
> string.
> Any suggestions folks have are muchly appreciated. Thanks -Rich
>


string substitution command question

2011-02-26 Thread Richard Green
Hi Perl users, Quick question, I have one long string with tab delimited
values separated by a newline character (in rows)
Here is a snippet of the string:

chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083"; transcript_id "NM_173083";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581"; transcript_id "NM_001033581";

I am trying to perform substitution on some values at the end of each row,
for example, I'm trying to replace the above string with the following:

chr1ucscexon226488874   226488906   0.00
-   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon226496810   226497198   0.00
-   .   gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1ucscexon2005086 2005368 0.00+   .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
chr1ucscexon2066701 2066786 0.00+   .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";

Here is the substitution command I am trying to use:

$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
\"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;

$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
\"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;

I don't know why I am not able to substitute at the end of each row in the
string.
Any suggestions folks have are muchly appreciated. Thanks -Rich