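use strict;
use warnings;

# Number to append to each accession (values from the example in the question).
my %suffix = (NM_173083 => 12345, NM_001033581 => 12346);

while (<DATA>) {
    chomp;
    # Append ":number" to known accessions; leave everything else untouched.
    s/(NM_\d+)/exists $suffix{$1} ? "$1:$suffix{$1}" : $1/ge;
    print "$_\n";
}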
__DATA__
chr1 ucsc exon 226488874 226488906 0.000000 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 226496810 226497198 0.000000 - . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 2005086 2005368 0.000000 + . gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1 ucsc exon 2066701 2066786 0.000000 + . gene_id "NM_001033581"; transcript_id "NM_001033581";
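
Two things stand out in the substitution quoted below: $gene_id inside the
pattern is interpolated as a Perl variable rather than matched as the literal
text gene_id, and the replacement leaves out the gene_id and transcript_id
labels. Either could explain the result, though I can't be sure without seeing
the rest of the script. If the rows are already held in one scalar, as in the
question, the substitution can also be applied to the whole string at once. A
minimal sketch, assuming a hypothetical %suffix_for lookup table and a
hypothetical tag_accessions helper (neither is from the original post; the two
numbers come from the expected output in the question):

```perl
use strict;
use warnings;

# Hypothetical lookup table: accession => number to append.
# The two entries are taken from the expected output in the question.
my %suffix_for = (
    NM_173083    => 12345,
    NM_001033581 => 12346,
);

# Append ":<number>" to every known accession in the string.
# Accessions without an entry in the table are left unchanged.
sub tag_accessions {
    my ($text) = @_;
    $text =~ s/\b(NM_\d+)\b/exists $suffix_for{$1} ? "$1:$suffix_for{$1}" : $1/ge;
    return $text;
}

# Two rows from the question, tab-delimited and newline-separated.
my $data_string = join "\n",
    join("\t", qw(chr1 ucsc exon 226488874 226488906 0.000000 - .),
         'gene_id "NM_173083"; transcript_id "NM_173083";'),
    join("\t", qw(chr1 ucsc exon 2005086 2005368 0.000000 + .),
         'gene_id "NM_001033581"; transcript_id "NM_001033581";');

print tag_accessions($data_string), "\n";
```

Because of the /g flag the substitution visits every NM_ accession in the
string, so no per-row loop is needed. The /e flag evaluates the replacement as
Perl code, which is what allows the hash lookup. Note that running it twice on
the same string would append the number a second time.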
~Parag
On Sat, Feb 26, 2011 at 12:06 PM, Richard Green wrote:
> Hi Perl users, quick question: I have one long string of tab-delimited
> values, with rows separated by newline characters.
> Here is a snippet of the string:
>
> chr1 ucsc exon 226488874 226488906 0.000000
> - . gene_id "NM_173083"; transcript_id "NM_173083";
> chr1 ucsc exon 226496810 226497198 0.000000
> - . gene_id "NM_173083"; transcript_id "NM_173083";
> chr1 ucsc exon 2005086 2005368 0.000000 + .
> gene_id "NM_001033581"; transcript_id "NM_001033581";
> chr1 ucsc exon 2066701 2066786 0.000000 + .
> gene_id "NM_001033581"; transcript_id "NM_001033581";
>
> I am trying to perform a substitution on some values at the end of each row.
> For example, I'm trying to replace the above string with the following:
>
> chr1 ucsc exon 226488874 226488906 0.000000
> - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1 ucsc exon 226496810 226497198 0.000000
> - . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1 ucsc exon 2005086 2005368 0.000000 + .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
> chr1 ucsc exon 2066701 2066786 0.000000 + .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
> Here is the substitution command I am trying to use:
>
> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
> \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
> \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
> I don't know why I am not able to substitute at the end of each row in the
> string.
> Any suggestions folks have are much appreciated. Thanks -Rich
>