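use strict;
use warnings;

# Suffix to append to each accession (values taken from the desired
# output in the question below).
my %suffix = (
    NM_173083    => 12345,
    NM_001033581 => 12346,
);

while (my $line = <DATA>) {
    chomp $line;
    # Append ":suffix" to every quoted accession that has an entry in
    # %suffix; accessions without an entry are left unchanged. Matching the
    # quoted "NM_..." token keeps the coordinate columns from being touched.
    $line =~ s/"(NM_\d+)"/exists $suffix{$1} ? qq("$1:$suffix{$1}") : qq("$1")/ge;
    print "$line\n";
}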

__DATA__
chr1    ucsc    exon    226488874       226488906       0.000000        -       .       gene_id "NM_173083"; transcript_id "NM_173083";
chr1    ucsc    exon    226496810       226497198       0.000000        -       .       gene_id "NM_173083"; transcript_id "NM_173083";
chr1    ucsc    exon    2005086 2005368 0.000000        +       .       gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1    ucsc    exon    2066701 2066786 0.000000        +       .       gene_id "NM_001033581"; transcript_id "NM_001033581";
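
If the rows are already held in one scalar, as in the question, the same idea can be applied to the whole string in one pass instead of reading line by line. This is only a minimal sketch: `%suffix_for` and `tag_accessions` are names made up for this example, the two suffix values are the ones asked for in the question, and the two-row sample string is shortened from the data above.

```perl
use strict;
use warnings;

# Hypothetical lookup table: accession => suffix to append
# (the two values are the ones requested in the question).
my %suffix_for = (
    NM_173083    => 12345,
    NM_001033581 => 12346,
);

# Append ":suffix" to every quoted accession listed in %suffix_for.
# Accessions missing from the table are returned unchanged.
sub tag_accessions {
    my ($text) = @_;
    $text =~ s{"(NM_\d+)"}{
        exists $suffix_for{$1} ? qq("$1:$suffix_for{$1}") : qq("$1")
    }gex;
    return $text;
}

# Two tab-delimited rows joined by a newline, standing in for $data_string.
my $data_string = join "\n",
    qq(chr1\tucsc\texon\t226488874\t226488906\t0.000000\t-\t.\tgene_id "NM_173083"; transcript_id "NM_173083";),
    qq(chr1\tucsc\texon\t2005086\t2005368\t0.000000\t+\t.\tgene_id "NM_001033581"; transcript_id "NM_001033581";);

print tag_accessions($data_string), "\n";
```

Because the pattern only matches a quoted NM_ accession, the /g modifier walks through every row of the string, and new accessions can be handled by adding entries to the table without writing another s/// per gene.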

~Parag



On Sat, Feb 26, 2011 at 12:06 PM, Richard Green <gree...@uw.edu> wrote:

> Hi Perl users, quick question: I have one long string of tab-delimited
> values, with rows separated by newline characters.
> Here is a snippet of the string:
>
> chr1    ucsc    exon    226488874       226488906       0.000000
> -       .       gene_id "NM_173083"; transcript_id "NM_173083";
> chr1    ucsc    exon    226496810       226497198       0.000000
> -       .       gene_id "NM_173083"; transcript_id "NM_173083";
> chr1    ucsc    exon    2005086 2005368 0.000000        +       .
> gene_id "NM_001033581"; transcript_id "NM_001033581";
> chr1    ucsc    exon    2066701 2066786 0.000000        +       .
> gene_id "NM_001033581"; transcript_id "NM_001033581";
>
> I am trying to perform a substitution on some values at the end of each row.
> For example, I'm trying to replace the above string with the following:
>
> chr1    ucsc    exon    226488874       226488906       0.000000
> -       .       gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1    ucsc    exon    226496810       226497198       0.000000
> -       .       gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
> chr1    ucsc    exon    2005086 2005368 0.000000        +       .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
> chr1    ucsc    exon    2066701 2066786 0.000000        +       .
> gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
> Here is the substitution command I am trying to use:
>
> $data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
> \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
> $data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
> \"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
> I don't know why I am not able to substitute at the end of each row in the
> string.
> Any suggestions folks have are much appreciated. Thanks -Rich
>
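
A note on the quoted s/// itself: inside a pattern, `$gene_id` is read as a scalar variable and interpolated, not as the literal text gene_id (and under `use strict` an undeclared `$gene_id` will not compile at all). The sketch below shows the effect under one assumption, namely that `$gene_id` exists but holds an empty string; what it held in the original script is not shown, and a tab instead of a space between the fields would be a separate reason for a literal pattern not to match. The row text is shortened from the sample.

```perl
use strict;
use warnings;

my $row = qq(gene_id "NM_173083"; transcript_id "NM_173083";);

# Assumption for illustration only: the variable exists but is empty.
my $gene_id = '';

# As posted: $gene_id is interpolated, so the pattern no longer contains
# the word "gene_id", and the replacement text omits both field names.
(my $as_posted = $row) =~
    s/$gene_id\"NM_173083\"\; transcript_id \"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;

# Literal field names in the pattern, repeated in the replacement.
(my $literal = $row) =~
    s/gene_id "NM_173083"; transcript_id "NM_173083";/gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";/g;

print "$as_posted\n";
print "$literal\n";
```

Under that assumption the posted version does match, but it drops the transcript_id label from the row, while the version with literal field names keeps both labels. The double quotes and semicolons need no backslashes in either half of s///.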
