Pedro Antonio Reche wrote:
>
> Hi, I am interested in parsing the file at the bottom of this e-mail in
> order to extract the string between the double quotes following /product=,
> /protein_id=, /db_xref= and /translation=, and to do that for each of the
> segments separated by the string "CDS". The output for the example
> below should look like this:
>
> >V001|AAM13451.1|GI:20152990
> MESLKYFYSLSLSLFNGLTKILNLFLMESLKYFYSLSLSLFNGL
> TKILNLFLMVSIKRSIFLTL
> >V002|AAA60951.1|GI:333518
> KQIVLACICLAAVAIPTSLQQSFSSSSSCTEEENKHHMGIDVI
> IKVTKQDQTPTNDKICQSVTEVTESEDESEEVVKGDPTTYYTVVGGGLTMDFGFTKCP
> KISSISEYSDGNTVNARLSSVSPGQGKDSPAITREEALSMIKDCEMSINIKCSEEEKD
> SNIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEISVRIGDMCKESSELEVKDG
> FKYVDGSASEDAADDTSLINSAKLIACV
>
> So far I have used the code below, which actually works. However, I am not
> pleased with it, as it generates an empty element in the hash from the
> header of the file, and because there might be a better way to do
> this. I would therefore be very grateful for any input or alternative ways
> to improve the code.
> Regards,
> pedro
> #!/usr/sbin/perl -w
> $/ = "\n CDS";
> while(<>){
> $_ =~ /product=\"(.+)\"/;
> $gname = $1;
> $gname =~ s/\s+//g;
> push @ID, $gname;
> $_ =~ /protein_id="([\w\.]+)\"/;
> $ref = $1;
> $_=~ /db_xref=\"GI:(\w+)\"/;
> $gid = $1;
> $_ =~ /translation=\"([A-Z\s]+)/;
> $seq = $1;
> $seq =~ s/\s+//g;
> $hash{$gname} = ["$ref", "$gid", "$seq"];
> }
> open(F, ">test");
> foreach $key (@ID){
> print F ">gi|$hash{$key}[1]|$hash{$key}[0]
> $key\n$hash{$key}[2]\n";
> }
> close(F);
>
> [snip]
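The empty element you mention comes from the first record. With $/ set to "\n CDS", <> returns everything up to and including each "\n CDS", so the header before the first CDS line arrives as a record of its own. It has no /product= field, so $gname stays undefined and ends up as an empty-string hash key. A minimal sketch on a made-up string (the real file was snipped from the post) shows the split; split_records is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into records the same way that
# setting $/ = "\n CDS" and looping over <> splits the input file.
sub split_records {
    my ($input) = @_;
    open my $fh, '<', \$input or die "Cannot open in-memory string: $!";
    local $/ = "\n CDS";
    my @records = <$fh>;    # list context: read every record at once
    close $fh;
    return @records;
}

# Made-up input: a header followed by two CDS features.
my $text = qq{LOCUS  header line\n CDS 1..10\n /product="V001"\n CDS 1..20\n /product="V002"\n};

my @records = split_records($text);
printf "%d records\n", scalar @records;
for my $i ( 0 .. $#records ) {
    my $has_product = $records[$i] =~ /product=/ ? 'yes' : 'no';
    print "record $i has /product=: $has_product\n";
}
```

Record 0 is the header chunk and is the only one without /product=, which is the record your loop turns into an empty hash key.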
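You probably want something like this. It prints each record as soon as it
has been parsed, so you need neither the hash nor the @ID array, and
assigning each match in list context means a failed match leaves the
variable undefined instead of silently reusing a stale $1 from an earlier
match: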
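#!/usr/bin/perl -w
use strict;
$/ = "\n CDS";    # each record runs up to and including the next "\n CDS"
open F, '>test' or die "Cannot open 'test': $!";
while ( <> ) {
    # The first record is the file header, which has no /product= field; skip it.
    my ($gname) = /product="([^"]+)"/ or next;
    $gname =~ s/\s+//g;
    my ($ref) = /protein_id="([\w.]+)"/;
    my ($gid) = /db_xref="(GI:\w+)"/;
    my ($seq) = /translation="([A-Z\s]+)"/;
    $seq =~ s/\s+//g;
    print F ">$gname|$ref|$gid\n$seq\n";    # FASTA header line starts with '>'
}
close F or die "Cannot close 'test': $!";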
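Because the data file was snipped from the post, it may help to check the logic on a small sample first. Below is a minimal sketch that wraps the loop in a hypothetical sub, cds_to_fasta, which reads from any filehandle and returns the FASTA text, fed from an in-memory string opened through a scalar reference. The sample is made up in the shape the post implies, with feature keys preceded by a single space; standard GenBank flat files indent feature keys by five spaces, so $/ may need adjusting for a real file. Skipping a CDS that lacks any of the other three fields is also an assumption, not something the post asks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: turns the CDS records read from $fh into FASTA text.
sub cds_to_fasta {
    my ($fh) = @_;
    local $/ = "\n CDS";    # same record separator as in the post
    my $fasta = '';
    while ( my $rec = <$fh> ) {
        # Records without /product= (such as the file header) are skipped.
        my ($gname) = $rec =~ /product="([^"]+)"/ or next;
        my ($ref)   = $rec =~ /protein_id="([\w.]+)"/;
        my ($gid)   = $rec =~ /db_xref="(GI:\w+)"/;
        my ($seq)   = $rec =~ /translation="([A-Z\s]+)"/;
        # Assumption: a CDS missing any of the other fields is skipped too.
        next unless defined $ref && defined $gid && defined $seq;
        s/\s+//g for $gname, $seq;    # drop line breaks and indentation
        $fasta .= ">$gname|$ref|$gid\n$seq\n";
    }
    return $fasta;
}

# Made-up sample in the shape the post describes (the real file was snipped).
my $sample = <<'END';
LOCUS       EXAMPLE    hypothetical header
FEATURES             Location/Qualifiers
 CDS             1..117
                     /product="V001"
                     /protein_id="AAM13451.1"
                     /db_xref="GI:20152990"
                     /translation="MESLKYFYSLSLSLFNGLTKILNLFLM
                     VSIKRSIFLTL"
 CDS             200..262
                     /product="V002"
                     /protein_id="AAA60951.1"
                     /db_xref="GI:333518"
                     /translation="KQIVLACICLAAVAIPTSLQ"
END

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print cds_to_fasta($in);
close $in;
```

Run as is, it should print two FASTA records, one for V001 and one for V002, and nothing for the header chunk.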
John
--
use Perl;
program
fulfillment