This is an update if you have been following this thread.
From: aditi gupta [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 14, 2004 12:48 AM
To: Zeus Odin
Subject: RE: getting online information....

hi,

there are the following 12 fields:

1                          2                                3      4  5 6 7
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552
8   9    10   11  12
568 3218 3234 1.1 34.19

 1 -> query id (gi number)
 2 -> subject id (gi number)
 3 -> identity %
 4 -> alignment length
 5 -> mismatches
 6 -> gap openings
 7 -> q. start
 8 -> q. end
 9 -> s. start
10 -> s. end
11 -> e-value
12 -> bit score

also, gi|28592069|gb|U63637.2|BTU63637 and gi|14318385|gb|AC089993.2| are both gi numbers and each is a single field, i.e. the subject id. the last part of the 1st gi number, i.e. BTU63637, is not present in all gi numbers........

the general form of a gi number is:

gi | [0-9] | [a-z] | [0-9A-Z .] | [A-Z0-9]

where the last part, [A-Z0-9], is not present in all gi numbers.

i hope now i've made myself clear......... sorry for the confusion.....please don't get angry....... if it is still not clear, then please write to me........

waiting for your help.

with regards
-aditi

Zeus Odin <[EMAIL PROTECTED]> wrote:

There are 9 fields, not 12!!!! Do you understand that?

1 2 3 4 5 6 7 8 9
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967 4.2 32.21
gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226 1.1 34.19

If you insist that there are more than 9, then there are

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19

OR

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19

Do you see the difference in the last two lines?
Line 1 has 19 "fields" and line 2 has 18. Please make a map like the one above and CLEARLY denote which field is which and whether some fields are occasionally missing. There is no way I can look at your raw data and conclude there are 12 fields. Point out the 12. You have not properly done that yet. Until then, I cannot help you anymore.

-ZO

From: aditi gupta [mailto:[EMAIL PROTECTED]]
Sent: Sunday, June 13, 2004 2:09 PM
To: Zeus Odin
Subject: Re: getting online information....

hi,

sorry for the confusion.....i didn't explain my case well...

anyways, the 12 fields are:

Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score

and the required 6 fields are:

Subject id, % identity, alignment length, mismatches, q. start, q. end

Also, gi|37182815|gb|AY358849.1| and gi|28592069|gb|U63637.2|BTU63637 are two different fields, i.e. query id and subject id respectively.

what i did in my script was: i split this file into arrays using the query id as the regex, and kept only the required fields. hence in my output i didn't get the query id and the other fields which were not required.....

also, you said that:

'If you look at the last field of your REQUIRED output, you want *6* items separated by spaces in rows 1 and 4 but only *5* items in rows 2 and 3.'

it is not so......in my required output i have all the 6 items including the subject id.......but yes, they were not properly separated and items of the same field were not below one another.......that's why the confusion might have occurred.

my output is (now separated):

subject id; identity %; alignment length; mismatches; query start point; query end point

gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 552 568
gi|14318385|gb|AC089993.2| 95.24 21 1 435 455
gi|14318385|gb|AC089993.2| 100.00 16 0 260 275
gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 345 361

but i also require the gene and chromosome name of the nucleotide seq in the output besides these fields......
for this, one has to feed the subject id (which is the gi number of the seq) to the link:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide

and retrieve the gene and chromosome info from the 'DEFAULT' (not summary) page that appears. gene or chr names may not be available, and in some cases (like clones or complete genome seqs) more than one gene is mentioned. all are required.

my problem is that i'm new to perl and don't know how to use perl for getting online data......

i hope now i have presented a clear picture to you......could you please help me?

"Aditi gupta" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> hi to all,
>
> i had a file which contained the following data:
>
> # BLASTN 2.2.9 [May-01-2004]
> # Query: gi|37182815|gb|AY358849.1| Homo sapiens clone DNA180287 ALTE (UNQ6508) mRNA, complete cds
> # Database: nr
> # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
> gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967 4.2 32.21
> gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226 1.1 34.19
>
> but i required only some of the fields, and with the help of members of this mailing list, i succeeded and obtained the following output:
>
> gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 552 568
> gi|14318385|gb|AC089993.2| 95.24 21 1 435 455
> gi|14318385|gb|AC089993.2| 100.00 16 0 260 275
> gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 345 361
>
> the code is:
>
> #!/usr/bin/perl
> use Getopt::Long;
> $/ = undef;    # slurp mode: read the whole file in one go
> GetOptions("f|filename=s" => \$file);
> open (IN, $file) or die "Error opening $file:$!\n";
> open (OUT, ">>$file.txt") or die "Error opening $file.txt:$!\n";
> $list = <IN>;
> # every chunk that follows a query id holds the rest of one line
> @seqs = split( /gi\|37182815\|gb\|AY358849\.1\|/, $list );
> foreach $seq (@seqs) {
>     if ($seq =~ /(gi\|\d+\|gb\|[0-9A-Z.]+\|([0-9A-Z.]+)?)   # subject id
>                  \s*
>                  ([0-9.]+)    # identity percentage
>                  \s+
>                  (\d+)        # alignment length
>                  \s+
>                  (\d+)        # mismatches
>                  \s+
>                  \d+          # gap openings (not captured)
>                  \s+
>                  (\d+)        # q. start
>                  \s+
>                  (\d+)        # q. end
>                 /x)
>     {
>         $id = $1;
>         $identity_percentage = $3;
>         $align_length = $4;
>         $mismatches = $5;
>         $q_start = $6;
>         $q_end = $7;
>         # print only when a chunk matched, so header chunks are skipped
>         print OUT "$id\t$identity_percentage\t$align_length\t$mismatches\t$q_start\t$q_end\n";
>     }
> }
>
> exit;
>
> but i also have to feed the gi number (the first field) into the ncbi entrez nucleotide site:
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
> and retrieve the gene and chromosome name, if available, from the resulting web page ........
> is it possible to get the gene and chromosome info in the output with the other fields? what changes in the code are required?
>
> please help!! i don't have any idea of using the internet with perl......
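For anyone trying to settle the field count discussed in this thread: the hit lines are whitespace-separated, and neither id contains a space, so a plain whitespace split gives 12 columns whether or not the subject id carries its optional last part. The following is only a minimal sketch of that idea, not the script from the original post. The names parse_hit, @FIELDS and @WANTED are made up for illustration, and the demo input is the data quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names in the order given by the "# Fields:" comment line.
my @FIELDS = (
    'query id', 'subject id', '% identity', 'alignment length',
    'mismatches', 'gap openings', 'q. start', 'q. end',
    's. start', 's. end', 'e-value', 'bit score',
);

# The six columns requested in the thread, as zero-based indexes:
# subject id, % identity, alignment length, mismatches, q. start, q. end.
my @WANTED = (1, 2, 3, 4, 6, 7);

# Split one line on whitespace. Returns an array reference, or nothing
# for comment lines, blank lines, and lines without exactly 12 columns.
sub parse_hit {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;
    my @cols = split ' ', $line;
    return unless @cols == @FIELDS;
    return \@cols;
}

# Demo input: one comment line plus the four hit lines from the thread.
my @lines = split /\n/, <<'END';
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967 4.2 32.21
gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226 1.1 34.19
END

my @rows;
for my $line (@lines) {
    my $cols = parse_hit($line) or next;
    push @rows, join("\t", @{$cols}[@WANTED]);
}
print "$_\n" for @rows;
```

Splitting on whitespace also means the query id does not have to be hard-coded into a regex, so the same loop should work for a file with a different query.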
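On the question of getting online data from Perl, here is a rough sketch under several assumptions that are not in the thread. It uses NCBI's E-utilities efetch service instead of the query.fcgi page mentioned above, looks records up by the accession embedded in the subject id (for example U63637.2), and reads gene and chromosome names from the /gene= and /chromosome= qualifiers of the GenBank flat-file text. The function names and the script name lookup.pl are made up for illustration. Fetching needs network access, and HTTP::Tiny needs IO::Socket::SSL for https.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Assumption: the E-utilities efetch endpoint, not the URL from the thread.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Pull the accession (e.g. "U63637.2") out of a subject id such as
# "gi|28592069|gb|U63637.2|BTU63637". Returns nothing if it does not fit.
sub accession_from_id {
    my ($id) = @_;
    my @parts = split /\|/, $id;
    return unless @parts >= 4 && $parts[0] eq 'gi';
    return $parts[3];
}

# Download one record as GenBank flat-file text (requires network access).
sub fetch_genbank {
    my ($acc) = @_;
    my $url = "$EFETCH?db=nuccore&id=$acc&rettype=gb&retmode=text";
    my $res = HTTP::Tiny->new(timeout => 30)->get($url);
    die "Fetch failed for $acc: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Collect every distinct /gene="..." and /chromosome="..." qualifier,
# since a clone or genome record may mention more than one gene.
sub gene_and_chromosome {
    my ($record) = @_;
    my (%seen, @genes, @chrs);
    while ($record =~ m{/(gene|chromosome)="([^"]+)"}g) {
        my ($kind, $value) = ($1, $2);
        next if $seen{"$kind\t$value"}++;
        if ($kind eq 'gene') { push @genes, $value }
        else                 { push @chrs,  $value }
    }
    return (\@genes, \@chrs);
}

# Usage: perl lookup.pl 'gi|28592069|gb|U63637.2|BTU63637' ...
# With no arguments nothing is fetched.
for my $id (@ARGV) {
    my $acc = accession_from_id($id) or next;
    my ($genes, $chrs) = gene_and_chromosome(fetch_genbank($acc));
    print join("\t", $id,
        (@$genes ? join(',', @$genes) : 'NA'),
        (@$chrs  ? join(',', @$chrs)  : 'NA')), "\n";
}
```

Records without gene or chromosome qualifiers print NA here, which is one possible way to handle the "may not be available" case from the post. When looping over many ids it would be sensible to pause between requests, since NCBI limits how fast its services may be queried.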