This is an update if you have been following this thread.



From: aditi gupta [mailto:[EMAIL PROTECTED]
Sent: Monday, June 14, 2004 12:48 AM
To: Zeus Odin
Subject: RE: getting online information....


hi,
there are the following 12 fields:

1
gi|37182815|gb|AY358849.1|

2                                3      4  5 6 7
gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552

8   9    10   11  12
568 3218 3234 1.1 34.19

1  -> query id (gi number)
2  -> subject id (gi number)
3  -> identity %
4  -> alignment length
5  -> mismatches
6  -> gap openings
7  -> q. start
8  -> q. end
9  -> s. start
10 -> s. end
11 -> e-value
12 -> bit score
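
Since neither gi identifier contains a space, the map above corresponds to a plain whitespace split. A minimal sketch, assuming every row is whitespace-separated as in the sample; the key names in @names and the helper parse_hit are made up for illustration and do not come from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical key names for the 12 columns, in the order listed above.
my @names = qw(query_id subject_id identity align_len mismatches gap_openings
               q_start q_end s_start s_end e_value bit_score);

# Split one tabular BLAST row on whitespace and label the pieces.
# Returns a hash reference, or nothing if the row does not have 12 fields.
sub parse_hit {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields == @names;
    my %hit;
    @hit{@names} = @fields;    # hash slice: pair each name with its column
    return \%hit;
}

my $row = 'gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 '
        . '100.00 17 0 0 552 568 3218 3234 1.1 34.19';
my $hit = parse_hit($row);
print "$_ => $hit->{$_}\n" for @names;
```

Splitting on whitespace rather than on "|" keeps each gi identifier intact as one field, whether or not its trailing part is present.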

also, gi|28592069|gb|U63637.2|BTU63637 and
gi|14318385|gb|AC089993.2| are both gi numbers, and each one is a
single field, i.e. the subject id.
the last part of the 1st gi number, i.e. BTU63637, is not present in all gi
numbers.
the general form of a gi number is:
gi | [0-9] | [a-z] | [0-9A-Z.] | [A-Z0-9]
where the last part, [A-Z0-9], is not present in all gi numbers.
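
The general form above can be written as one anchored regex with an optional last group. This is only a sketch: the name $gi_re and the labels gi, db, accession and locus are assumptions for illustration, and the character classes are taken from the form as written rather than from any NCBI specification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# gi | digits | lowercase db tag | accession.version | optional trailing part
my $gi_re = qr/^gi\|(\d+)\|([a-z]+)\|([0-9A-Z.]+)\|([A-Z0-9]+)?$/;

for my $id ('gi|28592069|gb|U63637.2|BTU63637', 'gi|14318385|gb|AC089993.2|') {
    # In list context a successful match returns all four groups;
    # the last one is undef when the trailing part is absent.
    if (my ($gi, $db, $acc, $locus) = $id =~ $gi_re) {
        printf "gi=%s db=%s accession=%s locus=%s\n",
            $gi, $db, $acc, defined $locus ? $locus : '(none)';
    }
    else {
        print "not a gi identifier: $id\n";
    }
}
```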
i hope i've made myself clear now.
sorry for the confusion, please don't get angry.
if it is still not clear, then please write to me.
waiting for your help.
with regards
-aditi

Zeus Odin <[EMAIL PROTECTED]> wrote:
There are 9 fields, not 12!!!! Do you understand that?

1  2        3  4           5  6        7  8        9
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967 4.2 32.21
gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226 1.1 34.19

If you insist that there are more than 9, then there are

1  2        3  4           5  6        7  8        9        10     11 12 13 14  15  16   17   18  19
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0  0  552 568 3218 3234 1.1 34.19
OR
1  2        3  4           5  6        7  8           9     10 11 12 13  14  15    16    17  18
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1  0  435 455 56604 56624 1.1 34.19

Do you see the difference in the last two lines? Line 1 has 19 "fields" and
line 2 has 18. Please make a map like the one above and CLEARLY denote which
field is which and if some fields are occasionally missing. There is no way
I can look at your raw data and conclude there are 12 fields. Point out the
12. You have not properly done that yet. Until then, I cannot help you
anymore.
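
The two counts are easy to check mechanically. A small sketch, assuming "field" means any piece separated by a pipe or by whitespace (the 19/18 reading) versus whitespace only; the helper names count_pipe_and_space and count_space_only are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count pieces when both "|" and whitespace act as separators.
sub count_pipe_and_space {
    my ($line) = @_;
    my @tokens = grep { length } split /[|\s]+/, $line;
    return scalar @tokens;
}

# Count pieces when only whitespace separates fields.
sub count_space_only {
    my ($line) = @_;
    my @tokens = split ' ', $line;
    return scalar @tokens;
}

my @rows = (
    'gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19',
    'gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19',
);

for my $row (@rows) {
    printf "%d pieces by pipe/space, %d by space only\n",
        count_pipe_and_space($row), count_space_only($row);
}
# prints:
# 19 pieces by pipe/space, 12 by space only
# 18 pieces by pipe/space, 12 by space only
```

Only the whitespace-only count stays the same from row to row, which is why the choice of separator has to be stated explicitly.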

-ZO




From: aditi gupta [mailto:[EMAIL PROTECTED]
Sent: Sunday, June 13, 2004 2:09 PM
To:  Zeus Odin
Subject: Re: getting online information....


hi,

sorry for the confusion, i didn't explain my case well.

anyways,
the 12 fields are:

Query id, Subject id, % identity, alignment length,
mismatches, gap openings, q. start, q. end, s. start, s. end, e-value,
bit score

and the required 6 fields are:

Subject id, % identity, alignment length,
mismatches, q. start, q. end

Also gi|37182815|gb|AY358849.1| and gi|28592069|gb|U63637.2|BTU63637
are two different fields, i.e. the query id and the subject id respectively.
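
With the rows split on whitespace, the six required fields are just a fixed set of column positions. A minimal sketch, assuming 12 whitespace-separated columns per data row and "#" at the start of header lines; the helper name required_fields and the column widths in the printf are illustrative choices only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 0-based positions of: subject id, % identity, alignment length,
# mismatches, q. start, q. end
my @wanted = (1, 2, 3, 4, 6, 7);

# Return the six required fields of a data row, or an empty list
# for header lines, blank lines and rows without 12 columns.
sub required_fields {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;
    my @fields = split ' ', $line;
    return unless @fields == 12;
    return @fields[@wanted];    # array slice
}

my @lines = (
    '# Database: nr',
    'gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19',
    'gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19',
);

for my $line (@lines) {
    my @out = required_fields($line) or next;
    printf "%-34s %7s %4s %3s %6s %6s\n", @out;
}
```

The fixed-width printf keeps items of the same field below one another even when the subject id has no trailing part.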

what i did in my script was: i split this file into arrays using the query id
as the regex, and kept only the required fields. hence my output did not
contain the query id or the other fields which were not required.

also, you said that:
'If you look at the last field of your REQUIRED output, you want *6* items
separated by spaces in rows 1 and 4 but only *5* items in rows 2 and 3.'

it is not so. in my required output i have all the 6 items, including the
subject id. but yes, they were not properly separated and items of the same
field were not below one another. that's why the confusion might have occurred.

my output is (now separated): subject id; identity %; alignment length;
mismatches; query start point; query end point

gi|28592069|gb|U63637.2|BTU63637   100.00   17   0   552   568
gi|14318385|gb|AC089993.2|          95.24   21   1   435   455
gi|14318385|gb|AC089993.2|         100.00   16   0   260   275
gi|7385112|gb|AF222766.1|AF222766  100.00   17   0   345   361

but i also need the gene and chromosome name of the nucleotide seq in
the output, besides these fields.
for this, one has to feed the subject id (which is the gi number of the seq)
to the link:
 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide

and retrieve the gene and chromosome info from the 'DEFAULT' (not summary)
page that appears.
gene or chr names may not be available, and in some cases (like clones or
complete genome seqs) more than one gene is mentioned. all of them are required.

my problem is that i'm new to perl and don't know how to use perl for
getting online data......
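
One way to approach this from Perl is to fetch the record as GenBank flat text and pick out the gene and chromosome qualifiers. The sketch below rests on several assumptions that do not come from this thread: it uses the NCBI E-utilities efetch address instead of the query.fcgi page, it assumes that service accepts a gi number as the id, and it assumes the names appear in the record as /gene="..." and /chromosome="..." qualifiers. It uses get() from LWP::Simple (an https address also needs LWP::Protocol::https installed). All sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed endpoint: NCBI E-utilities "efetch" (not mentioned in the thread).
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Pull the numeric gi out of an id such as gi|28592069|gb|U63637.2|BTU63637.
sub gi_number {
    my ($id) = @_;
    return $id =~ /^gi\|(\d+)\|/ ? $1 : undef;
}

# Build the request URL for one gi number (GenBank flat text, assumed format).
sub efetch_url {
    my ($gi) = @_;
    return "$EFETCH?db=nucleotide&id=$gi&rettype=gb&retmode=text";
}

# Download one record; returns undef if the request fails.
sub fetch_record {
    my ($gi) = @_;
    require LWP::Simple;    # loaded only when a download is requested
    return LWP::Simple::get(efetch_url($gi));
}

# Collect every distinct gene and chromosome name found in a record.
# A record may name no gene at all, or several; all are kept.
sub gene_and_chromosome {
    my ($record) = @_;
    my (%seen, @genes, @chromosomes);
    while ($record =~ m{/(gene|chromosome)="([^"]+)"}g) {
        my ($kind, $name) = ($1, $2);
        next if $seen{"$kind=$name"}++;
        if ($kind eq 'gene') { push @genes, $name }
        else                 { push @chromosomes, $name }
    }
    return (\@genes, \@chromosomes);
}

# Usage: perl script.pl 'gi|28592069|gb|U63637.2|BTU63637' ...
for my $id (@ARGV) {
    my $gi = gi_number($id);
    my $record = defined $gi ? fetch_record($gi) : undef;
    unless (defined $record) {
        print "$id\tno record retrieved\n";
        next;
    }
    my ($genes, $chromosomes) = gene_and_chromosome($record);
    printf "%s\tgene: %s\tchromosome: %s\n", $id,
        (@$genes       ? join(',', @$genes)       : 'n/a'),
        (@$chromosomes ? join(',', @$chromosomes) : 'n/a');
}
```

Keeping the download and the text extraction in separate subs means the extraction can be tried on a saved record first, without touching the network.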

i hope now i have presented a clear picture to you......could you please
help me?



"Aditi gupta" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> hi to all,
>
> i had a file which contained following data:
>
> # BLASTN 2.2.9 [May-01-2004]
> # Query: gi|37182815|gb|AY358849.1| Homo sapiens clone DNA180287 ALTE (UNQ6508) mRNA, complete cds
> # Database: nr
> # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
> gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967 4.2 32.21
> gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226 1.1 34.19
>
> but i required only some of the fields, and with the help of members of
> this mailing list, i succeeded and obtained the following output:
>
> gi|28592069|gb|U63637.2|BTU63637   100.00   17   0   552   568
> gi|14318385|gb|AC089993.2|   95.24   21  1  435  455
> gi|14318385|gb|AC089993.2|  100.00   16  0  260  275
> gi|7385112|gb|AF222766.1|AF222766  100.00  17  0  345  361
>
> the code is:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Getopt::Long;
>
> my $file;
> GetOptions("f|filename=s" => \$file) or die "Usage: $0 -f FILE\n";
> defined $file or die "Usage: $0 -f FILE\n";
> open(my $in, '<', $file) or die "Error opening $file: $!\n";
> open(my $out, '>>', "$file.txt") or die "Error opening $file.txt: $!\n";
> my $list = do { local $/; <$in> };    # slurp the whole file
> # split on the query id, so each chunk starts with a subject id
> my @seqs = split /gi\|37182815\|gb\|AY358849\.1\|/, $list;
> foreach my $seq (@seqs) {
>     if ($seq =~ /(gi\|\d+\|gb\|[0-9A-Z.]+\|([0-9A-Z.]+)?)
>       \s*
>       ([0-9.]+)   # identity %
>       \s+
>       (\d+)       # alignment length
>       \s+
>       (\d+)       # mismatches
>       \s+
>       \d+         # gap openings (not captured)
>       \s+
>       (\d+)       # q. start
>       \s+
>       (\d+)       # q. end
>       /x)
>     {
>         # print only on a match, so header chunks produce no output
>         print $out "$1\t$3\t$4\t$5\t$6\t$7\n";
>     }
> }
> close $out or die "Error closing $file.txt: $!\n";
>
>
>
> but i also have to feed the gi number (the first field) into the ncbi entrez
> nucleotide site:
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
> and retrieve the gene and chromosome name, if available, from the resulting
> web page.
> is it possible to get the gene and chromosome info in the output with the
> other fields? what changes in the code are required?
>
> please help!! i don't have any idea of using the internet with perl.




