Typo in the code, my fault. Try again!
On Thu, 2006-06-08 at 10:23 -0400, Seth Johnson wrote: > I'm still getting an empty array back from this: > > Note [] myAccs = ((RichAnnotation)rs.getAnnotation()).getProperties > (INSDseqFormat.Terms.getOtherSeqIdTerm()); > > Here's the file that I'm parsing: > ~~~~~~~~~~~~~~~~~~~~~~ > <?xml version="1.0"?> > <!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN" > "http://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd"> > <INSDSet> > <INSDSeq> > <INSDSeq_locus>AY069118</INSDSeq_locus> > <INSDSeq_length>1502</INSDSeq_length> > <INSDSeq_strandedness>single</INSDSeq_strandedness> > <INSDSeq_moltype>mRNA</INSDSeq_moltype> > <INSDSeq_topology>linear</INSDSeq_topology> > <INSDSeq_division>INV</INSDSeq_division> > <INSDSeq_update-date>17-DEC-2001</INSDSeq_update-date> > <INSDSeq_create-date>15-DEC-2001</INSDSeq_create-date> > <INSDSeq_definition>Drosophila melanogaster GH13089 full length > cDNA</INSDSeq_definition> > <INSDSeq_primary-accession>AY069118</INSDSeq_primary-accession> > <INSDSeq_accession-version>AY069118.1</INSDSeq_accession-version> > <INSDSeq_other-seqids> > <INSDSeqid>gb|AY069118.1|</INSDSeqid> > <INSDSeqid>gi|17861571</INSDSeqid> > </INSDSeq_other-seqids> > <INSDSeq_keywords> > <INSDKeyword>FLI_CDNA</INSDKeyword> > </INSDSeq_keywords> > <INSDSeq_source>Drosophila melanogaster (fruit > fly)</INSDSeq_source> > <INSDSeq_organism>Drosophila melanogaster</INSDSeq_organism> > <INSDSeq_taxonomy>Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; > Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; > Ephydroidea; Drosophilidae; Drosophila</INSDSeq_taxonomy> > <INSDSeq_references> > <INSDReference> > <INSDReference_reference>1 (bases 1 to > 1502)</INSDReference_reference> > <INSDReference_position>1..1502</INSDReference_position> > <INSDReference_authors> > <INSDAuthor>Stapleton,M.</INSDAuthor> > <INSDAuthor>Brokstein,P.</INSDAuthor> > <INSDAuthor>Hong,L.</INSDAuthor> > <INSDAuthor>Agbayani,A.</INSDAuthor> > <INSDAuthor>Carlson,J.</INSDAuthor> > 
<INSDAuthor>Champe,M.</INSDAuthor> > <INSDAuthor>Chavez,C.</INSDAuthor> > <INSDAuthor>Dorsett,V.</INSDAuthor> > <INSDAuthor>Farfan,D.</INSDAuthor> > <INSDAuthor>Frise,E.</INSDAuthor> > <INSDAuthor>George,R.</INSDAuthor> > <INSDAuthor>Gonzalez,M.</INSDAuthor> > <INSDAuthor>Guarin,H.</INSDAuthor> > <INSDAuthor>Li,P.</INSDAuthor> > <INSDAuthor>Liao,G.</INSDAuthor> > <INSDAuthor>Miranda,A.</INSDAuthor> > <INSDAuthor>Mungall,C.J.</INSDAuthor> > <INSDAuthor>Nunoo,J.</INSDAuthor> > <INSDAuthor>Pacleb,J.</INSDAuthor> > <INSDAuthor>Paragas,V.</INSDAuthor> > <INSDAuthor>Park,S.</INSDAuthor> > <INSDAuthor>Phouanenavong,S.</INSDAuthor> > <INSDAuthor>Wan,K.</INSDAuthor> > <INSDAuthor>Yu,C.</INSDAuthor> > <INSDAuthor>Lewis,S.E.</INSDAuthor> > <INSDAuthor>Rubin,G.M.</INSDAuthor> > <INSDAuthor>Celniker,S.</INSDAuthor> > </INSDReference_authors> > <INSDReference_title>Direct Submission</INSDReference_title> > <INSDReference_journal>Submitted (10-DEC-2001) Berkeley > Drosophila Genome Project, Lawrence Berkeley National Laboratory, One > Cyclotron Road, Berkeley, CA 94720, USA</INSDReference_journal> > </INSDReference> > </INSDSeq_references> > <INSDSeq_comment>Sequence submitted by: Berkeley Drosophila Genome > Project Lawrence Berkeley National Laboratory Berkeley, CA 94720 This > clone was sequenced as part of a high-throughput process to sequence > clones from Drosophila Gene Collection 1 (Rubin et al., Science 2000). > The sequence has been subjected to integrity checks for sequence > accuracy, presence of a polyA tail and contiguity within 100 kb in the > genome. Thus we believe the sequence to reflect accurately this > particular cDNA clone. 
However, there are artifacts associated with > the generation of cDNA clones that may have not been detected in our > initial analyses such as internal priming, priming from contaminating > genomic DNA, retained introns due to reverse transcription of > unspliced precursor RNAs, and reverse transcriptase errors that result > in single base changes. For further information about this sequence, > including its location and relationship to other sequences, please > visit our Web site ( http://fruitfly.berkeley.edu) or send email to > [EMAIL PROTECTED]</INSDSeq_comment> > <INSDSeq_feature-table> > <INSDFeature> > <INSDFeature_key>source</INSDFeature_key> > <INSDFeature_location>1..1502</INSDFeature_location> > <INSDFeature_intervals> > <INSDInterval> > <INSDInterval_from>1</INSDInterval_from> > <INSDInterval_to>1502</INSDInterval_to> > <INSDInterval_accession>AY069118.1</INSDInterval_accession> > </INSDInterval> > </INSDFeature_intervals> > <INSDFeature_quals> > <INSDQualifier> > <INSDQualifier_name>organism</INSDQualifier_name> > <INSDQualifier_value>Drosophila > melanogaster</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>mol_type</INSDQualifier_name> > <INSDQualifier_value>mRNA</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>strain</INSDQualifier_name> > <INSDQualifier_value>y; cn bw sp</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>db_xref</INSDQualifier_name> > <INSDQualifier_value>taxon:7227</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>map</INSDQualifier_name> > <INSDQualifier_value>39B3-39B3</INSDQualifier_value> > </INSDQualifier> > </INSDFeature_quals> > </INSDFeature> > <INSDFeature> > <INSDFeature_key>gene</INSDFeature_key> > <INSDFeature_location>1..1502</INSDFeature_location> > <INSDFeature_intervals> > <INSDInterval> > <INSDInterval_from>1</INSDInterval_from> > <INSDInterval_to>1502</INSDInterval_to> > 
<INSDInterval_accession> AY069118.1</INSDInterval_accession> > </INSDInterval> > </INSDFeature_intervals> > <INSDFeature_quals> > <INSDQualifier> > <INSDQualifier_name>gene</INSDQualifier_name> > <INSDQualifier_value>E2f2</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>note</INSDQualifier_name> > <INSDQualifier_value>alignment with genomic scaffold > AE003669</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>db_xref</INSDQualifier_name> > > <INSDQualifier_value>FLYBASE:FBgn0024371</INSDQualifier_value> > </INSDQualifier> > </INSDFeature_quals> > </INSDFeature> > <INSDFeature> > <INSDFeature_key>CDS</INSDFeature_key> > <INSDFeature_location>189..1301</INSDFeature_location> > <INSDFeature_intervals> > <INSDInterval> > <INSDInterval_from>189</INSDInterval_from> > <INSDInterval_to>1301</INSDInterval_to> > <INSDInterval_accession> AY069118.1</INSDInterval_accession> > </INSDInterval> > </INSDFeature_intervals> > <INSDFeature_quals> > <INSDQualifier> > <INSDQualifier_name>gene</INSDQualifier_name> > <INSDQualifier_value>E2f2</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>note</INSDQualifier_name> > <INSDQualifier_value>Longest ORF</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>codon_start</INSDQualifier_name> > <INSDQualifier_value>1</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>transl_table</INSDQualifier_name> > <INSDQualifier_value>1</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>product</INSDQualifier_name> > <INSDQualifier_value>GH13089p</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>protein_id</INSDQualifier_name> > <INSDQualifier_value>AAL39263.1</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>db_xref</INSDQualifier_name> > <INSDQualifier_value>GI:17861572</INSDQualifier_value> > </INSDQualifier> > 
<INSDQualifier> > <INSDQualifier_name>db_xref</INSDQualifier_name> > > <INSDQualifier_value>FLYBASE:FBgn0024371</INSDQualifier_value> > </INSDQualifier> > <INSDQualifier> > <INSDQualifier_name>translation</INSDQualifier_name> > > <INSDQualifier_value>MYKRKTASIVKRDSSAAGTTSSAMMMKVDSAETSVRSQSYESTPVSMDTSPDPPTPIKSPSNSQSQSQPGQQRSVGSLVLLTQKFVDLVKANEGSIDLKAATKILDVQKRRIYDITNVLEGIGLIDKGRHCSLVRWRGGGFNNAKDQENYDLARSRTNHLKMLEDDLDRQLEYAQRNLRYVMQDPSNRSYAYVTRDDLLDIFGDDSVFTIPNYDEEVDIKRNHYELAVSLDNGSAIDIRLVTNQGKSTTNPHDVDGFFDYHRLDTPSPSTSSHSSEDGNAPACAGNVITDEHGYSCNPGMKDEMKLLENELTAKIIFQNYLSGHSLRRFYPDDPNLENPPLLQLNPPQEDFNFALKSDEGICELFDVQCS</INSDQualifier_value> > > </INSDQualifier> > </INSDFeature_quals> > </INSDFeature> > </INSDSeq_feature-table> > > <INSDSeq_sequence>AAGAATAGAGGGAGAATGAAAAAAATGACATAAATGGCGGAAAGCAAACCTAGCGCCAACATTCGTATTTTCGTTTAATTTTCGCTCCAAAGTGCAATTAATTCCGGCTTCTTGATCGCTGCATATTGAGTGCAGCCACGCAAAGAGTTACAAGGACAGGAGTATAGTCATCGAGTCGATTGCGGACCATGTACAAGCGCAAAACCGCGAGTATTGTTAAAAGAGACAGCTCCGCAGCGGGCACCACCTCCTCGGCTATGATGATGAAGGTGGATTCGGCTGAGACTTCGGTCCGGTCGCAGAGCTACGAGTCTACACCCGTTAGCATGGACACATCACCGGATCCTCCAACGCCAATCAAGTCTCCGTCGAATTCACAATCGCAATCGCAGCCTGGACAACAGCGCTCCGTGGGCTCACTGGTCCTGCTCACACAGAAGTTTGTGGATCTCGTGAAGGCCAACGAAGGATCCATCGACCTGAAAGCGGCAACCAAAATCTTGGACGTACAGAAGCGCCGAATATACGATATTACCAATGTTTTAGAGGGCATTGGACTAATTGATAAGGGCAGACACTGCTCCCTAGTGCGCTGGCGCGGAGGGGGCTTTAACAATGCCAAGGACCAAGAGAACTACGACCTGGCACGTAGCCGGACTAATCATTTGAAAATGTTGGAGGATGACCTAGACAGGCAACTGGAGTATGCACAGCGCAATCTGCGCTACGTTATGCAGGATCCCTCGAATAGGTCGTATGCATATGTGACACGTGATGATCTGCTGGACATCTTTGGAGATGATTCCGTATTCACAATACCTAATTATGACGAGGAAGTAGATATCAAGCGTAATCATTACGAGCTGGCCGTGTCGCTGGACAATGGCAGCGCAATTGACATTCGCCTGGTGACGAACCAAGGAAAGAGTACTACAAATCCGCACGATGTGGATGGGTTCTTTGACTATC! 
ACCGTCTGGACACGCCCTCACCCTCGACGTCGTCGCACTCCAGCGAGGATGGTAACGCTCCAGCATGCGCGGGGAACGTGATCACCGACGAGCACGGTTACTCGTGCAATCCCGGGATGAAAGATGAGATGAAACTTTTGGAGAACGAGCTGACGGCCAAGATAATCTTCCAAAATTATCTGTCCGGTCATTCGCTGCGGCGATTTTATCCCGATGATCCGAATCTAGAAAACCCGCCGCTGCTGCAGCTGAATCCTCCGCAGGAAGACTTCAACTTTGCGTTAAAAAGCGACGAAGGTATTTGCGAGCTGTTTGATGTTCAGTGCTCCTAACTGTGGAAGGGGATGTACACCTTAGGACTATAGCTACACTGCAACTGGCCGCGTGCATTGTGCAAATATTTATGATTAGTACAATTTTGACTTTGGATTTCTCTATATCGTCTAGAAATTTTTAATTAGTGTAATACCTTGTAATTTCGCAAATAACAGCAAAACCAATAAATTCGTAAATGCAAAAAAAAAAAAAAAAAA</INSDSeq_sequence> > </INSDSeq> > </INSDSet> > ~~~~~~~~~~~~~~~~~~~~~~ > > On 6/8/06, Richard Holland <[EMAIL PROTECTED]> wrote: > Yesterday I think I said I was going to add other-seqids but I > forgot to > do it, so I did it just now. Try it and see. Use the new > INSDseqFormat.Terms.getOtherSeqIdTerm() term to find them. > > cheers, > Richard > > On Wed, 2006-06-07 at 19:48 -0400, Seth Johnson wrote: > > Hi Richard, > > > > I still cannot locate the GI number for the main > sequence. After I > > parse it with readINSDseqDNA, I then use: > > > > Note [] myAccs = > ((RichAnnotation)rs.getAnnotation > > ()).getProperties(Terms.getAdditionalAccessionTerm ()); > > > > However, the 'myAccs' appears to be empty. Am I on the > wrong track to > > get to other-seqids??? > > > > On 6/6/06, Richard Holland < [EMAIL PROTECTED]> > wrote: > > GenBank has a separate line for GI number, so it can > be parsed > > out > > nicely. INSDseq does not, so you have to rely on the > other- > > seqids tag > > and hope that one of them is the GI number. However > it seems I > > have not > > included that tag in the parser, so I will include > it. This > > will make > > the other-seqids values available through the notes > with the > > term > > Terms.getAdditionalAccessionTerm(), but > getIdentifier() will > > remain > > null. > > > > For your second question, the tutorial makes the > mistake in > > several > > places of saying getNoteSet(Terms.blahblah()). 
This > was > > shorthand for: > > > > rs.getAnnotation().getProperty(Terms.blahblah()) > > (for single values) > > > > or > > > > ((RichAnnotation)rs.getAnnotation()).getProperties > > ( Terms.blahblah ()) > > (for multiple values) > > > > but never got expanded. Maybe someone can fix that > one > > day... :) > > > > I'm just updating INSDseq to 1.4 now. The guys next > door gave > > me the > > details of the changes, and told me that 1.3 is > actually no > > longer > > supported by them after Friday this week! So I'll > make it 1.4 > > only. > > > > cheers, > > Richard > > > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416 > > > > > -- > Best Regards, > > > Seth Johnson > Senior Bioinformatics Associate > > Ph: (202) 470-0900 > Fx: (775) 251-0358 -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 _______________________________________________ Biojava-l mailing list - [email protected] http://lists.open-bio.org/mailman/listinfo/biojava-l
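As a way to cross-check what the parser should be handing back, here is a minimal sketch that reads the INSDSeqid values straight out of the XML with the JDK's own DOM parser and picks out the GI number. It does not use BioJava at all. The class name OtherSeqIds and the methods readSeqIds and findGi are made up for illustration, and it assumes the GI entry always starts with "gi|", as it does in the file quoted above. The sample string is a fragment without the DOCTYPE line, so the parser does not try to fetch the NCBI DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of BioJava: reads other-seqids directly from INSDseq XML.
final class OtherSeqIds {

    /** Collects the trimmed text of every INSDSeqid element in the given XML string. */
    static List<String> readSeqIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("INSDSeqid");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    /** Returns the number from the first id of the form "gi|NNN", if any (assumed format). */
    static Optional<String> findGi(List<String> ids) {
        for (String id : ids) {
            if (id.startsWith("gi|")) {
                String rest = id.substring(3);
                int bar = rest.indexOf('|');
                // Drop anything after a trailing '|', e.g. "gi|123|" yields "123".
                return Optional.of(bar >= 0 ? rest.substring(0, bar) : rest);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Fragment copied from the record in this thread, without the DOCTYPE.
        String xml = "<INSDSeq_other-seqids>"
                + "<INSDSeqid>gb|AY069118.1|</INSDSeqid>"
                + "<INSDSeqid>gi|17861571</INSDSeqid>"
                + "</INSDSeq_other-seqids>";
        System.out.println(findGi(readSeqIds(xml)).orElse("no GI found"));
    }
}
```

If this finds the two ids in your file but the Note array from getProperties(...) is still empty, the data is fine and the problem is on the parser side.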
