No, I'll update my source. Thanks,
Jolyon

-----Original Message-----
From: Richard Holland [mailto:[EMAIL PROTECTED]
Sent: 20 April 2006 13:16
To: Jolyon Holdstock
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]

Did you use the latest CVS version? (I committed a change that I think
should have fixed that about 1 minute before my previous email).

On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote:
> I've run the sequence through the parser and it seems to work OK. I
> iterate through the features and then iterate through the annotations of
> that feature
>
> Based on the input....
>
> FT   source          1..118
> FT                   /organism="Triturus helveticus"
> FT                   /mol_type="genomic DNA"
> FT                   /clone="Thel.b9"
> FT                   /db_xref="taxon:256425"
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> FT                   /codon_start=2
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
> FT                   /protein_id="ABA39736.1"
> FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
>
> The output is....
>
> ========================================
> Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118)
> Note: (#0) biojavax:mol_type: genomic DNA
> Note: (#1) biojavax:clone: Thel.b9
> ========================================
> Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>)
> Note: (#2) biojavax:gene: Hoxb9
> Note: (#3) biojavax:note: Hoxb-9
> ========================================
> Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>)
> Note: (#4) biojavax:gene: Hoxb9
> Note: (#5) biojavax:product: HOXB9
> ========================================
> Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>)
> Note: (#6) biojavax:codon_start: 2
> Note: (#7) biojavax:gene: Hoxb9
> Note: (#8) biojavax:product: HOXB9
> Note: (#9) biojavax:protein_id: ABA39736.1
> Note: (#10) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
> Note: (#11) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
> =============================================
>
> This looks OK, the one thing I've just noticed is that the last piece of
> annotation of the last feature is assigned twice.
>
> Jolyon
>
>
> -----Original Message-----
> From: Richard Holland [mailto:[EMAIL PROTECTED]
> Sent: 20 April 2006 13:05
> To: [EMAIL PROTECTED]
> Cc: Jolyon Holdstock; [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
>
> Hi.
>
> I made some small changes to the code, although nothing that would fix
> this kind of problem, committed it back to CVS, checked it out again,
> compiled, and ran a test program that read in an EMBL file with the
> feature table you describe below, and output it in EMBL format to
> another file. I then compared the two files... and found no differences!
> The split-on-equals problem didn't occur, and all notes appeared
> alongside their correct features.
>
> Could there be a problem maybe with the script you are using?
>
> I've really no idea what the problem is as I can't reproduce it based on
> the current CVS contents!
>
> cheers,
> Richard
>
> On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote:
> > Hi,
> >
> > I have tested today's version from CVS.
> >
> > Both EBI and Ensembl files now react the same way.
> > The last annotation of a feature is nevertheless related to its
> > immediate following feature.
> > e.g. :
> >
> > FT   gene            <1..>118
> > FT                   /gene="Hoxb9"
> > FT                   /note="Hoxb-9"
> > FT   mRNA            <1..>118
> > FT                   /gene="Hoxb9"
> > FT                   /product="HOXB9"
> > FT   CDS             <1..>118
> >
> > /note="Hoxb-9" is related to mRNA
> > /product="HOXB9" is related to CDS
> >
> > Concerning the split-on-equals problem, I still observe the problem :
> >
> > [(#2) biojavax:note: transcript_i]
> >
> > for this annotation : /note="transcript_id=ENSMUST00000048680"
> >
> > Thanks for helping,
> >
> > Cheers,
> >
> > Morgane.
> >
> > Richard Holland wrote:
> > > I have committed an UNTESTED patch based on Jolyon's suggestion, and
> > > also attempted to fix the split-on-equals problem Morgane observed.
> > >
> > > Please let me know if there are any problems with it.
> > >
> > > As this problem affected the UniProt parser in a similar manner (much of
> > > the code is identical), the same fixes were applied there too.
> > >
> > > cheers,
> > > Richard
> > >
> > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote:
> > >
> > >> Hi Morgane,
> > >>
> > >> I have amended the EmblFormat readSection method as below and the
> > >> parsing seems to work; please test it.
> > >>
> > >> I think that the last bit of annotation is carried over into the next
> > >> feature so before adding the new feature I dump the annotation and reset
> > >> currentTag and currentVal.
> > >>
> > >> if (!line.startsWith(" ")) {
> > >>     //--------- new code starts ---------------------------
> > >>     if (currentTag!=null) {
> > >>         section.add(new String[]{currentTag,currentVal.toString()});
> > >>         currentTag = null;
> > >>         currentVal = null;
> > >>     }
> > >>     //--------- new code ends -----------------------------
> > >>     // case 1 : word value - splits into key-value on its own
> > >>     section.add(line.split("\\s+"));
> > >> }
> > >>
> > >> Cheers,
> > >>
> > >> Jolyon
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Morgane THOMAS-CHOLLIER
> > >> Sent: 12 April 2006 09:35
> > >> To: [EMAIL PROTECTED]
> > >> Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
> > >>
> > >> Hello again,
> > >>
> > >> I am currently using biojavax to parse EMBL files exported from the
> > >> Ensembl website.
> > >>
> > >> Compared to the EBI files I have, they show a difference in the
> > >> Features lines:
> > >>
> > >> sometimes, only one "/word" is present, i.e.:
> > >>
> > >> EBI file:
> > >>
> > >> FT   gene            <1..>118
> > >> FT                   /gene="Hoxb9"
> > >> FT                   /note="Hoxb-9"
> > >>
> > >> Ensembl file:
> > >>
> > >> FT   gene            complement(1..3218)
> > >> FT                   /gene="ENSMUSG00000038227"
> > >>
> > >> The problem I encounter is that the parser correctly converts the
> > >> "/word" into a Note, but the Note is then associated with the
> > >> immediately following feature (i.e. mRNA).
> > >> The current gene feature thus has no annotation.
> > >>
> > >> This behavior is reproducible when removing one "/word" from an EBI file.
> > >>
> > >> Apart from this issue, I noted that Ensembl EMBL files use "=" inside a
> > >> feature (i.e. /note="transcript_id=ENSMUST00000048680"), which ends up
> > >> as an incomplete Note, as the parser seems to split on "=" to separate
> > >> the Key and the Value.
> > >>
> > >> Thanks for your help,
> > >>
> > >> Morgane.
> > >>
> > >>
> >
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
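
For anyone reading this thread later: the two issues discussed above (a feature's last qualifier ending up on the next feature, and a value being cut at an embedded "=") can be illustrated with a small stand-alone Java sketch. This is not the BioJava EmblFormat code and does not reproduce what was committed to CVS. The class name FeatureTableSketch and its methods are hypothetical, and the sketch assumes the standard EMBL column layout (feature key at column 6, qualifiers indented further). It flushes any pending qualifier before a new feature starts, in the spirit of Jolyon's patch, and splits each qualifier on the first "=" only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; this is NOT BioJava's EmblFormat class.
class FeatureTableSketch {

    // Assumption: standard EMBL layout, where "FT" is followed by a feature
    // key at column 6 (index 5) and qualifier lines are indented further.
    static boolean startsFeature(String line) {
        return line.length() > 5 && line.charAt(5) != ' ';
    }

    // Store the pending qualifier, if any, with surrounding quotes removed.
    static void flush(List<String[]> section, String tag, StringBuilder val) {
        if (tag == null) {
            return;
        }
        String v = val.toString();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            v = v.substring(1, v.length() - 1);
        }
        section.add(new String[] {tag, v});
    }

    // Returns [key, location] pairs for features and [tag, value] pairs for
    // qualifiers, in file order.
    static List<String[]> parse(List<String> lines) {
        List<String[]> section = new ArrayList<>();
        String currentTag = null;
        StringBuilder currentVal = null;
        for (String line : lines) {
            if (!line.startsWith("FT")) {
                continue;
            }
            String body = line.substring(2).trim();
            if (body.isEmpty()) {
                continue;
            }
            if (startsFeature(line)) {
                // Flush the previous feature's last qualifier BEFORE adding
                // the new feature, so it cannot be carried over.
                flush(section, currentTag, currentVal);
                currentTag = null;
                currentVal = null;
                section.add(body.split("\\s+", 2));
            } else if (body.startsWith("/")) {
                flush(section, currentTag, currentVal);
                // Limit of 2: split on the first "=" only, so a value such as
                // "transcript_id=ENSMUST..." keeps its embedded "=".
                String[] kv = body.substring(1).split("=", 2);
                currentTag = kv[0];
                currentVal = new StringBuilder(kv.length > 1 ? kv[1] : "");
            } else if (currentVal != null) {
                // Continuation line: appended without a separator, which
                // suits /translation but not wrapped free text.
                currentVal.append(body);
            }
        }
        // Flush exactly once at the end of the table.
        flush(section, currentTag, currentVal);
        return section;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FT   gene            complement(1..3218)",
            "FT                   /gene=\"ENSMUSG00000038227\"",
            "FT   mRNA            <1..>118",
            "FT                   /note=\"transcript_id=ENSMUST00000048680\"");
        for (String[] pair : parse(lines)) {
            System.out.println(String.join(" -> ", pair));
        }
    }
}
```

With the single-qualifier Ensembl example from the thread as input, the /gene qualifier is emitted before the mRNA feature rather than after it, and the /note value keeps everything after its first "=".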
