Hello again, I am currently using biojavax to parse EMBL files exported from the Ensembl website.
Compared to the EBI files I have, these files differ in their feature (FT) lines: sometimes only one "/word" qualifier is present. For example:

EBI file:
    FT   gene            <1..>118
    FT                   /gene="Hoxb9"
    FT                   /note="Hoxb-9"

Ensembl file:
    FT   gene            complement(1..3218)
    FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word" into a Note, but that Note is then attached to the immediately following feature (e.g. mRNA). The current gene feature therefore ends up with no annotation. This behavior is reproducible by removing one "/word" from an EBI file.

Apart from this issue, I noticed that Ensembl EMBL files use "=" inside a qualifier value (e.g. /note="transcript_id=ENSMUST00000048680"). This ends up as an incomplete Note, as the parser seems to split on "=" to separate the key from the value.

Thanks for your help,
Morgane.

--
**********************************************************
Morgane THOMAS-CHOLLIER, PhD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
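To make the expected behavior concrete, here is a minimal standalone Java sketch. It is not BioJava's EMBL parser; the class FeatureTableSketch, the record Feature and the method parse are hypothetical names used only for illustration. It shows the two points above: a qualifier always belongs to the feature that is currently open, even when that feature has only one qualifier, and each qualifier is split only at its first "=", so a value such as "transcript_id=ENSMUST00000048680" stays intact. It assumes every qualifier fits on a single FT line; continuation lines are not handled. The mRNA line in main is made-up input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureTableSketch {

    /** One feature: its key, its location string and its qualifiers (name -> value). */
    record Feature(String key, String location, Map<String, String> qualifiers) {}

    /**
     * Parses EMBL "FT" lines (hypothetical helper, not a BioJava API).
     * Assumes "FT   <key> <location>" opens a new feature and
     * "FT   /<name>=<value>" is a single-line qualifier of the open feature.
     */
    static List<Feature> parse(List<String> lines) {
        List<Feature> features = new ArrayList<>();
        Feature current = null;
        for (String line : lines) {
            if (!line.startsWith("FT")) continue;          // ignore non-feature-table lines
            String body = line.substring(2).trim();
            if (body.isEmpty()) continue;
            if (body.startsWith("/")) {
                if (current == null) {
                    throw new IllegalStateException("qualifier before any feature: " + line);
                }
                // Split only at the FIRST '=' so '=' inside the value is preserved
                int eq = body.indexOf('=');
                String name = eq < 0 ? body.substring(1) : body.substring(1, eq);
                String value = eq < 0 ? "" : stripQuotes(body.substring(eq + 1));
                // Attach to the feature that is open now, however many qualifiers it has
                current.qualifiers().put(name, value);
            } else {
                // A new feature key closes the previous feature
                String[] parts = body.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "", new LinkedHashMap<>());
                features.add(current);
            }
        }
        return features;
    }

    /** Removes one pair of surrounding double quotes, if present. */
    static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> ft = List.of(
            "FT   gene            complement(1..3218)",
            "FT                   /gene=\"ENSMUSG00000038227\"",
            "FT   mRNA            complement(1..3218)",          // hypothetical follow-up feature
            "FT                   /note=\"transcript_id=ENSMUST00000048680\"");
        for (Feature f : parse(ft)) {
            System.out.println(f);
        }
    }
}
```

With this input, the single /gene qualifier stays on the gene feature and the /note value keeps its embedded "=".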
