>> One idea to address both of these issues is to embed the
>> original format in the fasta name so that it's clear whether
>> the coords are BED or GFF (e.g. >
>> hg17_BED_chr1_147962192_147962580).
> 
> Or hg17_gtf_chr1_147962192_147962580 etc.
> 
> That certainly seems better than the current situation.
> 
> However, my preferred solution is to take the FASTA ID from
> the annotation file. In GFF3 this would be the ID tag in column
> nine (if present), perhaps with an option to use another
> custom tag like locus_tag or transcript_id if preferred.

Hi Peter,

This seems reasonable. Of course, the implementation needs to be done with care 
to (a) ensure the default choice is somewhat similar to what is done now and 
(b) support all flavors of GFF. If you choose to implement this, you'll also 
need to update all the existing test output files. 
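
To make (a) concrete, here is a rough sketch of the selection logic, not existing Galaxy code. The function name, the id_tag parameter, and the coordinate-style fallback name are all hypothetical; it assumes column nine has already been parsed into a dict.

```python
def choose_fasta_id(attributes, fallback, id_tag="ID"):
    """Pick the FASTA ID for one feature.

    attributes: dict of already-parsed column-nine attributes (assumed).
    fallback:   coordinate-style name to use when the tag is absent (assumed).
    id_tag:     attribute to use, e.g. "ID", "locus_tag" or "transcript_id".
    """
    value = attributes.get(id_tag)
    # A missing or empty tag falls back to the coordinate-style name.
    return value if value else fallback


if __name__ == "__main__":
    coords = "hg17_chr1_147962192_147962580"
    print(choose_fasta_id({"ID": "gene0001"}, coords))
    print(choose_fasta_id({"locus_tag": "b0001"}, coords, id_tag="locus_tag"))
    print(choose_fasta_id({}, coords))
```

Whether the tag or the coordinates should win by default is a design decision this sketch does not settle; swapping the default only changes which argument is optional.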

> For BED I had initially thought this would be the optional
> column 4, name. This made me wonder what Galaxy
> is doing in converting GFF3 to BED, since column 4 was
> populated with generic feature types (gene, CDS, etc
> from GFF3 column 3). Shouldn't this be using the feature's
> ID tag (if present)?

Yes, I'd say that's correct. The GFF-to-BED converter was written before we had 
GFF parsing support, and at the time it wasn't possible to extract the name 
from the attributes. 
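
For illustration, a converter that prefers the ID tag might look roughly like the sketch below. This is not the current converter: gff_line_to_bed and name_tag are hypothetical names, only GFF3-style key=value attributes are handled, and the BED score is simply set to 0.

```python
def gff_line_to_bed(line, name_tag="ID"):
    """Convert one tab-separated GFF3 feature line to a six-column BED line."""
    fields = line.rstrip("\n").split("\t")
    seqid, _source, ftype, start, end, _score, strand = fields[:7]
    column9 = fields[8] if len(fields) > 8 else ""

    # Default to the feature type; use the tag value when it is present.
    name = ftype
    for part in column9.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key == name_tag and value:
            name = value
            break

    # GFF coordinates are 1-based and inclusive; BED is 0-based and
    # half-open, so only the start shifts down by one.
    bed_start = str(int(start) - 1)
    return "\t".join([seqid, bed_start, end, name, "0", strand])


if __name__ == "__main__":
    gff = "chr1\tsrc\tgene\t147962193\t147962580\t.\t+\t.\tID=gene0001;Name=foo"
    print(gff_line_to_bed(gff))
```

Features without the tag keep the feature type in column 4, so output for such files would stay as it is today.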

Finally, note that all changes made to any GFF code must work for GFF, GFF3, 
and GTF formats.
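
As a rough idea of what that involves for column nine, here is a minimal sketch that accepts both GFF3 (key=value) and GTF/GFF2 (key "value") attributes. parse_attributes is a hypothetical helper, not the existing Galaxy parser; it does not handle semicolons inside quoted values or decode URL-escaped characters.

```python
def parse_attributes(column9):
    """Parse a column-nine string into a dict of attribute name to value."""
    attrs = {}
    for part in column9.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        head = part.split("=", 1)[0]
        if "=" in part and not any(c.isspace() for c in head):
            # GFF3 style: key=value
            key, value = part.split("=", 1)
        else:
            # GTF / GFF2 style: key "value" (whitespace separated)
            pieces = part.split(None, 1)
            if len(pieces) != 2:
                continue  # skip a bare key with no value
            key, value = pieces
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


if __name__ == "__main__":
    print(parse_attributes("ID=gene0001;Name=foo"))
    print(parse_attributes('gene_id "g1"; transcript_id "t1";'))
```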

Thanks,
J.



