On 18 Aug 2010, at 09:52, Adam Witney wrote:
Hi Andy,
Yes, I am aware of some of the idiosyncrasies of the Ensembl
Genomes naming conventions. But is there a reason that the DAS
registry should be constrained by Ensembl Genomes? Could the
Registry entry refer to a specific taxonomy ID and its corresponding
entry in EG, despite EG using a different taxonomy ID?
I'd like to be able to export our microarray designs and data via
DAS for others to use (including EnsemblBacteria). This is for 16 or
so species with multiple strains thereof.
Just to say that I think we should get this as straight as we can;
just to state the obvious - EG is not trying to be deliberately
complex here, it is just that the concept of "one taxid == one
species == one assembly series" breaks down in bacteria.
I've brought in the three key people here on the EG side - Eugene
(does the web side of this); Dan (main data production manager) and
Paul Kersey (the EG PI) - some of them are on holiday now, but I
suggest perhaps setting up a phone conference (and/or Adam could you
come for a visit?) to get this as straight as we can - I suspect
there will be both short-term fixes and longer-term infrastructural
fixes here.
cheers
Adam
On 17 Aug 2010, at 18:07, Andy Jenkinson wrote:
Hi Adam,
There are no coordinate systems yet as nobody has yet been brave
enough to start using DAS with bacteria in anger. Eugene at Ensembl
Genomes will have an interest in doing this, but they have issues
with matching up their species/strain names with the NCBI taxonomy
upon which DAS's coordinates are based. In essence, you will need
to name the coordinate systems, after which they will need to be
added to the registry.
For example when Ensembl Genomes manage to do this, the coordinate
systems might end up looking like:
EB_1,Chromosome,Shigella flexneri 2a str. 301
EB_1,Plasmid,Shigella flexneri 2a str. 301
This is for a specific Shigella strain with taxonomy ID 198214. The
authority and version parts of the DAS coordinate system are
somewhat arbitrarily named; ideally they would follow a standard that
is used by the rest of the community for interoperability purposes.
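To make the layout of those example lines concrete, here is a minimal
Python sketch that splits one into its parts. It assumes the first
comma-separated field is the authority and version joined by an
underscore, followed by the sequence type and the organism name; the
CoordinateSystem type and parse_coordinate_system function are
hypothetical names for illustration, not part of any DAS library.

```python
from typing import NamedTuple, Optional


class CoordinateSystem(NamedTuple):
    """Hypothetical container for one coordinate system description."""
    authority: str
    version: Optional[str]
    source: str    # e.g. "Chromosome" or "Plasmid"
    organism: str


def parse_coordinate_system(line: str) -> CoordinateSystem:
    """Split 'AUTHORITY_VERSION,SOURCE,ORGANISM' into its parts.

    Assumption: authority and version are joined by the last underscore
    in the first field. A first field without an underscore is treated
    as an authority with no version.
    """
    # Split at most twice so any comma in the organism name is preserved.
    name, source, organism = (part.strip() for part in line.split(",", 2))
    authority, sep, version = name.rpartition("_")
    if not sep:
        return CoordinateSystem(name, None, source, organism)
    return CoordinateSystem(authority, version, source, organism)


if __name__ == "__main__":
    for text in (
        "EB_1,Chromosome,Shigella flexneri 2a str. 301",
        "EB_1,Plasmid,Shigella flexneri 2a str. 301",
    ):
        print(parse_coordinate_system(text))
```

Keeping the organism as the full strain name, rather than only the
species, is what lets the chromosome and plasmid of one strain be told
apart from those of another.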
What exactly is it you'd like to be able to do? How many species
are we talking about?
The reason I ask is that getting these coordinate systems into the
DAS registry does require some work. Some of this is on the
registry's side, but depending where your data come from there may
be issues with identifying the correct coordinate system details
such that others can reuse them meaningfully. To use the example
above, Ensembl Genomes give the "301" strain a different name from
NCBI and use the taxonomy ID not for the strain but for the parent
species (Shigella flexneri). In fact the 2457T strain also uses the
same taxonomy ID, which isn't helpful. Given the number of
species, this adds up to a major headache.
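As a rough illustration of the clash described above, the following
Python sketch groups strains by taxonomy ID and reports any ID that is
claimed by more than one strain. The function name find_shared_taxids
and the SPECIES_TAXID placeholder are hypothetical; the placeholder
stands in for the parent species ID and is not a real NCBI identifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_taxids(
    records: Iterable[Tuple[str, int]]
) -> Dict[int, List[str]]:
    """Return taxonomy IDs that are used by more than one strain."""
    by_taxid: Dict[int, List[str]] = defaultdict(list)
    for strain, taxid in records:
        by_taxid[taxid].append(strain)
    # Keep only the IDs that fail to identify a single strain.
    return {t: names for t, names in by_taxid.items() if len(names) > 1}


if __name__ == "__main__":
    SPECIES_TAXID = -1  # placeholder for the parent species ID (assumption)

    # Both strains labelled with the parent species ID, as described above.
    species_level = [("301", SPECIES_TAXID), ("2457T", SPECIES_TAXID)]
    print(find_shared_taxids(species_level))

    # Strain-level labelling: 198214 is the ID quoted above for strain 301.
    strain_level = [("301", 198214)]
    print(find_shared_taxids(strain_level))
```

A check like this, run over all 16 or so species and their strains,
would show up front which coordinate systems cannot be keyed on
taxonomy ID alone.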
Cheers,
Andy
On 17 Aug 2010, at 16:49, Adam Witney wrote:
Hi,
What would be the best approach to use DAS with bacterial genomes?
I can't seem to find any coordinate systems for these organisms in
the Registry.
Thanks for any advice
Adam
_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das