On 18 Aug 2010, at 20:25, Lincoln Stein wrote:

> It sounds to me as though the DAS source metadata needs just one additional
> field to indicate the strain, isolate or individual, making the hierarchy:
>
>   taxid -> strain/isolate/individual id -> assembly
>
> (the taxid and the assembly are already in DAS) Is there a definitive
> repository of isolate IDs that could be used for this purpose?

Most of the sequenced strains already have strain-specific taxids, although
I don't know how NCBI adds new taxids, so I'm not sure how this will scale
with the number of genomes that are currently being sequenced.

> As mentioned elsewhere in this thread, the problem of distinguishing an
> individual from its taxon is not limited to bacteria. Does the 1000 genomes
> project use the assembly as a surrogate for isolate/individual?

No, it's not unique to bacteria. Indeed, I notice that Plasmodium vivax is
already in the DAS registry; does this not suffer from the same problem, i.e.
which strain are the coordinates referring to?

Adam

> Lincoln
>
> On Wed, Aug 18, 2010 at 5:30 AM, Ewan Birney <[email protected]> wrote:
>
> On 18 Aug 2010, at 09:52, Adam Witney wrote:
>
> Hi Andy,
>
> Yes, I am aware of some of the idiosyncrasies of the Ensembl Genomes
> naming conventions. But is there a reason that the DAS registry should be
> constrained by Ensembl Genomes? Could the Registry entry refer to a specific
> taxonomy ID and its corresponding entry in EG, despite EG using a different
> taxonomy ID?
>
> I'd like to be able to export our microarray designs and data via DAS for
> others to use (including EnsemblBacteria). This is for 16 or so species with
> multiple strains thereof.
>
>
> Just to say that I think we should get this as straight as we can; just to
> state the obvious - EG is not trying to be deliberately complex here, it is
> just that the concept of "one taxid == one species == one assembly series"
> breaks down in bacteria.
>
> I've brought in the three key people here on the EG side - Eugene (does the
> web side of this); Dan (main data production manager) and Paul Kersey (the
> EG PI) - some of them are on holiday now, but I suggest perhaps setting up a
> phone conference (and/or Adam could you come for a visit?) to get this as
> straight as we can - I suspect there will be both short-term fixes and
> longer-term infrastructural fixes here.
>
> cheers
>
> Adam
>
> On 17 Aug 2010, at 18:07, Andy Jenkinson wrote:
>
> Hi Adam,
>
> There are no coordinate systems yet as nobody has yet been brave enough to
> start using DAS with bacteria in anger. Eugene at Ensembl Genomes will have
> an interest in doing this, but they have issues with matching up their
> species/strain names with the NCBI taxonomy upon which DAS's coordinates are
> based. In essence, you will need to name the coordinate systems, after which
> they will need to be added to the registry.
>
> For example, when Ensembl Genomes manage to do this, the coordinate systems
> might end up looking like:
>
>   EB_1,Chromosome,Shigella flexneri 2a str. 301
>   EB_1,Plasmid,Shigella flexneri 2a str. 301
>
> This is for a specific Shigella strain with taxonomy ID 198214. The authority
> and version parts of the DAS coordinate system are somewhat arbitrarily
> named; ideally they would follow a standard that is used by the rest of the
> community for interoperability purposes.
>
> What exactly is it you'd like to be able to do? How many species are we
> talking about?
>
> The reason I ask is that getting these coordinate systems into the DAS
> registry does require some work. Some of this is on the registry's side, but
> depending on where your data come from there may be issues with identifying
> the correct coordinate system details such that others can reuse them
> meaningfully. To use the example above, Ensembl Genomes give the "301" strain
> a different name from NCBI and use the taxonomy ID not for the strain but for
> the parent species (Shigella flexneri). In fact the 2457T strain also uses
> the same taxonomy ID, which isn't helpful. Given the number of species, this
> adds up to a major headache.
>
> Cheers,
> Andy
>
> On 17 Aug 2010, at 16:49, Adam Witney wrote:
>
> Hi,
>
> What would be the best approach to use DAS with bacterial genomes? I can't
> seem to find any coordinate systems for these organisms in the Registry.
>
> Thanks for any advice
>
> Adam
> _______________________________________________
> DAS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/das
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <[email protected]>

_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das
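
For reference, the naming discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not registry code: it assumes that a coordinate system label has the form "authority_version,type,organism", as in the EB_1 examples quoted above, and the class name, function name and the extra strain_id field (standing in for the proposed strain/isolate/individual level) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoordinateSystem:
    """Hypothetical record for one DAS coordinate system label."""
    authority: str                   # e.g. "EB"
    version: Optional[str]           # e.g. "1"; None if the label has no version
    source_type: str                 # e.g. "Chromosome" or "Plasmid"
    organism: str                    # e.g. "Shigella flexneri 2a str. 301"
    taxid: Optional[int] = None      # NCBI taxonomy ID, if known
    strain_id: Optional[str] = None  # assumed extra level: strain/isolate/individual


def parse_coordinate_system(label, taxid=None, strain_id=None):
    """Split an 'authority_version,type,organism' label into its parts."""
    # Split on the first two commas only, so an organism name that itself
    # contains a comma stays in one piece.
    parts = [p.strip() for p in label.split(",", 2)]
    if len(parts) != 3:
        raise ValueError("expected 'authority_version,type,organism': %r" % label)
    name, source_type, organism = parts
    # The version is taken to be whatever follows the last underscore.
    authority, sep, version = name.rpartition("_")
    if not sep:
        authority, version = name, None
    return CoordinateSystem(authority, version, source_type, organism,
                            taxid, strain_id)


if __name__ == "__main__":
    cs = parse_coordinate_system(
        "EB_1,Chromosome,Shigella flexneri 2a str. 301",
        taxid=198214, strain_id="301")
    print(cs)
```

Keeping strain_id separate from taxid means two strains that share a taxonomy ID (the 301 and 2457T case mentioned above) would still compare as different coordinate systems.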
