On 18 Aug 2010, at 20:25, Lincoln Stein wrote:

> It sounds to me as though the DAS source metadata needs just one additional 
> field to indicate the strain, isolate or individual, making the hierarchy:
> 
> taxid -> strain/isolate/individual id -> assembly
> 
> (the taxid and the assembly are already in DAS) Is there a definitive 
> repository of isolate IDs that could be used for this purpose?

Most of the sequenced strains already have strain-specific taxids, although I
don't know how NCBI adds new taxids, so I'm not sure how this will scale with
the number of genomes that are currently being sequenced.
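A minimal Python sketch of the hierarchy proposed above (taxid -> strain/isolate/individual id -> assembly), under stated assumptions: the class, field and function names are hypothetical and not part of the DAS specification or registry, and the assembly level is assumed to map onto the DAS authority plus version (e.g. "EB" and "1" from the Shigella example further down the thread). It shows why a strain field is needed when several strains are filed under their parent species' taxid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoordinateSystem:
    """Hypothetical DAS coordinate system record with an extra
    strain/isolate/individual level (field names are illustrative)."""
    authority: str                # e.g. "EB"
    version: str                  # e.g. "1"; authority + version ~ assembly
    coord_type: str               # e.g. "Chromosome" or "Plasmid"
    taxid: int                    # NCBI taxonomy ID (may be species-level)
    strain: Optional[str] = None  # strain, isolate or individual, if known

    def key(self):
        # Registry identity: taxid alone is ambiguous when several strains
        # share a parent-species taxid, so the strain is part of the key.
        return (self.taxid, self.strain, self.authority, self.version,
                self.coord_type)


def register(registry, cs):
    """Add cs to a dict-based registry, rejecting exact duplicates."""
    k = cs.key()
    if k in registry:
        raise ValueError(f"duplicate coordinate system: {k}")
    registry[k] = cs
    return registry
```

With this key, the "301" and "2457T" strains can both be registered under the same parent-species taxid without colliding; without the strain field, the second registration would be rejected as a duplicate.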

> As mentioned elsewhere in this thread, the problem of distinguishing an
> individual from its taxon is not limited to bacteria. Does the 1000 Genomes
> project use the assembly as a surrogate for isolate/individual?

No, it's not unique to bacteria. Indeed, I notice that Plasmodium vivax is
already in the DAS registry; does this not suffer from the same problem, i.e.
which strain are the coordinates referring to?

Adam


> Lincoln
> 
> On Wed, Aug 18, 2010 at 5:30 AM, Ewan Birney <[email protected]> wrote:
> 
> On 18 Aug 2010, at 09:52, Adam Witney wrote:
> 
> 
> Hi Andy,
> 
> Yes, I am aware of some of the idiosyncrasies of the Ensembl Genomes
> naming conventions. But is there a reason that the DAS registry should be
> constrained by Ensembl Genomes? Could the registry entry refer to a specific
> taxonomy ID and its corresponding entry in EG, despite EG using a different
> taxonomy ID?
> 
> I'd like to be able to export our microarray designs and data via DAS for
> others to use (including EnsemblBacteria). This is for 16 or so species, with
> multiple strains of each.
> 
> 
> Just to say that I think we should get this as straight as we can. To state
> the obvious, EG is not trying to be deliberately complex here; it is just
> that the concept of "one taxid == one species == one assembly series"
> breaks down in bacteria.
> 
> 
> I've brought in the three key people on the EG side: Eugene (who does the web
> side of this), Dan (main data production manager) and Paul Kersey (the EG
> PI). Some of them are on holiday now, but I suggest setting up a phone
> conference (and/or Adam, could you come for a visit?) to get this as straight
> as we can. I suspect there will be both short-term fixes and longer-term
> infrastructural fixes here.
> 
> 
> 
> cheers
> 
> Adam
> 
> 
> On 17 Aug 2010, at 18:07, Andy Jenkinson wrote:
> 
> Hi Adam,
> 
> There are no coordinate systems yet, as nobody has yet been brave enough to
> start using DAS with bacteria in anger. Eugene at Ensembl Genomes will have
> an interest in doing this, but they have issues with matching up their
> species/strain names with the NCBI taxonomy upon which DAS's coordinates are
> based. In essence, you will need to name the coordinate systems, after which
> they will need to be added to the registry.
> 
> For example when Ensembl Genomes manage to do this, the coordinate systems 
> might end up looking like:
> EB_1,Chromosome,Shigella flexneri 2a str. 301
> EB_1,Plasmid,Shigella flexneri 2a str. 301
> 
> This is for a specific Shigella strain with taxonomy ID 198214. The authority
> and version parts of the DAS coordinate system are named somewhat
> arbitrarily; ideally they would follow a standard used by the rest of the
> community for interoperability purposes.
> 
> What exactly is it you'd like to be able to do? How many species are we
> talking about?
> 
> The reason I ask is that getting these coordinate systems into the DAS 
> registry does require some work. Some of this is on the registry's side, but 
> depending on where your data come from there may be issues with identifying the
> correct coordinate system details such that others can reuse them 
> meaningfully. To use the example above, Ensembl Genomes give the "301" strain 
> a different name from NCBI and use the taxonomy ID not for the strain but for 
> the parent species (Shigella flexneri). In fact the 2457T strain also uses 
> the same taxonomy ID, which isn't helpful. Given the number of species, this
> adds up to a major headache.
> 
> Cheers,
> Andy
> 
> On 17 Aug 2010, at 16:49, Adam Witney wrote:
> 
> Hi,
> 
> What would be the best approach to use DAS with bacterial genomes? I can't 
> seem to find any coordinate systems for these organisms in the Registry.
> 
> Thanks for any advice
> 
> Adam
> _______________________________________________
> DAS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/das
> 
> 
> 
> -- 
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <[email protected]>

