Yes, this can be fun. I can also foresee similar problems in vertebrate genomes. On one hand, some assemblies correspond to more than one individual: chimeric assemblies, haplotypes, etc. On the other hand, you can get more than one genome per individual (e.g. cancer genomes).

Javier

On Thursday 19 Aug 2010 10:56:49 Ewan Birney wrote:
> Just to repeat:
>
> I always think this should be easy and then I get educated by Paul:
>
> I think each time one thinks about "just moving it down a level"
> (e.g., to strain) there are submitted cases in which two people have
> submitted assemblies with the same "strain tax id" but actually
> clearly aren't (e.g., there is a big insertion of something). The
> whole thing keeps moving down a notch.
>
> The right thing here is to assign tracking identifiers to assembly
> series independently of the strain assignments, and track assemblies
> separately (but obviously with relationships) to strains.
>
> Paul has met most (?all) of the use cases and understands this better
> than me. I think we should wait for Paul to weigh in here - it's just
> always a bit more complicated than you think ;)
>
> On 19 Aug 2010, at 00:10, Andy Jenkinson wrote:
> > On 18 Aug 2010, at 20:47, Adam Witney wrote:
> >>> As mentioned elsewhere in this thread, the problem of
> >>> distinguishing an individual from its taxon is not limited to
> >>> bacteria. Does the 1000 genomes project use the assembly as a
> >>> surrogate for isolate/individual?
> >>
> >> No it's not unique to bacteria, indeed I notice that Plasmodium
> >> vivax is already in the DAS registry, does this not suffer the same
> >> problems? i.e. which strain are the coordinates referring to?
> >
> > So far a coordinate system always refers to the species or strain
> > identified by its taxonomy ID. As you say, strains DO have their own
> > NCBI taxonomy ID. It may be that this is not the case for a strain
> > that someone wants to annotate, but I have yet to see an actual
> > example. There is the wider question of how to handle individuals
> > though. I can't comment on how 1000 genomes do this as I've only
> > seen these data expressed as variations annotated upon the reference
> > assembly, but my feeling is that if annotations of an individual
> > were needed then it could/would be done using the assembly paradigm
> > as a surrogate.

--
Javier Herrero, PhD
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das
