On 23 Jul 2014, at 10:58, Dan Bolser <[email protected]> wrote:
>
> I like these comments:
[...]
>
> "Hopefully 2 Gbp [per chromosome - JM] suffices for most species that
> people are interested in." - John Marshall.
[Somewhat taken out of context, see below - JM]
>
> As already mentioned, bread wheat, which provides ~20% of all the
> calories consumed by humans globally, has 6 of its 27 chromosomes
> *assembled* to a length of greater than 537 Mbp [1]...

I guess your point is that humans are therefore interested in bread wheat,
but I would note from your spreadsheet that 2B's 696 Mbp is substantially
less than 2 Gbp -- so I'm not sure what your point is after all.

> I won't mention Paris japonica ;-)

If I'm reading Wikipedia correctly, 150 billion bp in 40 chromosomes implies
at least one chromosome of at least 3.75 Gbp.  So that would indeed be
informatically difficult to represent in signed 32-bit arithmetic.  Perhaps
when this plant is assembled, the researchers will need to artificially
subdivide the chromosomes in some way, or the formats will have moved on by
then.

> [1]
> https://docs.google.com/spreadsheet/ccc?key=0Aqs5UFlky_s6dDlmVVFib3FoSm1tS1JIQzMxY0RVb2c

It appears that perhaps there is in fact a convention of subdividing large
plant chromosomes into long and short arms for reference purposes: your
spreadsheet lists e.g. 2BL and 2BS separately.

One of the samtools bug reports about large chromosomes was about barley [1],
with seven chromosomes 1H...7H.  I downloaded Ensembl's barley genome while
attempting to reproduce it, but was thoroughly confused by the apparently
twenty chromosomes there: see

    http://plants.ensembl.org/Hordeum_vulgare/Location/Genome

which seems to have mostly individual chromosome arms in 2HL, 2HS etc but
also enormous 1-7 chromosomes.

Dan, I think you just volunteered to explain what on earth is going on
here! :-)

    John

> On 18 October 2013 11:50, John Marshall <[email protected]> wrote:
>> [snip]
>> Hopefully 2 Gbp suffices for most species that people are interested in.  In
>> principle the existing file formats could be compatibly pushed to 4 Gbp, but
>> this would require great care with getting signed v. unsigned arithmetic
>> correct in the implementations so would likely not really be worthwhile.
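
For anyone wanting to check the arithmetic above, here is a minimal Python
sketch.  It is purely illustrative and is not samtools or htslib code: the
function and constant names are made up for this example, and the genome
figures are just the rough ones quoted in this thread (696 Mbp for wheat 2B,
150 Gbp in 40 chromosomes for Paris japonica).

```python
# Illustrative sketch only: all names here are hypothetical and none of
# this is samtools/htslib API.

INT32_MAX = 2**31 - 1    # largest signed 32-bit value, roughly 2.147 Gbp
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value, roughly 4.295 Gbp


def min_longest_chromosome(genome_bp: int, n_chromosomes: int) -> int:
    """Lower bound on the longest chromosome's length.

    At least one chromosome must be at least as long as the mean
    (rounded up), whatever the real distribution of lengths is.
    """
    return -(-genome_bp // n_chromosomes)  # ceiling division


def fits_signed_32(length_bp: int) -> bool:
    """True if a coordinate of this size fits in a signed 32-bit integer."""
    return length_bp <= INT32_MAX


def fits_unsigned_32(length_bp: int) -> bool:
    """True if a coordinate of this size fits in an unsigned 32-bit integer."""
    return length_bp <= UINT32_MAX


# Figures as quoted in this thread (assumed, not independently checked).
wheat_2b = 696_000_000
paris_bound = min_longest_chromosome(150_000_000_000, 40)

print(paris_bound)                    # 3750000000
print(fits_signed_32(wheat_2b))       # True
print(fits_signed_32(paris_bound))    # False
print(fits_unsigned_32(paris_bound))  # True
```

Note that 3.75 Gbp is only a lower bound on the longest chromosome: the mean
happens to fit in unsigned 32 bits, but the real longest chromosome could
still exceed 4 Gbp.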

[1] https://github.com/samtools/samtools/issues/116

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
