Quoting from here (sorry, not on the 'live' site yet, due to a mistake on our end): http://test.plants.ensembl.org/Hordeum_vulgare/Info/Annotation/#assembly

  ~2.6 million sequenced contigs were generated using whole-genome
  shotgun sequencing (WGS). ~138,000 of these are assigned to specific
  chromosomal positions. These are shown in the browser on chromosomes
  labeled 1-7. An additional 355,000 WGS contigs could be assigned to a
  specific chromosome arm, but not to a more specific location. These
  are shown in the browser on chromosomes labeled with an "H" (e.g. 1H,
  3HL). "HL" and "HS" refer to chromosome long and short arms
  respectively.

I was mentioning the index issue (537 Mbp), which as I'm sure you know
is fixed with the alternative index format. However, fixed or not, I
found the comments amusing :-)

Cheers,
Dan.

On 23 July 2014 11:48, John Marshall <[email protected]> wrote:
> On 23 Jul 2014, at 10:58, Dan Bolser <[email protected]> wrote:
>>
>> I like these comments:
> [...]
>>
>> "Hopefully 2 Gbp [per chromosome - JM] suffices for most species that
>> people are interested in." - John Marshall.
>> [Somewhat taken out of context, see below - JM]
>>
>> As already mentioned, bread wheat, which provides ~20% of all the
>> calories consumed by humans globally, has 6 of its 27 chromosomes
>> *assembled* to a length of greater than 537 Mbp [1]...
>
> I guess your point is that humans are therefore interested in bread wheat,
> but I would note from your spreadsheet that 2B's 696 Mbp is substantially
> less than 2 Gbp -- so I'm not sure what your point is after all.
>
>> I won't mention Paris japonica ;-)
>
> If I'm reading Wikipedia correctly, 150 billion bp in 40 chromosomes implies
> at least one chromosome of at least 3.75 Gbp. So that would indeed be
> informatically difficult to represent in signed 32-bit arithmetic. Perhaps
> when this plant is assembled, the researchers will need to artificially
> subdivide the chromosomes in some way, or the formats will have moved on by
> then.
>
>> [1]
>> https://docs.google.com/spreadsheet/ccc?key=0Aqs5UFlky_s6dDlmVVFib3FoSm1tS1JIQzMxY0RVb2c
>
> It appears that perhaps there is in fact a convention of subdividing large
> plant chromosomes into long and short arms for reference purposes: your
> spreadsheet lists e.g. 2BL and 2BS separately.
>
> One of the samtools bug reports about large chromosomes was about barley [1],
> with seven chromosomes 1H...7H. I downloaded Ensembl's barley genome while
> attempting to reproduce it, but was thoroughly confused by the apparently
> twenty chromosomes there: see
> http://plants.ensembl.org/Hordeum_vulgare/Location/Genome which seems to have
> mostly individual chromosome arms in 2HL, 2HS etc but also enormous 1-7
> chromosomes. Dan, I think you just volunteered to explain what on earth is
> going on here! :-)
>
> John
>
>
>> On 18 October 2013 11:50, John Marshall <[email protected]> wrote:
>>> [snip]
>>> Hopefully 2 Gbp suffices for most species that people are interested in.
>>> In principle the existing file formats could be compatibly pushed to 4 Gbp,
>>> but this would require great care with getting signed v. unsigned
>>> arithmetic correct in the implementations so would likely not really be
>>> worthwhile.
>
> [1] https://github.com/samtools/samtools/issues/116
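
For anyone wanting to check the arithmetic in this thread, here is a minimal Python sketch. It assumes the 537 Mbp figure is the 2^29 bp ceiling of the original BAM index, and that "2 Gbp" and "4 Gbp" mean the signed and unsigned 32-bit maxima. The function names and dictionary keys are made up for illustration and are not part of samtools.

```python
# Limits as I understand them from the thread; treat these as assumptions.
BAI_LIMIT = 2**29         # 536,870,912 bp: the "537 Mbp" index limit
INT32_LIMIT = 2**31 - 1   # 2,147,483,647: the "2 Gbp" signed 32-bit ceiling
UINT32_LIMIT = 2**32 - 1  # 4,294,967,295: the "4 Gbp" unsigned ceiling


def min_longest_chromosome(genome_bp: int, n_chromosomes: int) -> int:
    """Lower bound on the longest chromosome (pigeonhole: ceiling of the mean)."""
    return -(-genome_bp // n_chromosomes)


def representable(length_bp: int) -> dict:
    """Report which of the three limits a sequence of this length fits under."""
    return {
        "bai_index": length_bp <= BAI_LIMIT,
        "signed_32bit": length_bp <= INT32_LIMIT,
        "unsigned_32bit": length_bp <= UINT32_LIMIT,
    }


if __name__ == "__main__":
    wheat_2b = 696_000_000  # wheat 2B, the spreadsheet figure quoted in the thread
    paris = min_longest_chromosome(150_000_000_000, 40)  # Paris japonica
    print("wheat 2B:", representable(wheat_2b))
    print("Paris japonica, longest chromosome >=", paris, representable(paris))
```

Under these assumptions, wheat 2B is too long for the original index but well within signed 32-bit coordinates. The Paris japonica bound of 3.75 Gbp overflows signed 32-bit arithmetic, and since it is only a lower bound, even the unsigned range is not guaranteed to be enough.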
