On 23 Jul 2014, at 10:58, Dan Bolser <[email protected]> wrote:
> 
> I like these comments:
[...]
> 
>        "Hopefully 2 Gbp [per chromosome - JM] suffices for most species that
> people are interested in." - John Marshall.  [Somewhat taken out of context,
> see below - JM]
> 
> As already mentioned, bread wheat, which provides ~20% of all the
> calories consumed by humans globally, has 6 of its 27 chromosomes
> *assembled* to a length of greater than 537 Mbp [1]...

I guess your point is that humans are therefore interested in bread wheat, but 
I would note from your spreadsheet that 2B's 696 Mbp is substantially less than 
2 Gbp -- so I'm not sure what your point is after all.

> I won't mention Paris japonica ;-)

If I'm reading Wikipedia correctly, 150 billion bp in 40 chromosomes implies at 
least one chromosome of at least 3.75 Gbp.  So that would indeed be too long to 
represent in signed 32-bit arithmetic, which tops out at 2^31 - 1 (about 
2.147 Gbp).  Perhaps when this plant is assembled, the researchers will need to 
artificially subdivide the chromosomes in some way, or the formats will have 
moved on by then.
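
As a rough check of that arithmetic, here is a minimal Python sketch.  The 
function names are made up for illustration and are not part of samtools or 
any file format; the code only applies the pigeonhole bound (the longest of n 
chromosomes is at least the mean length) and compares the result against the 
signed 32-bit maximum.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def min_largest_chromosome(genome_bp: int, n_chromosomes: int) -> int:
    """Lower bound on the longest chromosome: the mean length, rounded up."""
    return -(-genome_bp // n_chromosomes)  # ceiling division on integers


def fits_signed_32(length_bp: int) -> bool:
    """True if a coordinate of this size fits in signed 32-bit arithmetic."""
    return 0 <= length_bp <= INT32_MAX


if __name__ == "__main__":
    # Genome size and chromosome count as quoted above for Paris japonica.
    bound = min_largest_chromosome(150_000_000_000, 40)
    print(bound, fits_signed_32(bound))
    # The 696 Mbp figure quoted above for wheat 2B.
    print(696_000_000, fits_signed_32(696_000_000))
```

Note that this is only a lower bound on the longest chromosome: the real 
chromosomes are unlikely to be of equal length, so the longest one could be 
considerably larger.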

> [1] https://docs.google.com/spreadsheet/ccc?key=0Aqs5UFlky_s6dDlmVVFib3FoSm1tS1JIQzMxY0RVb2c

It appears that perhaps there is in fact a convention of subdividing large 
plant chromosomes into long and short arms for reference purposes: your 
spreadsheet lists e.g. 2BL and 2BS separately.
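
To make the arm idea concrete, here is a minimal Python sketch of mapping a 
whole-chromosome coordinate onto a per-arm coordinate.  Everything in it is 
hypothetical: the function name, the split position, and the assumption that 
the short arm comes first are chosen purely for illustration, and I don't know 
how the wheat references actually define their arm boundaries.

```python
def to_arm_coordinate(chrom: str, pos: int, split: int, arms=("S", "L")):
    """Map a 1-based chromosome position to (arm name, 1-based arm position).

    `split` is the (hypothetical) last base assigned to the first arm.
    """
    if pos < 1:
        raise ValueError("positions are 1-based")
    if pos <= split:
        return chrom + arms[0], pos
    return chrom + arms[1], pos - split


if __name__ == "__main__":
    # Illustrative numbers only, not real arm lengths.
    print(to_arm_coordinate("2B", 3_000_000_000, 1_600_000_000))
```

With a split like this, each arm's coordinates can stay below 2 Gbp even when 
the whole chromosome does not.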

One of the samtools bug reports about large chromosomes was about barley [1], 
with seven chromosomes 1H...7H.  I downloaded Ensembl's barley genome while 
attempting to reproduce it, but was thoroughly confused by the apparently 
twenty chromosomes there: see 
http://plants.ensembl.org/Hordeum_vulgare/Location/Genome which seems to have 
mostly individual chromosome arms (2HL, 2HS, etc.) but also enormous 1-7 
chromosomes.  Dan, I think you just volunteered to explain what on earth is 
going on here! :-)

    John


> On 18 October 2013 11:50, John Marshall <[email protected]> wrote:
>> [snip]
>> Hopefully 2 Gbp suffices for most species that people are interested in.  In 
>> principle the existing file formats could be compatibly pushed to 4 Gbp, but 
>> this would require great care with getting signed v. unsigned arithmetic 
>> correct in the implementations so would likely not really be worthwhile.
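
To illustrate why the signed v. unsigned distinction needs care, here is a 
small Python sketch using the standard struct module.  It is not taken from 
any samtools code; the helper name is made up, and it only shows what happens 
when a 32-bit field written as unsigned is read back as signed.

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret the 32-bit pattern of an unsigned value as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


if __name__ == "__main__":
    print(as_signed_32(2_000_000_000))  # below 2^31: unchanged
    print(as_signed_32(3_750_000_000))  # above 2^31: comes back negative
```

A position of 3,750,000,000 stored this way comes back as -544,967,296 when 
read as signed, so every reader and writer of the field would have to agree 
on the unsigned interpretation.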

[1] https://github.com/samtools/samtools/issues/116

