Hello everyone,
I'm attempting to assemble a genome using data from 2 lanes of the Illumina
HiSeq, totaling ~250M 104 bp paired-end reads with an insert size of ~450 bp.
We estimate the genome size at just under 2 Gbp. This works out to roughly
30x coverage, assuming all of it maps to our genome.
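For reference, here is a minimal sketch of the coverage arithmetic in Python. The function name and the two readings of "~250M" (read pairs versus individual reads) are assumptions made for illustration; the genome size is rounded up to 2 Gbp.

```python
def base_coverage(n_reads, read_len_bp, genome_size_bp):
    """Expected per-base depth: total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp

READ_LEN = 104                 # bp per read
GENOME_SIZE = 2_000_000_000    # bp; the estimate is "just under 2 Gbp"

# Assumption A: ~250M counts read pairs, i.e. 500M individual reads.
cov_pairs = base_coverage(2 * 250_000_000, READ_LEN, GENOME_SIZE)

# Assumption B: ~250M counts individual reads.
cov_reads = base_coverage(250_000_000, READ_LEN, GENOME_SIZE)

print(f"250M pairs: {cov_pairs:.1f}x")   # 26.0x
print(f"250M reads: {cov_reads:.1f}x")   # 13.0x
```

Under assumption A the depth is about 26x, which is close to the rough 30x figure; under assumption B it would be about 13x.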
I'm attempting to use Ray (v1.6.1-rc3) and am struggling to find settings
that both finish in a reasonable amount of time and produce a reasonable
assembly. I've noticed that under the default settings the minimum k-mer
coverage is set to one less than the peak coverage (which does not appear
to be the same as the max coverage and is often in the 500-700 range).
This appears to exclude far too many k-mers and produces an awful
assembly, with an N50 in the hundreds.
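As a point of comparison for these thresholds, the expected k-mer coverage can be estimated from the base coverage with the usual relation C_k = C * (L - k + 1) / L, where L is the read length. The sketch below is only a back-of-the-envelope estimate, not Ray's own calculation: it assumes 30x base coverage and 104 bp reads, and it ignores sequencing errors and repeats. The function name is hypothetical.

```python
def kmer_coverage(base_cov, read_len, k):
    """Expected k-mer coverage: C_k = C * (L - k + 1) / L."""
    if k > read_len:
        raise ValueError("k cannot exceed the read length")
    return base_cov * (read_len - k + 1) / read_len

BASE_COV = 30.0   # assumed base coverage (x)
READ_LEN = 104    # bp per read

for k in (31, 61):
    print(f"k={k}: ~{kmer_coverage(BASE_COV, READ_LEN, k):.1f}x")
```

With these inputs the estimate is about 21.3x for k=31 and about 12.7x for k=61, so a larger k leaves noticeably less k-mer coverage to work with.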
Below is my k-mer distribution:

K-mer coverage bin  Frequency (k=61)  Frequency (k=31)
2                   11422             12689
4                   3461              5764
8                   2570              5380
16                  2191              4239
32                  1753              3382
64                  1130              2386
128                 923               1804
256                 727               1308
512                 491               954
1024                345               684
2048                269               487
4096                199               375
8192                159               260
16384               111               188
32768               75                137
65536               55                92
131072              40                67
262144              28                46
524288              20                33
1048576             14                24
2097152             10                16
4194304             6                 11
8388608             4                 5
16777216            3                 4
33554432            2                 4
67108864            2                 3
134217728           2                 4
268435456           2                 5
536870912           3                 7
1073741824          3                 1
2147483648          1                 0
4294967296          0                 0
8589934592          1                 1

Does anyone have suggestions for k-mer sizes and minimum coverage values to set?
Thanks for your help,
Walter
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users