Hi Sebastien and denovoassemblers,
I recently started experimenting with Ray. It seems pretty cool and
orders of magnitude more memory efficient than other assemblers I have
used in the past. Unfortunately, the results I am getting are a bit
odd. I am working on assembling a bacterial genome, and have assembled
some large (max size ~600 kb) contigs using 454SE reads and Newbler. I
have ~20x coverage, and optical mapping results (basically a fancy
restriction digest map) from the same genome suggest that my large
contigs are quite good: ~85% of the genome is covered by the large
contigs and there is no evidence of misassembly. I suspect that gaps
remain in my assembly due to small repetitive elements, and in fact a
coverage analysis suggests that some contigs have as many as 8 copies
in the genome! The documentation for Ray suggests that it does quite
well at resolving these repetitive elements, so I was excited to try it
on my data.
I am using Ray 1.7.0.
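For context, the kind of coverage analysis mentioned above can be sketched roughly as follows. This is only an illustration, assuming copy number is estimated as a contig's mean read depth divided by the median depth across all contigs; the contig names and depths are hypothetical, not values from this assembly.

```python
from statistics import median

def estimate_copy_numbers(contig_depths):
    """Estimate copies per contig as depth / median depth, rounded, at least 1.

    contig_depths maps a contig name to its mean read depth.
    Assumption: the median depth represents single-copy sequence.
    """
    baseline = median(contig_depths.values())
    return {
        name: max(1, round(depth / baseline))
        for name, depth in contig_depths.items()
    }

# Hypothetical depths for illustration only (not taken from the real data).
depths = {
    "contig_a": 20.0,
    "contig_b": 21.0,
    "contig_c": 19.0,
    "contig_d": 160.0,  # a repeat-like contig with elevated depth
    "contig_e": 41.0,
}
copies = estimate_copy_numbers(depths)
for name in sorted(copies):
    print(name, copies[name])
```

A contig whose depth is several times the baseline is a candidate collapsed repeat, which is how a figure like "8 copies" can arise.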
Initially I tried to run Ray on just the assembled contigs, to see if
it would resolve any of the overlaps identified through optical
mapping.
The assembled contigs have these stats:
numberOfContigs = 85;
numberOfBases = 6302194;
avgContigSize = 74143;
N50ContigSize = 133881;
largestContigSize = 613481;
Q40PlusBases = 6290676, 99.82%;
Q39MinusBases = 11518, 0.18%;
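For anyone wanting to recompute numbers like these from a FASTA file, here is a minimal sketch. It assumes one common definition of N50 (the length of the contig at which the running total, taken from longest to shortest, first reaches half of all bases); Newbler and Ray may differ in details such as minimum contig length. The function names are hypothetical.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Length of the contig at which the cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def summarize(lengths):
    """Collect the basic statistics reported by most assemblers."""
    return {
        "number": len(lengths),
        "total": sum(lengths),
        "average": sum(lengths) // len(lengths) if lengths else 0,
        "n50": n50(lengths),
        "largest": max(lengths, default=0),
    }
```

Usage would be along the lines of `summarize(fasta_lengths(open("AllContigs.fasta")))`.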
This attempt quickly generated an assembler panic:
$ Ray -s AllContigs.fasta -out test1
------------------------------------------------------------------------------------
***
Step: K-mer counting
Date: Fri Jan 27 10:07:32 2012
Elapsed time: 5 seconds
Since beginning: 6 seconds
***
Rank 0 has 16364 k-mers (completed)
Rank 0: the minimum coverage is 2
Rank 0: the peak coverage is 2
Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
Rank 0: to deal with the sequencing error rate, try to lower the k-mer
length (-k)
Rank 0: sent 10431 messages, received 10430 messages.
------------------------------------------------------------------------------------
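A rough illustration of why contig-only input might show no coverage peak: in already-assembled sequence most k-mers occur about once, whereas raw reads at ~20x cover each genomic k-mer many times. The sketch below is a toy k-mer histogram, not Ray's implementation; it ignores reverse complements and sequencing errors, and the sequences are made up.

```python
from collections import Counter

def kmer_coverage_histogram(sequences, k):
    """Map each coverage value to the number of distinct k-mers seen that often."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return Counter(counts.values())

# Toy "genome" for illustration only.
genome = "ACGTTGCAAT"

# Contig-only input: every k-mer is seen once, so there is no peak
# above the minimum coverage.
contig_hist = kmer_coverage_histogram([genome], k=4)

# Read-like input: 20 copies of the same sequence put every k-mer at
# coverage 20, giving a clear peak.
read_hist = kmer_coverage_histogram([genome] * 20, k=4)

print(dict(contig_hist))
print(dict(read_hist))
```

If this is roughly what happens, feeding contigs alone gives the assembler nothing to distinguish real k-mers from errors by coverage, which would fit the panic message.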
I then tried adding the original set of 454 reads as well as the contigs:
$ Ray -s AllContigs.fasta -s Sample1_Reads.fasta -o test2
This allowed Ray to run; however, the results are not quite what I expected.
OutputNumbers.txt:
Contigs >= 100 nt
Number: 2847
Total length: 6101858
Average: 2143
N50: 4105
Median: 1236
Largest: 29655
Contigs >= 500 nt
Number: 2057
Total length: 5896767
Average: 2866
N50: 4349
Median: 1876
Largest: 29655
Scaffolds >= 100 nt
Number: 2847
Total length: 6101858
Average: 2143
N50: 4105
Median: 1236
Largest: 29655
Scaffolds >= 500 nt
Number: 2057
Total length: 5896767
Average: 2866
N50: 4349
Median: 1876
Largest: 29655
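To put the two assemblies side by side, here is a small sketch that compares the Newbler contig stats with the Ray ">= 100 nt" contig numbers quoted above. It uses only figures already listed in this message; the variable names are arbitrary.

```python
# Figures copied from the Newbler contig stats and Ray's OutputNumbers.txt.
newbler = {"contigs": 85, "bases": 6302194, "n50": 133881, "largest": 613481}
ray = {"contigs": 2847, "bases": 6101858, "n50": 4105, "largest": 29655}

# Bases present in the Newbler contigs but absent from the Ray assembly.
missing_bases = newbler["bases"] - ray["bases"]
missing_pct = 100.0 * missing_bases / newbler["bases"]

# How much more fragmented the Ray output is.
contig_ratio = ray["contigs"] / newbler["contigs"]
n50_ratio = newbler["n50"] / ray["n50"]

print(f"bases lost: {missing_bases} ({missing_pct:.2f}%)")
print(f"contig count: {contig_ratio:.1f}x more contigs in Ray output")
print(f"N50: {n50_ratio:.1f}x smaller in Ray output")
```

So the Ray run has about 200 kb fewer assembled bases and an N50 roughly 30 times smaller than the input contigs.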
So, I'm not exactly using Ray for its designed purpose, but I am
curious about why it is breaking apart my large contigs and producing
an assembly with fewer assembled bases than I originally fed it. Any
suggestions for having Ray deal with assembly of the repetitive
regions without breaking up these large contigs would be most welcome!
Thanks,
Sam
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users