Ray does not handle homopolymer errors well, and miscalled homopolymer lengths are the most abundant sequencing errors in 454 data.
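To make that concrete, here is a minimal Python sketch of why a single homopolymer miscall hurts any k-mer based assembler. It is not Ray's code: the 40-nt sequence is made up, `kmers` is a helper written for this example, and k = 21 is only an illustrative choice.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical reference fragment containing one run of five A's.
reference = "GATTCGCATGCAAAAATCGGATCCGTTAGCATCGGACTTA"

# The same fragment as a 454 read might report it: the homopolymer run
# is called one base too short (AAAAA -> AAAA), a typical 454 error.
read = reference.replace("AAAAA", "AAAA")

k = 21  # illustrative k-mer length, an assumption for this sketch

ref_kmers = kmers(reference, k)
read_kmers = kmers(read, k)

shared = ref_kmers & read_kmers   # k-mers the read still supports
lost = ref_kmers - read_kmers     # true k-mers the read no longer contains
novel = read_kmers - ref_kmers    # erroneous k-mers introduced by the miscall

print("reference k-mers:", len(ref_kmers))
print("shared with read:", len(shared))
print("lost:", len(lost), "novel (erroneous):", len(novel))
```

Every k-mer that spans the miscalled run changes, so one wrong base removes many true k-mers from the read and adds as many erroneous ones. At ~20x coverage, recurring miscalls in the same run can lower the coverage of the true k-mers there and break paths in the graph.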
On Fri, 2012-01-27 at 13:44 -0500, Sam wrote:
> Hi Sebastien and denovoassemblers, I recently started experimenting
> with Ray, it seems pretty cool and orders of magnitude more memory
> efficient than other assemblers I have used in the past. Unfortunately
> the results I am getting are a bit odd. I am working on assembling a
> bacterial genome, and have assembled some large (max size = 600kb)
> contigs using 454SE reads and Newbler. I have ~20x coverage, and
> optical mapping results (basically a fancy restriction digest map)
> from the same genome suggest that my large contigs are quite good
> (~85% of the genome is covered by the large contigs and there is no
> evidence of misassembly). I suspect that gaps remain in my assembly
> due to small repetitive elements, and in fact a coverage analysis
> suggests that some contigs have as many as 8 copies in the genome! The
> documentation for Ray suggests that it does quite well with resolving
> these repetitive elements so I was excited to try it on my data.
>
> I am using Ray 1.7.0.
>
> Initially I tried to run Ray on the assembled contigs only to see if
> it would deal with any of the overlaps identified through optical
> mapping.
>
> The assembled contigs have these stats:
> numberOfContigs = 85;
> numberOfBases = 6302194;
>
> avgContigSize = 74143;
> N50ContigSize = 133881;
> largestContigSize = 613481;
>
> Q40PlusBases = 6290676, 99.82%;
> Q39MinusBases = 11518, 0.18%;
>
> This attempt quickly generated an assembler panic:
>
> $ Ray -s AllContigs.fasta -out test1
> ------------------------------------------------------------------------------------
> ***
> Step: K-mer counting
> Date: Fri Jan 27 10:07:32 2012
> Elapsed time: 5 seconds
> Since beginning: 6 seconds
> ***
>
> Rank 0 has 16364 k-mers (completed)
>
> Rank 0: the minimum coverage is 2
> Rank 0: the peak coverage is 2
> Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
> Rank 0: to deal with the sequencing error rate, try to lower the k-mer
> length (-k)
> Rank 0: sent 10431 messages, received 10430 messages.
> ------------------------------------------------------------------------------------
>
> I then tried adding the original set of 454 reads as well as the contigs:
> Ray -s AllContigs.fasta -s Sample1_Reads.fasta -o test2
>
> This allowed Ray to run, however the results are not quite what I expected.
> OutputNumbers.txt:
> Contigs >= 100 nt
> Number: 2847
> Total length: 6101858
> Average: 2143
> N50: 4105
> Median: 1236
> Largest: 29655
> Contigs >= 500 nt
> Number: 2057
> Total length: 5896767
> Average: 2866
> N50: 4349
> Median: 1876
> Largest: 29655
> Scaffolds >= 100 nt
> Number: 2847
> Total length: 6101858
> Average: 2143
> N50: 4105
> Median: 1236
> Largest: 29655
> Scaffolds >= 500 nt
> Number: 2057
> Total length: 5896767
> Average: 2866
> N50: 4349
> Median: 1876
> Largest: 29655
>
> So, I'm not exactly using Ray for the designed purpose, but I am
> curious about why it is breaking apart my large contigs and producing
> an assembly with less assembled bases than I originally fed it. Any
> suggestions for having Ray deal with assembly of the repetitive
> regions without breaking up these large contigs would be most welcome!
>
> thanks,
>
> Sam
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
