Ray does not handle homopolymers well, and homopolymer-length errors are
the most abundant sequencing errors in 454 data.
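To make the homopolymer point concrete, here is a small illustrative sketch. The sequences are made up and do not come from Sam's data. It shows that a single over-called base in a homopolymer run, the typical 454 error, produces k-mers that never occur in the genome. It also shows that collapsing each run to one base ("homopolymer compression", a mitigation some other tools use) makes the two sequences agree again. The compression step is an illustration only and is not presented as a Ray 1.7.0 feature.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases to a single base (e.g. AAAC -> AC)."""
    out = []
    for base in seq:
        if not out or out[-1] != base:
            out.append(base)
    return "".join(out)


# Hypothetical example: the genome has a run of 5 A's, the 454 read reports 6.
true_seq = "GATTACAGGCTAAAAACCGTTAGC"
read_seq = "GATTACAGGCTAAAAAACCGTTAGC"  # one extra A in the homopolymer
k = 7

# k-mers produced by the read that are absent from the genome
novel = kmers(read_seq, k) - kmers(true_seq, k)
print(f"{len(novel)} k-mers in the read do not occur in the genome")

# After homopolymer compression the two sequences become identical.
print(homopolymer_compress(true_seq) == homopolymer_compress(read_seq))
```

With a real read set, every such over-call or under-call adds erroneous k-mers to the graph. Those k-mers have to be recognised and discarded, which is harder when the same error recurs in many reads.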

On Fri, 2012-01-27 at 13:44 -0500, Sam wrote:
> Hi Sebastien and denovoassemblers, I recently started experimenting
> with Ray. It seems pretty cool and is orders of magnitude more
> memory-efficient than other assemblers I have used in the past. Unfortunately
> the results I am getting are a bit odd. I am working on assembling a
> bacterial genome, and have assembled some large (max size = 600kb)
> contigs using 454SE reads and Newbler. I have ~20x coverage, and
> optical mapping results (basically a fancy restriction digest map)
> from the same genome suggest that my large contigs are quite good
> (~85% of the genome is covered by the large contigs and there is no
> evidence of misassembly). I suspect that gaps remain in my assembly
> due to small repetitive elements, and in fact a coverage analysis
> suggests that some contigs have as many as 8 copies in the genome! The
> documentation for Ray suggests that it does quite well at resolving
> these repetitive elements, so I was excited to try it on my data.
> 
> I am using Ray 1.7.0.
> 
> Initially I tried to run Ray on the assembled contigs only to see if
> it would deal with any of the overlaps identified through optical
> mapping.
> 
> The assembled contigs have these stats:
>                 numberOfContigs   = 85;
>                 numberOfBases     = 6302194;
> 
>                 avgContigSize     = 74143;
>                 N50ContigSize     = 133881;
>                 largestContigSize = 613481;
> 
>                 Q40PlusBases      = 6290676, 99.82%;
>                 Q39MinusBases     = 11518, 0.18%;
>
> This attempt quickly generated an assembler panic:
> 
> $ Ray -s AllContigs.fasta -out test1
> ------------------------------------------------------------------------------------
> ***
> Step: K-mer counting
> Date: Fri Jan 27 10:07:32 2012
> Elapsed time: 5 seconds
> Since beginning: 6 seconds
> ***
> 
> 
> Rank 0 has 16364 k-mers (completed)
> 
> 
> Rank 0: the minimum coverage is 2
> Rank 0: the peak coverage is 2
> Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
> Rank 0: to deal with the sequencing error rate, try to lower the k-mer
> length (-k)
> Rank 0: sent 10431 messages, received 10430 messages.
> ------------------------------------------------------------------------------------
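A likely reading of this panic: with only the 85 assembled contigs as input, almost every k-mer is seen once (a few more times inside repeats). The coverage distribution therefore has no second peak marking the genomic coverage. The simplified sketch below counts canonical k-mers, builds the coverage histogram, and looks for a peak past the first local minimum. The peak-finding rule and the function names are assumptions made for illustration. They are not Ray's actual implementation.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement (uppercase ACGT only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def coverage_histogram(seqs, k):
    """Map coverage c -> number of distinct canonical k-mers observed exactly c times."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[canonical(s[i:i + k])] += 1
    return Counter(counts.values())


def find_peak(hist):
    """Simplified peak search (an assumption, not Ray's code): walk down the
    low-coverage error tail to its first local minimum, then return the most
    populated coverage value beyond it, or None if nothing rises above it."""
    if not hist:
        return None
    max_cov = max(hist)
    cov = 1
    # Keep descending while the next coverage value holds fewer k-mers.
    while cov < max_cov and hist.get(cov + 1, 0) < hist.get(cov, 0):
        cov += 1
    if cov >= max_cov:
        return None  # the histogram only decreases: no peak to report
    peak = max(range(cov + 1, max_cov + 1), key=lambda c: hist.get(c, 0))
    return peak if hist.get(peak, 0) > hist.get(cov, 0) else None
```

Fed a single copy of each sequence, as with contigs alone, every k-mer sits at coverage 1 and `find_peak` returns None. Fed overlapping reads at roughly 20x, it should return a value near the k-mer coverage. This is why adding the reads let the run proceed.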
> 
> I then tried adding the original set of 454 reads as well as the contigs:
> Ray -s AllContigs.fasta -s Sample1_Reads.fasta -o test2
> 
> This allowed Ray to run; however, the results are not quite what I expected.
> OutputNumbers.txt:
> Contigs >= 100 nt
>  Number: 2847
>  Total length: 6101858
>  Average: 2143
>  N50: 4105
>  Median: 1236
>  Largest: 29655
> Contigs >= 500 nt
>  Number: 2057
>  Total length: 5896767
>  Average: 2866
>  N50: 4349
>  Median: 1876
>  Largest: 29655
> Scaffolds >= 100 nt
>  Number: 2847
>  Total length: 6101858
>  Average: 2143
>  N50: 4105
>  Median: 1236
>  Largest: 29655
> Scaffolds >= 500 nt
>  Number: 2057
>  Total length: 5896767
>  Average: 2866
>  N50: 4349
>  Median: 1876
>  Largest: 29655
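The Newbler input quoted earlier has an N50 of 133881. For comparison, here is the conventional N50 definition as a short sketch: sort the lengths in decreasing order and return the first length at which the running total reaches half of all assembled bases. The contig lengths in the usage line are hypothetical. The totals also show the loss directly: 6101858 nt in Ray's contigs >= 100 nt versus 6302194 nt in the Newbler input, or 200336 nt fewer.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running against total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6]))  # hypothetical contig lengths -> 5
```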
> 
> So, I'm not exactly using Ray for its intended purpose, but I am
> curious about why it is breaking apart my large contigs and producing
> an assembly with fewer assembled bases than I originally fed it. Any
> suggestions for having Ray deal with assembly of the repetitive
> regions without breaking up these large contigs would be most welcome!
> 
> thanks,
> 
> Sam
> 
> ------------------------------------------------------------------------------
> Try before you buy = See our experts in action!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-dev2
> _______________________________________________
> Denovoassembler-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
