I have had some success using MUSKET and FLASH upstream of Ray with this
sort of data. MUSKET uses kmer frequencies to trim/correct data; FLASH
merges overlapping paired ends.
There is a new option on MUSKET that makes this easier than what I
described online; you can now feed it the two reads files and get two
output files. Michael Schatz has suggested that FLASH before MUSKET might
make more sense than my MUSKET then FLASH order. If you do use MUSKET
first, don't trim; FLASH doesn't like reads of varying length.
The Trouble With FASTQ
<http://omicsomics.blogspot.com/2012/12/the-trouble-with-fastq.html>
MUSKET Then FLASH, vice versa, or just COPE with it?
<http://omicsomics.blogspot.com/2012/12/musket-then-flash-vice-versa-or-just.html>
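On the point about FLASH and reads of varying length: a quick way to see whether a reads file has been trimmed is to tabulate its read lengths. This is only a minimal sketch, not part of MUSKET or FLASH; it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names are my own.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_lengths(fastq_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record (assumes 4 lines per record)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the bases
            yield len(line.rstrip("\r\n"))


def length_histogram(fastq_lines: Iterable[str]) -> Counter:
    """Map read length -> number of reads with that length."""
    return Counter(read_lengths(fastq_lines))


def is_uniform_length(fastq_lines: Iterable[str]) -> bool:
    """True when every read in the input has the same length."""
    return len(length_histogram(fastq_lines)) <= 1
```

To use it on a real file, pass an open file handle, e.g. `length_histogram(open("reads_1.fastq"))` (the file name is a placeholder). More than one key in the histogram means the reads no longer share one length.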
However, I haven't extensively tested all of these, and it may well be that
in some cases Ray would do better on unprocessed data (at least skipping
FLASH). Still, given your insert characteristics, if FLASH can't merge many
of the reads, something odd is going on.
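The reasoning behind that last remark is simple arithmetic on the numbers quoted below (250 bp reads, fragments of roughly 300 bp). A small sketch, assuming an idealized pair with no adapters or indels; the function name is mine:

```python
def expected_overlap(read_length: int, fragment_length: int) -> int:
    """Bases covered by both mates of a pair on an idealized fragment.

    Mate 1 covers the first read_length bases of the fragment and mate 2
    the last read_length bases, so they share 2 * read_length - fragment_length
    bases. The overlap can never exceed the fragment itself, and it is zero
    when the mates do not reach each other.
    """
    shared = 2 * read_length - fragment_length
    return max(0, min(fragment_length, shared))


# Values taken from the quoted message: 2 x 250 bp reads, ~300 bp fragments.
print(expected_overlap(250, 300))  # 200
```

With about 200 bp of expected overlap per pair, most pairs should be mergeable in principle, which is why a low merge rate would be surprising.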
You might also try mapping your reads back to the contigs with BWA and see
if many of them have soft-clipping in the CIGAR line -- that might indicate
poor accuracy in the tails of your reads, and perhaps they need to be
trimmed aggressively. Looking at the quality scores vs. position may also
suggest where (and if) noise is dominating the data.
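Both checks can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a replacement for proper QC tools: it reads SAM as plain text, skips unmapped, secondary, and supplementary records, and assumes Phred+33 quality encoding. The function names are my own.

```python
import re
from typing import Iterable, List

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> int:
    """Total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def soft_clip_fraction(sam_lines: Iterable[str], min_clip: int = 1) -> float:
    """Fraction of primary mapped reads with at least min_clip soft-clipped bases."""
    mapped = clipped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, cigar = int(fields[1]), fields[5]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue
        mapped += 1
        if soft_clipped_bases(cigar) >= min_clip:
            clipped += 1
    return clipped / mapped if mapped else 0.0


def mean_quality_by_position(quality_strings: Iterable[str],
                             offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (offset 33 is an assumption)."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

Feed `soft_clip_fraction` the lines of the SAM output from the aligner, and `mean_quality_by_position` the fourth line of each FASTQ record. A mean quality that drops sharply past some position is a candidate trimming point.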
On Mon, Jan 14, 2013 at 12:26 PM, Sébastien Boisvert <
[email protected]> wrote:
> On Sun, Jan 13, 2013 at 05:05:55PM -0500, Adrian Pelin wrote:
> > Well, this just goes to show me how important it is to always know what
> > you are talking about. My apologies. I was introduced to Ray about
> > 2 years ago and thought I remembered SOAPdenovo being mentioned
> > somewhere. Now I see that it was actually OpenAssembler that was mentioned.
> >
> > Anyway, I was wondering if I could have some advice on a dataset I am
> > working on. So here is some info:
> > - We sequenced an organism that has an estimated genome size of 6 to 12 MB.
> > - The technology was illumina, MiSeq 250 bp x 2 Paired End with a
> > fragment/insert of 500.
> > - The DNA quality was not super, so the fragment size ended up being 300
> > actually, and our 250 x 2 reads overlap a lot.
> > - The nucleotide coverage of some already assembled contigs is in the
> > 150 - 300 range.
> >
> > I have already tried an assembly with Ray. I have used kmer 213, but
>
> That is very high.
>
> > used the -short option as opposed to shortpaired.
>
> Ray does not have a -short option, I am not sure I understand what you
> mean.
>
> > Contigs >= 500 nt
> > Number: 14844
> > Total length: 18556037
> > Average: 1250
> > N50: 1372
> > Median: 914
> > Largest: 61137
> >
> > Any advice? I tried trimming reads to 50bp from 3' end and assembling,
> > and ray didn't like that. I got the largest scaffold 7kb. Kmer 45.
>
> Can you provide your command line ?
>
> > Adrian
> >
> >
> >
> > > On 1/13/2013 4:24 PM, Sébastien Boisvert wrote:
> > > On Sun, Jan 13, 2013 at 12:17:21PM -0500, Adrian Pelin wrote:
> > >> Hello,
> > >>
> > >> I would like to know, since Ray is based on SOAPdenovo to my
> > >> understanding,
> > > Ray is not based on SOAPdenovo.
> > >
> > >> is there a version of Ray that is upcoming that will be
> > >> based on SOAPdenovo2?
> > > Well no since Ray has never been based on SOAPdenovo or anything else.
> > >
> > >> Adrian
> > >>
> > >>
> ------------------------------------------------------------------------------
> > >> _______________________________________________
> > >> Denovoassembler-users mailing list
> > >> [email protected]
> > >> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
> >
>
>