My answers are below.

On Mon, 2012-03-05 at 23:32 -0500, Egon Ozer wrote:
> What does that mean? That if you have 20 k-mers covering a base position,
> 19 of which have a T at that position and 1 with an A at that position,
> that the consensus will have an A at that position if the k-mer with the A
> was lexicographically the lowest?

No, it depends. I have read the code path for that, and the most covered one
is selected. If they have an equal number of k-mer observations, then the
shortest is selected.

See https://github.com/sebhtml/ray/blob/master/code/plugin_SeedExtender/BubbleTool.cpp
(if you want to check the code).

> Or am I thinking too simplistically, or just misunderstanding (a distinct
> possibility as I am a biologist, not a computer scientist)?

Maybe one gets selected over the other because its read pairs look better.

> Just FYI, I had Velvet take a crack at the same data and it almost seemed
> worse.

But de novo assemblers are not tools to produce variation files. I know
ALLPATHS does that, but the file format is in development.

Since you want to compare 2 closely related species without any reference,
one thing you could try is to generate an assembly for species A and an
assembly for species B. Then, you can generate variations for A and B by
mapping both sets of reads to assembly A. You can then repeat the same by
mapping reads to assembly B.

File formats like SAM and VCF are much better suited to dealing with
variations. I think assemblers (like Ray or Velvet) are just one part of your
story. Otherwise, you have to inspect the AMOS file manually, which sounds
exciting visually but could take a while.

> Just a cursory glance through the assembly revealed a few base positions
> that were 100% covered with one base (to a depth of ~ 20 in the two cases
> I saw), but the consensus was another base(!). Unsettling...

This sounds strange ;)

> Thanks for getting back to me. Glad you like the figures!

Did you try Hawkeye too? I find it more responsive, although it is less
visually appealing.

> - Egon
>
> On Mar 5, 2012, at 7:33 PM, Sébastien Boisvert wrote:
>
> > I think that Ray will choose the lexicographically-lower k-mer.
> >
> > Nice figures by the way !!
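
For readers following the thread, the selection rule described in the reply (the most covered candidate wins; on a tie, the shortest) can be sketched as below. This is only an illustration under assumptions, not the code in BubbleTool.cpp: the `Branch` type, its fields, and `pick_branch` are hypothetical names, and coverage is assumed to be a single k-mer observation count per candidate.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    # Hypothetical fields, chosen for illustration only.
    sequence: str  # bases spelled by this candidate
    coverage: int  # number of k-mer observations supporting it


def pick_branch(branches):
    """Return the most covered branch; on a coverage tie, the shortest one."""
    if not branches:
        raise ValueError("need at least one branch")
    # Smallest key wins: negated coverage first, then sequence length.
    return min(branches, key=lambda b: (-b.coverage, len(b.sequence)))


# 19 observations support the T variant and 1 supports the A variant.
chosen = pick_branch([Branch("ACGTA", 1), Branch("ACGTT", 19)])
print(chosen.sequence)  # prints ACGTT
```

Under this rule, lexicographic order plays no role: the A variant would only be chosen if it were at least as well covered as the T variant.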
> >
> > On Mon, 2012-03-05 at 18:16 -0500, Egon Ozer wrote:
> >> I'm trying to call SNPs in two very closely related bacterial
> >> organisms (same genus and species) without a good reference genome
> >> using Illumina sequencing. I assembled each of the genome sequences
> >> de novo using Ray 1.7, then used Novoalign to align the reads from
> >> genome 2 onto the assembly of genome 1 and called SNPs with mpileup.
> >> I also did the reverse (reads of genome 1 onto assembly of genome 2)
> >> and compared SNP calls to try to increase specificity of calls. What
> >> I found was a number of heterogeneous (~50/50) SNPs with deep coverage
> >> that I think represent co-assembly of repeat regions. What was odd
> >> was that there were also a number of SNPs that were uniform in one
> >> alignment (genome 2 to genome 1, for example), but no SNP call in the
> >> other direction (genome 1 to genome 2). Long story short, I took a
> >> look at the Ray assembly AMOS files in Tablet and found that these
> >> most often represented positions that look like they should have been
> >> called a base other than what they ultimately were (i.e. the position
> >> was covered by 29 T's and 14 G's, but Ray called a G at that
> >> position).
> >>
> >> I'll include screen shots of 3 of these errors.
> >>
> >> I know Ray doesn't take base qualities into account during assembly,
> >> so any thoughts on why it may be making a decision to call one base
> >> over another in these situations since it doesn't seem to be taking
> >> base predominance into account here?
> >>
> >> Thanks,
> >>
> >> - Egon
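
The mismatch described in the original message (a position covered by 29 T's and 14 G's where the assembly has a G) is the kind of thing that could be flagged automatically once per-position base counts are available, rather than by eye in a viewer. A minimal sketch, assuming the counts have already been extracted from a pileup into plain Python dictionaries; `flag_minority_calls`, the input layout, and the position numbers are hypothetical and not part of Ray, Tablet, or samtools.

```python
def flag_minority_calls(consensus, pileup_counts):
    """Yield (position, called_base, majority_base, counts) wherever the
    consensus base has strictly fewer supporting reads than the most
    frequent base at that position.

    consensus:     dict mapping position -> base in the assembly
    pileup_counts: dict mapping position -> {base: number of reads}
    """
    for pos, counts in sorted(pileup_counts.items()):
        if not counts:
            continue  # no reads cover this position
        majority = max(counts, key=counts.get)
        called = consensus.get(pos)
        if called is not None and counts.get(called, 0) < counts[majority]:
            yield pos, called, majority, counts


# Made-up positions; the counts at 1042 mirror the 29 T / 14 G case above.
consensus = {1042: "G", 1043: "A"}
pileup_counts = {1042: {"T": 29, "G": 14}, 1043: {"A": 20}}

for pos, called, majority, counts in flag_minority_calls(consensus, pileup_counts):
    print(pos, called, majority, counts)
```

Positions where the called base ties with another base are not reported, since a tie does not show that the consensus disagrees with the reads.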
_______________________________________________
Denovoassembler-users mailing list
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
