Re: [Denovoassembler-users] Ray produced the same scaffolds and contigs

Sébastien Boisvert Mon, 05 Mar 2012 18:31:40 -0800

See my responses below.

On Mon, 2012-03-05 at 18:53 -0500, LIU wrote:
> Hi ,
> 
> 
> Thanks for your response.
> 
> On Tue, Mar 6, 2012 at 2:21 AM, Sébastien Boisvert
> <[email protected]> wrote:
>         1. Using a k-mer length of 71 will _presumably_ not work very
>         well
>         because of sequencing errors. First do a test run at k=31.
> Yes i also ran k=31. 
> It is the same case as k=71. 
> One more question about choice of kmer length.
> I was also told that longer kmer is supposed to produce more accurate
> assembly, while shorter ones are more prone to sequencing errors.
> I am confused. perhaps  i should open another ticket to ask this
> question. But i really appreciate your answer. 
>


Using a longer k-mer length makes the k-mers more unique, but it also
makes each sequencing error affect more of the k-mers in a read.

Let's say that this is a read:

                                 *
TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG


The '*' marks a sequencing error.

For 71-mers, the sliding window is:

                                 *
TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG

kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
 
The read is 75 bases long, so it only yields 5 71-mers, and every one of
them contains the sequencing error.


For 31-mers, the sliding window is:

                                 *
TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG

kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk


So with 31-mers, you will get some erroneous k-mers and some genuine
k-mers.
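
To put numbers on the two sliding windows above, here is a minimal Python
sketch. It is only an illustration, not code from Ray. The function name
kmer_error_counts is made up for this example, and it assumes the '*'
sits at 0-based position 33 of the read (read off the diagram).

```python
def kmer_error_counts(read_length, k, error_pos):
    """Count k-mers of a read that contain / avoid one erroneous base.

    error_pos is the 0-based index of the erroneous base (an assumption
    taken from the diagram, not something Ray reports).
    Returns (erroneous, genuine).
    """
    total = read_length - k + 1
    if total <= 0:
        return 0, 0
    # A k-mer starting at s covers positions s .. s+k-1, so it contains
    # the error when error_pos-k+1 <= s <= error_pos.
    first = max(0, error_pos - k + 1)
    last = min(error_pos, read_length - k)
    erroneous = last - first + 1
    return erroneous, total - erroneous


read = "TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG"
error_pos = 33  # assumed 0-based column of the '*'

for k in (71, 31):
    bad, good = kmer_error_counts(len(read), k, error_pos)
    print(f"k={k}: {bad} erroneous k-mers, {good} genuine k-mers")
```

With these assumptions, k=71 gives 5 erroneous and 0 genuine k-mers,
while k=31 gives 31 erroneous and 14 genuine k-mers.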


> 
>         
>         
>         2. Are your interleaved files properly generated ?
>         
>         sequence1/1
>         sequence1/2
>         sequence2/1
>         sequence2/2
>         sequence3/1
>         sequence3/2
>         Yes, i think my sequences are correctlly interleaved. E.G.,
> >@AGRF-21_0011_FC64J74AAXX:2:1:1804:936#CCGACT/1
> TACATATATACATGATACATACATACATGATATATTCATATGTCACCTAAGGATGTATCATACATGATACATACATCCATGATACATACATACCG
> 
> 
> >@AGRF-21_0011_FC64J74AAXX:2:1:1804:936#CCGACT/2
> GATGTATGTATCATGTATGATACATCCTTAGGTGACATATGAATATATCATGTATGTATGTATCATGTATATATGTATAAATATGTAT
> 
> 
> >@AGRF-21_0011_FC64J74AAXX:2:1:1983:932#AATTAA/1
> TATATAGATAGATTTCA
> 
> 
> >@AGRF-21_0011_FC64J74AAXX:2:1:1983:932#AATTAA/2
> CTTTTTTTTTGTTTCAGTCCCCGTGCTTTCAAAATTGCCCGGGTTCAGTCCCTAAGTCGTTAAGTCCGTT
>  In fact, i also tried velvet. It produced different contigs and
> scaffolds. But of course Ray and Velvet may not be directly compared
> because of different scaffolding strategy (i do not know this, it's
> simply a guess).

This looks ok.

But why is the second sequence shorter than the first one ?

Usually, Illumina sequencing produces 2 sequences of the same length for
each pair.
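
If you want to check a whole interleaved file rather than a few records,
something like the following Python sketch could help. It is not part of
Ray. The function name check_interleaved is hypothetical, and it assumes
FASTA headers start with '>' and that mates are named name/1 and name/2
as in the layout quoted above. A length difference is not necessarily an
error (trimming can cause it), so it is only reported for inspection.

```python
def check_interleaved(lines):
    """Return a list of possible problems in interleaved FASTA lines.

    Assumes each record is a '>' header followed by sequence lines, and
    that the two mates of a pair are named <name>/1 and <name>/2.
    """
    records = []  # each entry is [header_without_'>', sequence_length]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], 0])
        elif records:
            records[-1][1] += len(line)

    problems = []
    if len(records) % 2 != 0:
        problems.append("odd number of sequences")
    # Walk the records two at a time and compare each /1 with its /2.
    for i in range(0, len(records) - 1, 2):
        (name1, len1), (name2, len2) = records[i], records[i + 1]
        if not (name1.endswith("/1") and name2.endswith("/2")
                and name1[:-2] == name2[:-2]):
            problems.append(f"records {i} and {i + 1} are not a /1, /2 pair")
        elif len1 != len2:
            problems.append(
                f"{name1[:-2]}: mate lengths differ ({len1} vs {len2})")
    return problems


# Made-up records, only to show the two kinds of report.
example = [">seq1/1", "ACGT", ">seq1/2", "ACG",
           ">seq2/1", "AC", ">seq3/2", "GT"]
for problem in check_interleaved(example):
    print(problem)
```

For a gzip-compressed file, the lines can be read with
gzip.open(path, "rt") from the Python standard library and passed to the
function directly.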

>          
>         
>         Do you get anything in LibraryStatistics.txt ?
> 
> 
> The LibraryStatixtics are
>    NumberOfPairedLibraries: 3
> 
> 
> LibraryNumber: 0
>  InputFormat: Interleaved,Paired
>  DetectionType: Automatic
>  File: /home/s4196896/mix_assembly/input/t15c15/gs1.shuffled.fasta.gz
>   NumberOfSequences: 248332323
>  Distribution: 31/Library0.txt
> 
> 
> LibraryNumber: 1
>  InputFormat: Interleaved,Paired
>  DetectionType: Automatic
>  File: /home/s4196896/mix_assembly/input/t15c15/gs3.shuffled.fasta.gz
>   NumberOfSequences: 405911176
>  Distribution: 31/Library1.txt
> 
> 
> LibraryNumber: 2
>  InputFormat: Interleaved,Paired
>  DetectionType: Automatic
>  File: /home/s4196896/mix_assembly/input/t15c15/gs2.shuffled.fasta.gz
>   NumberOfSequences: 234114234
>  Distribution: 31/Library2.txt 
> 

Is there anything in 31/Library0.txt, 31/Library1.txt and 31/Library2.txt ?


Can you provide the last 10 lines of SeedLengthDistribution.txt ?

> 
> Best Regards,
> Huanle
>         
>         On Thu, 2012-03-01 at 17:06 -0500, LIU wrote:
>         > Hi There,
>         >
>         > I have been using Ray to de novo assembly.
>         >
>         > The input reads are a mix of illumina pair-end reads (this
>         account for
>         > 90%), illumina single-end reads and 454 single end reads.
>         >
>         > The command i used is
>         > mpiexec -n 60 Ray \
>         >  -i \
>         >
>          /home/s4196896/mix_assembly/input/t15c15/gs1.shuffled.fasta.gz \
>         >  -i \
>         >
>          /home/s4196896/mix_assembly/input/t15c15/gs2.shuffled.fasta.gz \
>         >  -i \
>         >
>          /home/s4196896/mix_assembly/input/t15c15/gs3.shuffled.fasta.gz \
>         >  -s \
>         >
>          /home/s4196896/mix_assembly/input/t15c15/gs2.single.fasta.gz
>         \
>         >  -s \
>         >
>          /home/s4196896/mix_assembly/input/t15c15/gs3.single.fasta.gz
>         \
>         >  -s \
>         >
>          /home/s4196896/mix_assembly/input/t15c15/gs1.single.fasta.gz
>         \
>         >  -s \
>         >  /home/s4196896/mix_assembly/input/radseq1.seeds.fasta \
>         >  -s \
>         >  /home/s4196896/mix_assembly/input/radseq_v2.fasta \
>         >  -s \
>         >
>          /work1/s4196896/454_assembly/raw_reads/all_genomic_reads.short.fasta
>         > \
>         >  -s \
>         >
>          /work1/s4196896/454_assembly/raw_reads/all_genomic_reads.long.fasta \
>         >  -o \
>         >  71 \
>         >  -k \
>         >  71
>         >
>         > The output shows that scaffolds and contigs are the same
>         (same N50,
>         > total number of bases and number of sequences etc.).
>         >
>         > This confused me.
>         >
>         >
>         > I hope someone can help me out.
>         >
>         > Thanks in advance.
>         >
>         > Kind Regards,
>         > --
>         > Huanle
>         >
>         > School of biological Sciences, UQ, QLD, AU
>         
>         
>         
> 
> 
> 
> 
> -- 
> Huanle 
> 
> School of biological Sciences, UQ, QLD, AU
> 




