[Denovoassembler-users] RE : RE : RE : RE : Ray assembly kmer coverage question

Sébastien Boisvert Thu, 21 Jul 2011 08:01:15 -0700

Hello,

> ________________________________________
> From: Walter Eckalbar [[email protected]]
> Sent: 20 July 2011 16:30
> To: Sébastien Boisvert
> Subject: Re: RE : [Denovoassembler-users] RE : RE : Ray assembly kmer coverage 
> question
> 
> Hi Sébastien,
> 
> I got a similar distribution for k=21 as I did for k=31.


Can you post it on pastebin and link to it here?

> So, I'm going to try some filters now and then rerun with k=21. Which leads 
> me to a question that I don't see addressed in your manual: do you throw out 
> reads with "N"s, or how are they handled?

Unknown nucleotides at the ends of reads are removed, and unknown nucleotides 
inside reads are converted to A.
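
As a rough illustration of that rule (this is not Ray's actual code, only a sketch of the behaviour described above), something like the following would do it. The function name is hypothetical, and treating every character other than A, C, G, T as "unknown" and upper-casing the read first are assumptions on my part.

```python
def clean_unknown_nucleotides(read: str) -> str:
    """Trim unknown bases from both ends, then turn internal ones into 'A'.

    Assumption: anything other than A, C, G, T counts as unknown.
    """
    known = set("ACGT")
    seq = read.upper()

    # Walk inward from both ends until a known base is found.
    start, end = 0, len(seq)
    while start < end and seq[start] not in known:
        start += 1
    while end > start and seq[end - 1] not in known:
        end -= 1

    # Any unknown base left inside the read is replaced by 'A'.
    return "".join(base if base in known else "A" for base in seq[start:end])


if __name__ == "__main__":
    print(clean_unknown_nucleotides("NNACGNTTNN"))
```

Note that the internal N-to-A conversion can create k-mers that are not in the genome, which is why filtering such reads beforehand only affects the number of erroneous k-mers.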

Added this in the manual: 
https://github.com/sebhtml/ray/commit/b76e9dde17fb06b5a8541f0d3c6338f5e0e70c95

> Should I filter those out?
> 

It will just reduce the number of erroneous k-mers, nothing more.

   Sébastien

> Walter
> 
> 
> 2011/7/20 Sébastien Boisvert <[email protected]>
> Exactly!
> 
> 
> 
>> ________________________________________
>> From: Adrian Platts [[email protected]]
>> Sent: 20 July 2011 12:53
>> To: Sébastien Boisvert
>> Cc: Walter Eckalbar; [email protected]
>> Subject: Re: [Denovoassembler-users] RE : RE : Ray assembly kmer coverage 
>> question
>>
>>>
>>> That is the first thing you should do I think. k=21 will pick up more 
>>> redundant k-mers than k=31 or k=61.
>>>
>>> Basically, this is why:
>>>
>>>
>>> If the sequencing error rate is > 4.8% (1/21), then almost all k-mers will 
>>> be unique and bad for k=21.
>>> If the sequencing error rate is > 3.2% (1/31), then almost all k-mers will 
>>> be unique and bad for k=31.
>>> If the sequencing error rate is > 1.6% (1/61), then almost all k-mers will 
>>> be unique and bad for k=61.
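
The 1/k figures above are the error rates at which a k-mer contains one sequencing error on average (k times the error rate equals 1). A small sketch of that arithmetic follows, assuming errors hit each base independently and at a uniform rate, which real reads only approximate. The function names are made up for the example.

```python
def threshold_error_rate(k: int) -> float:
    """Per-base error rate at which a k-mer carries one error on average."""
    return 1.0 / k


def error_free_kmer_fraction(error_rate: float, k: int) -> float:
    """Expected fraction of k-mers with no error at all.

    Assumption: each base is wrong independently with probability error_rate.
    """
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    for k in (21, 31, 61):
        threshold = threshold_error_rate(k)
        clean = error_free_kmer_fraction(0.02, k)
        print(f"k={k}: threshold {threshold:.1%}, "
              f"error-free k-mers at 2% error: {clean:.1%}")
```

Under this simple model the fraction of error-free k-mers drops quickly as k grows at a fixed error rate, which is the reason for trying a smaller k first.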
>>>
>>> I believe the Illumina HiSeq TruSeq sequencing error rate varies between 0 and 
>>> 2%. Your mileage may vary, however, depending on the quality of the DNA and 
>>> the library preparation (nicks in the DNA introduced during library 
>>> preparation, for instance).
>>>
>>
>> I believe that most people using long k-mers in assembly have been 3' 
>> trimming/rejecting reads where they fall below (at least) Q=31 on a per-read 
>> basis, and at Q=35 with the newer Illumina chemistries. I don't know of 
>> anyone who is really putting raw FASTA read data directly into the 
>> assemblers, certainly not when using long k-mers, for exactly the reason 
>> above.
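
For readers who have not done this kind of 3' trimming, here is one possible reading of it, as a minimal sketch: cut bases off the 3' end while their quality is below the threshold. The function name, the default threshold of 31, and the Phred+33 quality encoding are assumptions for the example; real trimming tools often use sliding windows or other criteria.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 31,
                offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases whose Phred quality is below min_q.

    Assumption: qual is a FASTQ quality string in Phred+33 encoding
    (pass offset=64 for old Illumina Phred+64 files).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    end = len(seq)
    # Step back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
```

A read that comes back empty, or shorter than k, would then be rejected rather than passed to the assembler.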
>>
>> I suppose FASTQ data perhaps, but the way FASTQ data is handled can differ 
>> quite a bit between assemblers, so the results can be mixed, I think.
>>
>> Adrian
>>
>>
> 
> Sébastien
> 
> 

                                                     Sébastien

_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
