Thanks to both Martin and Cei. Clearly I have to get the scale right, and I hope to do a bit of analysis of the quality score distributions and of decision-making based on them; positional effects are clearly of interest.
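
As a rough illustration of the scale question and of looking at positional effects, here is a minimal Python sketch. It assumes Solexa-style scores defined as Q = -10 * log10(p / (1 - p)), with p the error probability, and a FASTQ quality string encoded with ASCII offset 64; both are assumptions to check against the pipeline that produced the files. The function names are hypothetical and not taken from any package.

```python
SOLEXA_OFFSET = 64  # assumption: Solexa-style FASTQ, ASCII offset 64


def decode_solexa(qual_string, offset=SOLEXA_OFFSET):
    """Turn a FASTQ quality string into a list of integer scores."""
    return [ord(c) - offset for c in qual_string]


def solexa_error_prob(q):
    """Error probability implied by a Solexa-scaled score.

    Inverts Q = -10 * log10(p / (1 - p)), giving p = 1 / (1 + 10^(Q/10)).
    """
    return 1.0 / (1.0 + 10.0 ** (q / 10.0))


def mean_score_by_position(qual_strings, offset=SOLEXA_OFFSET):
    """Mean score at each read position, tolerating unequal read lengths."""
    totals, counts = [], []
    for qs in qual_strings:
        for i, q in enumerate(decode_solexa(qs, offset)):
            if i == len(totals):
                # First time this position is seen: extend the accumulators.
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

On this scale a score of -4 corresponds to an error probability of about 0.715, leaving roughly 0.285 on the called base, which is only slightly above the 0.25 expected from a uniform guess over four bases. Whether that exceeds the mass on each alternative base depends on how the remaining probability is spread.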
On Wed, Apr 15, 2009 at 6:25 PM, Cei Abreu-Goodger <[email protected]> wrote:

> Hi Vincent,
>
> Are you taking into account that quality scores will tend to drop off
> towards the end of the run? I would probably restrict any sort of quality
> filtering to the first x bases of each read... From my experience, only a
> very small fraction of reads out of a "good" run would be removed due to
> general quality issues. Also, if your further pipeline is "quality-aware"
> (eg MAQ/bowtie for alignments) you can get away with not worrying initially
> about the quality of the reads. On the other hand, for some kinds of
> analysis I was dropping the quality scores and making plain fasta files. In
> these cases it would pay off to convert very low-quality bases to Ns, since
> I would get better coverage.
>
> Cheers,
>
> Cei
>
> Vincent Carey wrote:
>
>> i have scoured our archives and found little regarding role of solexa
>> quality scores as reported in fastq outputs in short read filtering.
>>
>> my understanding is that a numerical score of -4 or greater indicates
>> more probability mass on the called base than on any other. in checking
>> 1e6 reads on each of two lanes i found the frequency of the event "fewer
>> than three bases have score less than -4" to be 4e-3 in one lane and 2e-3
>> in another. in other words, filtering by requiring no more than two < -4
>> scores would take you from a million reads to about 2000-4000, assuming i
>> have not taken a biased sample (i may have, just took the first 1e6 in
>> fastq).
>>
>> is there any reason to regard a call with score < -4 to be much different
>> from an 'N'?
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
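
The two ideas raised in the thread, masking very low-quality calls as N and filtering reads by the number of calls below -4 (optionally only within the first x bases), can be sketched in Python as follows. This is only an illustration under stated assumptions: scores are taken to be already decoded to integers on the Solexa scale, the threshold of -4 and the limit of two low-quality calls come from the messages above, and the function names and the `first_n` parameter are hypothetical.

```python
def mask_low_quality(seq, scores, threshold=-4):
    """Replace each base whose score is below `threshold` with 'N'."""
    return "".join("N" if q < threshold else base
                   for base, q in zip(seq, scores))


def passes_filter(scores, threshold=-4, max_low=2, first_n=None):
    """True if at most `max_low` scores fall below `threshold`.

    If `first_n` is given, only the first `first_n` positions are examined,
    so the expected quality drop-off at the end of the read is ignored.
    """
    window = scores if first_n is None else scores[:first_n]
    n_low = sum(1 for q in window if q < threshold)
    return n_low <= max_low
```

For example, `mask_low_quality("ACGT", [10, -5, -4, 30])` keeps the base scored -4 and masks only the one scored -5, since the comparison is strictly below the threshold.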
