Thanks to both Martin and Cei. Clearly I have to get the scale right,
and I hope to do some analysis of the quality score distributions
and of decision-making based on them; positional effects are clearly of
interest.
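To get the scale right, here is a minimal Python sketch of the Solexa-scale arithmetic. It assumes the Solexa definition Q = -10*log10(p/(1-p)), where p is the probability that the call is wrong. It also assumes, hypothetically, that the error mass is split evenly over the three other bases. Under that assumption the called base beats every alternative when 1 - p > p/3, i.e. p < 3/4, i.e. Q > -10*log10(3), about -4.77. Integer scores of -4 and above therefore qualify, which agrees with the -4 threshold in the quoted question below. The function names are illustrative only.

```python
import math

# Solexa scores: Q = -10*log10(p / (1 - p)), where p = probability the call is wrong.
# Below this Q, p exceeds 3/4; if p is split evenly over the three alternatives
# (an assumption), the called base no longer carries the largest share.
DOMINANCE_THRESHOLD = -10 * math.log10(3)   # about -4.77


def solexa_error_prob(q):
    """Error probability p implied by a Solexa-scale score q."""
    odds = 10 ** (-q / 10)                  # p / (1 - p)
    return odds / (1 + odds)


def solexa_to_phred(q):
    """Phred-scale score -10*log10(p) carrying the same error probability as Solexa score q."""
    return 10 * math.log10(1 + 10 ** (q / 10))


def called_base_dominates(q):
    """True if the called base is more probable than each alternative base,
    assuming (hypothetically) the error mass is split evenly over the other three."""
    p = solexa_error_prob(q)
    return 1 - p > p / 3


if __name__ == "__main__":
    for q in (-5, -4, 0, 10):
        print(q, round(solexa_error_prob(q), 3), round(solexa_to_phred(q), 2),
              called_base_dominates(q))
```

On this scale a score of -4 still implies an error probability of roughly 0.72, and -5 roughly 0.76. So "better than any single alternative" is a weak guarantee, even when the threshold holds.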

On Wed, Apr 15, 2009 at 6:25 PM, Cei Abreu-Goodger <[email protected]> wrote:

> Hi Vincent,
>
> Are you taking into account that quality scores tend to drop off
> towards the end of the run? I would probably restrict any sort of quality
> filtering to the first x bases of each read. In my experience, only a
> very small fraction of reads from a "good" run would be removed due to
> general quality issues. Also, if your downstream pipeline is "quality-aware"
> (e.g. MAQ/Bowtie for alignments), you can get away with not worrying initially
> about the quality of the reads. On the other hand, for some kinds of
> analysis I was dropping the quality scores and making plain FASTA files. In
> those cases it would pay off to convert very low-quality bases to Ns, since
> I would get better coverage.
>
> Cheers,
>
> Cei
>
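Cei's suggestions can be sketched in Python roughly as follows. This assumes Solexa-encoded qualities (ASCII offset 64). `first_n` (the "x" above), `min_q` and `max_low` are hypothetical parameters to tune. The per-position means are one simple way to see where quality drops off before choosing `first_n`. None of this reflects what MAQ or Bowtie do internally.

```python
def solexa_scores(qual, offset=64):
    """Decode a Solexa-encoded FASTQ quality string (ASCII offset 64) into integer scores."""
    return [ord(c) - offset for c in qual]


def mean_score_by_position(quals):
    """Mean score at each cycle across reads, to see where quality drops off along the run."""
    sums, counts = [], []
    for qual in quals:
        for i, q in enumerate(solexa_scores(qual)):
            if i == len(sums):                 # first read reaching this cycle
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def passes_prefix_filter(qual, first_n, min_q=-4, max_low=2):
    """Judge a read on its first `first_n` bases only: pass if at most `max_low` score below `min_q`."""
    prefix = solexa_scores(qual)[:first_n]
    return sum(1 for q in prefix if q < min_q) <= max_low


def mask_low_quality(seq, qual, min_q=-4):
    """Replace bases scoring below `min_q` with 'N', e.g. before writing plain FASTA."""
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, solexa_scores(qual)))


def to_fasta(name, seq, qual, min_q=-4):
    """Format one read as a FASTA record with low-quality bases masked."""
    return f">{name}\n{mask_low_quality(seq, qual, min_q)}\n"


if __name__ == "__main__":
    # '^' encodes score 30, ';' encodes -5, '@' encodes 0.
    print(mean_score_by_position(["^^;@", "^@"]))   # [30.0, 15.0, -5.0, 0.0]
    print(to_fasta("r1", "ACGT", "^^;@"), end="")    # prints '>r1' then 'ACNT'
```

Masking keeps read length and position intact, so the N-masked FASTA stays alignable. The prefix filter ignores the low-quality tail instead of letting it reject the whole read.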
> Vincent Carey wrote:
>
>> I have scoured our archives and found little regarding the role of Solexa
>> quality scores, as reported in FASTQ output, in short-read filtering.
>>
>> My understanding is that a numerical score of -4 or greater indicates more
>> probability mass on the called base than on any other. In checking 1e6
>> reads on each of two lanes, I found the frequency of the event "fewer than
>> three bases have score less than -4" to be 4e-3 in one lane and 2e-3 in
>> the other. In other words, filtering by requiring no more than two scores
>> below -4 would take you from a million reads to about 2000-4000, assuming
>> I have not taken a biased sample (I may have; I just took the first 1e6
>> reads in the FASTQ file).
>>
>> Is there any reason to regard a call with score < -4 as much different
>> from an 'N'?
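The count described above can be reproduced with a short Python sketch. It assumes an unwrapped FASTQ file with four lines per record and Solexa-encoded qualities (ASCII offset 64). The function names and the file name `lane1.fastq` are hypothetical. Like the original check, it looks only at the first `max_reads` records, so it shares the same possible ordering bias.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, seq, qual) tuples from an unwrapped 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return                                  # end of stream
        seq = handle.readline().rstrip("\n")
        handle.readline()                           # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual    # drop the leading '@'


def n_low(qual, cutoff=-4, offset=64):
    """Count bases whose Solexa score (ASCII code minus `offset`) is below `cutoff`."""
    return sum(1 for c in qual if ord(c) - offset < cutoff)


def pass_fraction(handle, max_reads=10**6, max_low=2, cutoff=-4):
    """Fraction of the first `max_reads` reads with at most `max_low` bases below `cutoff`."""
    total = kept = 0
    for _, _, qual in islice(read_fastq(handle), max_reads):
        total += 1
        if n_low(qual, cutoff) <= max_low:
            kept += 1
    return kept / total if total else 0.0


if __name__ == "__main__":
    # Two toy reads: r1 is all score 30 ('^'); r2 has three bases at -5 (';').
    toy = "@r1\nACGT\n+\n^^^^\n@r2\nACGT\n+\n;;;@\n"
    print(pass_fraction(io.StringIO(toy)))         # 0.5
    # On real data, e.g.: with open("lane1.fastq") as fh: print(pass_fraction(fh))
```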
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered offic...
