Re: [Bioc-sig-seq] filtering by adapters in QA report

Marcus Davy Fri, 25 Mar 2011 21:00:04 -0700

Hi Robert,
just to add to the discussion: it was not initially obvious to me that the
ShortRead QA report can be generated either from files on disk or from a
ShortReadQ object within R. This at least provides the flexibility to filter
a ShortReadQ object using trimLRPatterns/vmatchPattern/narrow etc. and then
run a QA report on the filtered reads to get more meaningful plots.


I agree it would be a nice feature to be able to specify some adapter
sequences to filter in the qa() call itself, or potentially to select the
parts of the report of interest. There will be cases that test this proposed
functionality, especially around partial adapter sequences and the number of
mismatches to allow. I recently came across a synthetic construct (~20 bases)
in an Illumina experiment that was the first half of an adapter with a
single random DNA base added at the 5' start, so the partial adapter
effectively started at cycle position 2 of the subjects. Biostrings
trimLRPatterns may not identify this pattern and dynamically trim or filter
(using the ranges coordinates) unless the random base is added to the start
of the pattern and at least one mismatch is allowed, whereas a vmatchPattern
approach to filtering would work.
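
The difference between a pattern anchored at the start of the read and a search over all positions can be sketched in Python. This is a simplified model and not the Biostrings implementation. It uses plain Hamming mismatches and ignores the partial-overlap handling that trimLRPatterns offers. The 20-base PARTIAL_ADAPTER string and the read are hypothetical examples.

```python
# Hypothetical ~20-base partial adapter, for illustration only.
PARTIAL_ADAPTER = "AGATCGGAAGAGCACACGTC"


def mismatches(pattern, text):
    """Hamming distance between two equal-length strings."""
    return sum(1 for p, t in zip(pattern, text) if p != t)


def anchored_match(pattern, read, max_mismatch=0):
    """True if pattern aligns at the very start of the read (cycle 1)."""
    if len(read) < len(pattern):
        return False
    return mismatches(pattern, read[:len(pattern)]) <= max_mismatch


def unanchored_match(pattern, read, max_mismatch=0):
    """Return all 0-based offsets where pattern occurs anywhere in the read."""
    hits = []
    for start in range(len(read) - len(pattern) + 1):
        window = read[start:start + len(pattern)]
        if mismatches(pattern, window) <= max_mismatch:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # One random base, then the partial adapter starting at cycle 2.
    read = "T" + PARTIAL_ADAPTER + "GGCA"
    print(anchored_match(PARTIAL_ADAPTER, read))                    # False
    print(anchored_match(PARTIAL_ADAPTER, read, max_mismatch=1))    # False
    # Prepend an arbitrary base and allow one mismatch for it.
    print(anchored_match("A" + PARTIAL_ADAPTER, read, max_mismatch=1))  # True
    # Offset 1 (0-based) is cycle position 2.
    print(unanchored_match(PARTIAL_ADAPTER, read))                  # [1]
```

The anchored check only succeeds once the pattern is padded by one base and a mismatch is tolerated. The unanchored scan finds the construct directly at cycle 2.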

Marcus


On Sat, Mar 26, 2011 at 5:41 AM, Robert Gentleman <rgent...@gmail.com> wrote:

> On Fri, Mar 25, 2011 at 8:59 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> > On 03/24/2011 10:56 AM, Michael Lawrence wrote:
> >>
> >> Hi Martin,
> >>
> >> It would be nice if the ShortRead QA report could somehow filter out the
> >> adapter contamination before generating the rest of its plots, since those
> >> plots are pretty meaningless if there are adapters present.
> >>
> >> Not sure how to handle this filtering in general. That is, what if someone
> >> then wants to see plots with only the "high quality" reads after the
> >> quality plots? It gets complicated. ShortRead has a nice filtering
> >> mechanism, but this is more complicated, since some QA plots come from one
> >> filter, while others come from a different stage.
> >>
> >> However, under the assumption that no one would ever want to align an
> >> adapter, i.e., those reads will not be carried forward, adapter removal
> >> could just be treated as a special, hard-coded case. More customized
> >> solutions could then be expected to leverage the internal ShortRead
> >> functions for generating each slot in the QA object, building it up
> >> incrementally on different subsets. Of course, to make sense, that would
> >> require a different report template, too.
> >
> > Hi Michael. Yes, it would be nice to be able to control more flexibly how
> > different components of the report are generated, or at least to make some
> > smarter choices along the lines you suggest for adapter contaminants. It's
> > hard to know how to make this really general, but I have come across other
> > situations where I'd like to cherry-pick which parts of the QA process I
> > want to perform. I think I need some standardization of the function
> > signatures for generating each report section, a tighter description of
> > the results from each section (i.e., a formal class hierarchy), and then a
> > flexible report composition. It seems like quite a big task; I wonder if
> > there are good models out there to follow? arrayQualityMetrics?
>
>   I think arrayQualityMetrics is a good starting place. Audrey and Wolfgang
> have done a good job of modularizing the components. But there are still
> hiccups, which suggests just how hard that is. And as you suggested, it was
> a big job.
>
>  I think the case Michael is bringing up could be dealt with without a
> major rewrite. There should be some sort of file that ShortRead has access
> to (or an input parameter) that gives some more details on the samples and
> on the processing (e.g., what the sample labels should be, what the adapters
> are, etc.). Then this information could be used in the current paradigm.
>
> Mostly the issue is that if you have adapter contamination, then the
> subsequent plots (e.g., nucleotide by cycle) are not useful. You cannot see
> anything in them, and you have to go back and strip adapters by hand, then
> rerun ShortRead. I agree that you may want more general filtering, as an
> abundance of any read will affect the plots, but I think there is agreement
> that one would never want to include the adapters (you do want the counts
> as they are produced now, but given their effect on the graphics, filtering
> would be beneficial).
>
>  best wishes
>    Robert
> >
> > Martin
> >
> >>
> >> Michael
> >>
> >> _______________________________________________
> >> Bioc-sig-sequencing mailing list
> >> Bioc-sig-sequencing@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
> >
> > --
> > Computational Biology
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> >
> > Location: M1-B861
> > Telephone: 206 667-2793
> >
>
>
>
> --
> Robert Gentleman
> rgent...@gmail.com