Hi Robert,

Just to add to the discussion: it was not initially obvious to me that the ShortRead QA report can read either from disk or from a ShortReadQ object within R. This at least gives the flexibility to filter a ShortReadQ object using trimLRPatterns, vmatchPattern, narrow, etc., and then run the QA report on the filtered object to get more meaningful plots.
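To make the filter-then-QA idea concrete, here is a minimal sketch in plain Python. It is a toy model, not ShortRead or Biostrings code: the function names, the adapter fragment, and the reads are all made up for illustration, and mismatches are counted as substitutions only. It drops reads that contain the adapter anywhere (a sliding search with a mismatch budget) and then tabulates base fractions by cycle on what is left.

```python
from collections import Counter


def hamming_hits(pattern, read, max_mismatch=0):
    """Return 0-based start positions where `pattern` occurs in `read`
    with at most `max_mismatch` substitutions (no indels)."""
    hits = []
    for start in range(len(read) - len(pattern) + 1):
        window = read[start:start + len(pattern)]
        mismatches = sum(a != b for a, b in zip(pattern, window))
        if mismatches <= max_mismatch:
            hits.append(start)
    return hits


def drop_adapter_reads(reads, adapter, max_mismatch=1):
    """Keep only reads with no adapter hit at any position."""
    return [r for r in reads if not hamming_hits(adapter, r, max_mismatch)]


def base_fraction_by_cycle(reads):
    """Return one dict per cycle mapping base -> fraction of reads."""
    n_cycles = max((len(r) for r in reads), default=0)
    table = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads if len(r) > cycle)
        total = sum(counts.values())
        table.append({base: n / total for base, n in counts.items()})
    return table


# Hypothetical data: a made-up adapter fragment and four toy reads.
adapter = "GATCGGAAGAGC"
reads = [
    "ACGTACGTACGTACGTACGT",  # no adapter
    "TGATCGGAAGAGCACACGTC",  # one leading base, then the adapter
    "CGATCGGAAGTGCTTTTTTT",  # same layout, one substitution in the adapter
    "TTGCATGCATGCATGCATGC",  # no adapter
]

unfiltered_table = base_fraction_by_cycle(reads)
kept = drop_adapter_reads(reads, adapter, max_mismatch=1)
filtered_table = base_fraction_by_cycle(kept)
```

In this toy data the two contaminated reads push the second cycle towards G in the unfiltered table, and that skew disappears once they are dropped. In R the corresponding step would be to subset the ShortReadQ object before passing it to the QA report.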
I agree it would be a nice feature to be able to specify adapter sequences to filter in the qa() call itself, or to select only the parts of the report that are of interest. Some cases will test this proposed functionality, especially partial adapter sequences and the number of mismatches to allow. I recently came across a synthetic construct (~20 bases) in an Illumina experiment that consisted of the first half of an adapter with a single random DNA base added at the 5' end, so the partial adapter effectively started at cycle position 2 of the subject reads. Biostrings trimLRPatterns may not identify this pattern, and so may not dynamically trim or filter it (using the range coordinates), unless the random base is added to the start of the pattern and at least one mismatch is allowed. Filtering with a vmatchPattern approach, by contrast, would work.

Marcus

On Sat, Mar 26, 2011 at 5:41 AM, Robert Gentleman <rgent...@gmail.com> wrote:

> On Fri, Mar 25, 2011 at 8:59 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> > On 03/24/2011 10:56 AM, Michael Lawrence wrote:
> >>
> >> Hi Martin,
> >>
> >> It would be nice if the ShortRead QA report could somehow filter out
> >> the adapter contamination before generating the rest of its plots,
> >> since those plots are pretty meaningless if there are adapters
> >> present.
> >>
> >> Not sure how to handle this filtering in general. That is, what if
> >> someone then wants to see plots with only the "high quality" reads
> >> after the quality plots? It gets complicated. ShortRead has a nice
> >> filtering mechanism, but this is more complicated, since some QA
> >> plots come from one filter, while others come from a different stage.
> >>
> >> However, under the assumption that no one would ever want to align an
> >> adapter, i.e., those reads will not be carried forward, the adapter
> >> removal could just be treated as a special, hard-coded case. And then
> >> just expect more customized solutions to leverage the internal
> >> ShortRead functions for generating each slot in the QA object,
> >> building it up incrementally, on different subsets. Of course, to
> >> make sense, that would require a different report template, too.
> >
> > Hi Michael -- Yes it would be nice to be able to more flexibly control
> > how different components of the report are generated, or at least to
> > make some smarter choices along the lines you suggest for adapter
> > contaminants. It's hard to know how to make this really general, but I
> > have come across other situations where I'd like to cherry-pick which
> > parts of the QA process I want to perform. I think I need some
> > standardization on function signatures for generating each report
> > section, tighter description of results from each section (i.e., a
> > formal class hierarchy), and then a flexible report composition. It
> > seems like quite a big task; I wonder if there are good models out
> > there to follow? arrayQualityMetrics?
>
> I think arrayQualityMetrics is a good starting place. Audrey and
> Wolfgang have done a good job of modularizing the components. But there
> are still hiccups - which suggests just how hard that is. And as you
> suggested, it was a big job.
>
> I think the case Michael is bringing up might be useful to deal with,
> without a major rewrite. There should be some sort of file that
> ShortRead has access to (or an input parameter) that gives some more
> details on the samples and on the processing (e.g. what the sample
> labels should be, and what the adapters etc. are). Then this information
> could be used in the current paradigm.
>
> Mostly the issue is that if you have adapter contamination then the
> subsequent plots (e.g. nucleotide by cycle) are not useful. You cannot
> see anything in them and then you have to go back and strip adapters by
> hand, then rerun ShortRead. I agree that you may want more general
> filtering, as an abundance of any read will affect the plots, but I
> think there is agreement that one would never want to include the
> adapters (you do want counts as are produced now, but given their effect
> on the graphics, filtering would be beneficial).
>
> best wishes
> Robert
>
> >
> > Martin
> >
> >>
> >> Michael
> >
> > --
> > Computational Biology
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> >
> > Location: M1-B861
> > Telephone: 206 667-2793
>
> --
> Robert Gentleman
> rgent...@gmail.com

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing