On 08/11/2011 09:50 AM, Kunbin Qu wrote:
Hi, I have some human single-end RNA-seq runs from a HiSeq. Can anyone
suggest how to assess how many duplicated reads these libraries
contain? I looked at srFilter() in ShortRead, but I am not clear on
how to implement it. Should I use IRanges instead, to find the unique
start sites after mapping? If so, what function do you suggest? I'd
like to count reads that map to the same location (even with some
mismatches) as duplicates. Thanks.

ShortRead::tables() could be used to count exactly identical unaligned reads. ShortRead::occurrenceFilter is an implementation for non-gapped, aligned reads. For aligned reads with gaps I think you're on your own, but GenomicRanges::readGappedAlignments, or Rsamtools::scanBam plus the logic of ShortRead::occurrenceFilter, might be a starting point. Perhaps your aligner has already flagged duplicate reads, in which case the 'flag' field available through ScanBamParam and scanBam would be helpful.
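
As a language-neutral illustration of the position-based logic described above, here is a minimal Python sketch. It is not the ShortRead implementation. It assumes each alignment has already been reduced to a (chrom, pos, flag, read_len) tuple, for instance from the fields a BAM reader returns, and that alignments are ungapped. The function names and the toy records are hypothetical; the flag bits are the ones defined in the SAM specification.

```python
from collections import Counter

# Flag bits as defined in the SAM specification.
UNMAPPED_FLAG = 0x4     # read is unmapped
REVERSE_FLAG = 0x10     # read aligned to the reverse strand
DUPLICATE_FLAG = 0x400  # aligner/tool marked the read as a duplicate


def five_prime_key(chrom, pos, flag, read_len):
    """Return (chrom, strand, 5' coordinate) for an ungapped alignment.

    Assumption: no gaps, so a reverse-strand read that starts at `pos`
    (leftmost base) has its 5' end at pos + read_len - 1.
    """
    if flag & REVERSE_FLAG:
        return (chrom, "-", pos + read_len - 1)
    return (chrom, "+", pos)


def count_duplicates(alignments):
    """Count mapped reads, position duplicates, and flagged duplicates.

    `alignments` is an iterable of (chrom, pos, flag, read_len) tuples.
    A position duplicate is any read beyond the first that shares a
    (chrom, strand, 5' coordinate) key; mismatches are ignored.
    """
    sites = Counter()
    flagged = 0
    for chrom, pos, flag, read_len in alignments:
        if flag & UNMAPPED_FLAG:
            continue  # unmapped reads have no meaningful position
        sites[five_prime_key(chrom, pos, flag, read_len)] += 1
        if flag & DUPLICATE_FLAG:
            flagged += 1
    mapped = sum(sites.values())
    position_dups = mapped - len(sites)
    return mapped, position_dups, flagged


# Hypothetical toy records, for illustration only.
reads = [
    ("chr1", 100, 0, 50),       # forward, 5' end at 100
    ("chr1", 100, 0, 50),       # same site and strand: duplicate
    ("chr1", 100, 16, 50),      # reverse, 5' end at 149
    ("chr1", 120, 16, 30),      # reverse, 5' end at 149: duplicate
    ("chr2", 5, 0x400, 50),     # unique site, but flagged by the aligner
    ("*", 0, 4, 50),            # unmapped, skipped
]

print(count_duplicates(reads))  # prints (5, 2, 1)
```

Keying on the 5' end rather than the leftmost position means reverse-strand reads are compared by where they actually start. For gapped alignments the end coordinate depends on the CIGAR string, which this sketch does not handle.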

Hope that is of some help.

Martin



-Kunbin




_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

