Hi Kat,

Yes, this is expected behavior. MarkDuplicates' approach to single-end reads is not as good as it could be. Something that looks at the unclipped 5' end and sequence content (perhaps CIGAR is a rough proxy for sequence content) would probably be better.
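The idea above (key on unclipped 5' end plus CIGAR) can be sketched roughly as follows. This is only an illustration, not what MarkDuplicates does and not an existing tool: the function names and the read tuple layout (name, chrom, 1-based POS, reverse-strand flag, CIGAR) are made up for the example, and a real script would read these fields from a BAM file and pick which read of each group to keep.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S95M' into [(5, 'S'), (95, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def unclipped_five_prime(pos, cigar, reverse):
    """Return the 1-based reference coordinate of the read's 5' end,
    extended over any soft/hard clipping at that end.

    pos is the SAM POS field (leftmost mapped base, 1-based).
    """
    ops = parse_cigar(cigar)
    if not reverse:
        # Forward strand: the 5' end is on the left, so subtract leading clips.
        lead = 0
        for n, op in ops:
            if op not in "SH":
                break
            lead += n
        return pos - lead
    # Reverse strand: the 5' end is on the right, so take the alignment end
    # (POS plus reference-consuming operations) and add trailing clips.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    return pos + ref_len - 1 + trail


def group_duplicates(reads):
    """Group read names by (chrom, strand, unclipped 5' end, CIGAR).

    Hypothetical input: tuples of (name, chrom, pos, reverse, cigar).
    Returns only the groups with more than one read.
    """
    groups = defaultdict(list)
    for name, chrom, pos, reverse, cigar in reads:
        key = (chrom, reverse, unclipped_five_prime(pos, cigar, reverse), cigar)
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Four single-end reads sharing the same start position.
reads = [
    ("r1", "chr1", 100, False, "100M"),
    ("r2", "chr1", 100, False, "100M"),
    ("r3", "chr1", 100, False, "20M"),          # much shorter read
    ("r4", "chr1", 100, False, "50M1000N50M"),  # spliced read
]

print(group_duplicates(reads))  # [['r1', 'r2']]
```

With this key, the short read and the spliced read are no longer collapsed with the two full-length unspliced reads. Using the raw CIGAR string is a blunt proxy: two reads that differ only in how much was soft-clipped would also end up in separate groups.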
-Alec

On 8/7/14, 12:41 AM, Katherine Pillman wrote:
> I have a question about using picard MarkDuplicates with single-end reads.
>
> I couldn't find information on how Picard defines duplicate reads in the
> manual, but in various other places I have read that duplicate reads (for
> single end reads at least) are any two reads with the same start position and
> cigar string (although for paired, I gathered it is by the positions of the
> 5' ends of each read pair).
>
> However, in my results (single end sequencing), this does not appear to be
> correct: all reads with the same starting position are collapsed regardless
> of mapped end position, read length or cigar string, leaving only one read.
> In many cases, this seems like a poor choice - collapsing reads that are very
> unlikely to be PCR duplicates: spliced reads and unspliced ones, 100 bp reads
> with 20 bp ones.
>
> I was wondering whether this is expected behaviour for single end reads - to
> identify duplicates based solely on 5' mapping location? Also, if that is
> expected behaviour, if I want to collapse based on 5' location and CIGAR
> string (or something similar), does anyone know of an existing tool that does
> this or should I just write one myself?
>
> Thanks!
>
> Kat

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
