Hi Kat,

Yes, this is expected behavior. MarkDuplicates' handling of single-end
reads is not as good as it could be. Something that looks at the
unclipped 5' end and sequence content (perhaps CIGAR is a rough proxy
for sequence content) would probably be better.
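
A minimal sketch of that idea in Python, for illustration only. It is
not how Picard works: the function names, the tuple layout of a read,
and the choice to keep the highest-scoring read per group are all
assumptions made for this example. Reads are grouped by reference,
strand, unclipped 5' position and CIGAR, so a spliced read and an
unspliced read with the same start are no longer collapsed.

```python
import re
from collections import defaultdict

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
CLIPS = set("SH")             # soft and hard clips


def parse_cigar(cigar):
    """Turn '5S95M' into [(5, 'S'), (95, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def unclipped_five_prime(pos, cigar, reverse):
    """Return the 1-based reference position of the read's 5' end,
    extended over any clipped bases. `pos` is the SAM POS field."""
    ops = parse_cigar(cigar)
    if not reverse:
        # Forward strand: 5' end is the start, minus leading clips.
        leading = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            leading += n
        return pos - leading
    # Reverse strand: 5' end is the alignment end, plus trailing clips.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trailing += n
    return pos + ref_len - 1 + trailing


def mark_duplicates(reads):
    """`reads` is an iterable of hypothetical tuples
    (name, chrom, pos, cigar, reverse, score).
    Returns the set of read names flagged as duplicates; within each
    group the read with the highest score is kept (an assumption)."""
    groups = defaultdict(list)
    for read in reads:
        name, chrom, pos, cigar, reverse, score = read
        key = (chrom, reverse, unclipped_five_prime(pos, cigar, reverse), cigar)
        groups[key].append(read)

    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda r: r[5])
        duplicates.update(r[0] for r in members if r is not best)
    return duplicates
```

Using the full CIGAR in the key is deliberately strict: two reads that
differ only in soft-clip length would land in different groups. A
looser key (for example, only the positions of N operations) may suit
some data better.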

-Alec

On 8/7/14, 12:41 AM, Katherine Pillman wrote:
> I have a question about using Picard MarkDuplicates with single-end reads.
>
> I couldn't find information on how Picard defines duplicate reads in the 
> manual, but in various other places I have read that duplicate reads (for 
> single end reads at least) are any two reads with the same start position and 
> cigar string (although for paired, I gathered it is by the positions of the 
> 5' ends of each read pair).
>
> However, in my results (single end sequencing), this does not appear to be 
> correct: all reads with the same starting position are collapsed regardless 
> of mapped end position, read length or cigar string, leaving only one read.  
> In many cases, this seems like a poor choice - collapsing reads that are very 
> unlikely to be PCR duplicates: spliced reads and unspliced ones, 100 bp reads 
> with 20 bp ones.
>
> I was wondering whether this is expected behaviour for single end reads - to 
> identify duplicates based solely on 5' mapping location?  Also, if that is 
> expected behaviour, if I want to collapse based on 5' location and CIGAR 
> string (or something similar), does anyone know of an existing tool that does 
> this or should I just write one myself?
>
> Thanks!
>
> Kat
>
>
> ------------------------------------------------------------------------------
> Infragistics Professional
> Build stunning WinForms apps today!
> Reboot your WinForms applications with our WinForms controls.
> Build a bridge from your legacy apps to the future.
> http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
> _______________________________________________
> Samtools-help mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/samtools-help


