Hey Louis,

I have copied the documentation from the source code, which I hope answers
your questions.  Please let me know if you need more clarification.


This tool will also not work with alignments that have large gaps or skips,
such as those from RNA-seq data.  This is due to the need to buffer small
genomic windows to ensure integrity of the duplicate marking, while large
skips (ex. skipping introns) in the alignment records would force making
that window very large, thus exhausting memory.

source:
https://github.com/broadinstitute/picard/blob/master/src/java/picard/sam/markduplicates/MarkDuplicatesWithMateCigar.java

Long story short: SAM files are sorted by the 5' alignment start position,
while for duplicate marking we look at the 3'-end sequencing start
position. The latter is significantly affected by soft clipping and skips
in the alignment.
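
To make that concrete, here is a rough Python sketch (not the Picard code;
the helper name sequencing_start is made up for illustration) of how far the
position where sequencing started can sit from the position the file is
sorted on. It assumes 1-based positions and plain SAM CIGAR strings. For a
reverse-strand read the sequencing start is at the right-hand end of the
alignment, so every skipped base (N) pushes it further from the sort
position.

```python
import re

# CIGAR operators that advance along the reference (per the SAM spec).
REF_CONSUMING = set("MDN=X")
CLIPS = set("SH")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S40M1000N50M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def sequencing_start(pos, cigar, reverse):
    """Hypothetical helper: 1-based reference position where sequencing began,
    with clipped bases added back in.

    pos is the leftmost aligned position, i.e. what a coordinate sort uses.
    """
    ops = parse_cigar(cigar)
    if not reverse:
        # Forward strand: step left over any leading clips.
        lead = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            lead += n
        return pos - lead
    # Reverse strand: walk to the right-hand end of the alignment,
    # then step right over any trailing clips.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trail = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trail += n
    return pos + ref_len - 1 + trail


# Distance between the sort position and the sequencing start for two
# reverse-strand reads that both sort at position 1000.
unspliced = sequencing_start(1000, "100M", True) - 1000
spliced = sequencing_start(1000, "50M100000N50M", True) - 1000
print(unspliced, spliced)
```

A streaming tool would have to keep buffering reads until the sort position
has moved past that distance, which is why one read spanning a long intron
is enough to blow up the window.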

N

On Thu, Oct 9, 2014 at 11:41 AM, Louis Letourneau <
[email protected]> wrote:

> I'm curious as to why MarkDuplicatesWithMateCigar has the "This tool
> cannot be used with alignments that have large gaps or reference skips,
> which happens frequently in RNA-seq data." limitation?
>
> Thanks
> Louis
>
> On 14-10-08 11:25 AM, George Grant wrote:
> > Picard Release 1.122
> > 8 October 2014
> >
> > - New Command Line Program "GenotypeConcordance"
> >     -- Calculates the concordance between genotype data for two samples
> in two different VCFs - one being considered the truth (or reference) the
> other being considered the call.  The concordance is broken into separate
> results sections for SNPs and indels.  Summary and detailed statistics are
> reported.
> >     Note that for any pair of variants to compare, only the alleles for
> the samples under interrogation are considered and MNP, Symbolic, and Mixed
> classes of variants are not included.
> >
> > - New Command Line Program "UpdateVcfDictionary"
> >     -- Updates the sequence dictionary of a VCF from another file (SAM,
> BAM, VCF, dictionary, interval_list, fasta, etc).
> >
> > - New Command Line Program "VcfToIntervalList"
> >     -- Create an interval list from a VCF
> >
> > - New Command Line Program "MarkDuplicatesWithMateCigar"
> >     -- A new tool with which to mark duplicates:
> >     This tool can replace MarkDuplicates if the input SAM/BAM has Mate
> CIGAR (MC) optional tags
> >     pre-computed (see the tools
> RevertOriginalBaseQualitiesAndAddMateCigar and
> >     FixMateInformation).  This allows the new tool to perform a
> streaming duplicate
> >     marking routine (i.e. a single-pass).  This tool cannot be used with
> >     alignments that have large gaps or reference skips, which happens
> >     frequently in RNA-seq data.
> >
> >     There were many refactors of the old MarkDuplicates and
> >     MarkDuplicatesWithMateCigar, since they share common code.
> >     EstimateLibraryComplexity was caught up in this too.
> >
> >     Many, many, many unit tests were added to prove
> >     equivalency of MarkDuplicatesWithMateCigar to MarkDuplicates.  This
> also
> >     exposed a few one in a million corner cases in MarkDuplicates both in
> >     duplicate marking as well as optical duplicate detection.  This
> results
> >     in MarkDuplicates needing to write slightly larger temporary files
> when
> >     running.  SamFileTester was also improved to handle the various test
> >     cases for duplicate marking testing.
> >
> > - Updates to IntervalList:
> >     -- Added capacity to create a simple interval list from a string
> (the name of the contig)
> >     -- Added the capacity to subtract one interval list from another
> (currently
> >        it would only work if they were both wrapped inside a container)
> >
> > - Updates to SamLocusIterator
> >     -- Performance optimizations gaining about 35% speed up...
> >
> > - Updates to MarkDuplicates:
> >     -- Removed unnecessary storage of a string in the Read Ends in Mark
> >     -- Clarified the size of ReadEndsForMarkDuplicates
> >
> > - Updated the minimum number of times that the BAIT_INTERVALS (in
> CalculateHsMetrics) and TARGET_INTERVALS (in CollectTargetedMetrics) must
> be set to one.
> >
> > - Moved CollectHiSeqPfFailMetrics into picard public
> >
> > - Updates to documentation generation (internal):
> >     -- changed link to IntervalList.java documentation
> >     -- updated how _includes/command-line-usage.html is generated
> >
> > - Moved SAMSequenceDictionaryExtractor and tests from picard to htsjdk
> >
> > - George
> >
> > _______________________________________________
> > Samtools-help mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/samtools-help
> >
