Hi there,
I’ve been trying to use Picard MarkDuplicates to mark and remove duplicate
entries in some BAM files of mine. (The BAM files are sorted and indexed.)
I have BAM files from 4 fungal strains, and for one of them MarkDuplicates
works just fine. I have run this multiple times, always with the same result:
it always works for that one fungal strain (always the same one), but always
fails for the other 3 strains.
For the 3 fungal strains where it fails, I seem to run out of memory. I’ve
tried to fix the problem by increasing the requested RAM (once I even asked
for a ridiculous amount) and by specifying a temporary directory for Java IO
with -Djava.io.tmpdir, as recommended on your mailing list, but alas the
problem persists.
I always run with the latest version of Picard, and recently updated to your
latest release, 1.119. Although the error message has changed slightly, the
out-of-memory problem remains.
My Java version:
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Below is the command I ran, followed by the log and error output.
java -Xmx22g -Djava.io.tmpdir=$HOME/tempdir -jar \
$HOME/bio/picard-tools-1.119/MarkDuplicates.jar \
I=/my/path/my_fungal_strain.srt.bam \
O=/my/path/my_fungal_strain.srt.dedup.bam \
METRICS_FILE=/my/path/my_fungal_strain.srt.dedup.duplicationMetrics \
CREATE_INDEX=true \
REMOVE_DUPLICATES=true \
ASSUME_SORTED=true
picard.sam.MarkDuplicates INPUT=[/my/path/my_fungal_strain.srt.bam]
OUTPUT=/my/path/my_fungal_strain.srt.dedup.bam
METRICS_FILE=/my/path/my_fungal_strain.srt.dedup.duplicationMetrics
REMOVE_DUPLICATES=true ASSUME_SORTED=true CREATE_INDEX=true
PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25
READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000
CREATE_MD5_FILE=false
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library
/path/to/my/bin/picard-tools-1.119/libIntelDeflater.so which might have
disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>',
or link it with '-z noexecstack'.
[Fri Sep 05 19:52:55 EST 2014] Executing as [email protected] on Linux
2.6.32-431.11.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM
1.7.0_51-b13; Picard version:
1.119(d44cdb51745f5e8075c826430a39d8a61f1dd832_1408991805) IntelDeflater
INFO 2014-09-05 19:52:55 MarkDuplicates Start of doWork freeMemory:
754742648; totalMemory: 759693312; maxMemory: 20997734400
INFO 2014-09-05 19:52:55 MarkDuplicates Reading input file and
constructing read end information.
INFO 2014-09-05 19:52:55 MarkDuplicates Will retain up to 83324342 data
points before spilling to disk.
INFO 2014-09-05 19:53:20 MarkDuplicates Read 1,000,000 records.
Elapsed time: 00:00:24s. Time for last 1,000,000: 24s. Last read position:
NODE_7_length_125489_cov_34.5327_ID_35439289:42,886
INFO 2014-09-05 19:53:20 MarkDuplicates Tracking 135538 as yet
unmatched pairs. 292 records in RAM.
[Fri Sep 05 19:53:32 EST 2014] picard.sam.MarkDuplicates done. Elapsed time:
0.63 minutes.
Runtime.totalMemory()=2518155264
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /my/home/tempdir/user/CSPI.2431821386522683006.tmp/5438.tmp not found
        at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:63)
        at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
        at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
        at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
        at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
        at picard.sam.DiskReadEndsMap.put(DiskReadEndsMap.java:67)
        at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:449)
        at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
        at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)
Caused by: java.io.FileNotFoundException: /my/home/tempdir/user/CSPI.2431821386522683006.tmp/5438.tmp (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:60)
        ... 9 more
One of the problems seems to be that Picard MarkDuplicates can’t write to the
temporary directory at /my/home/tempdir/user/. I’ve set that directory so that
anyone can write to it, but this doesn’t solve the problem: the temporary file
/my/home/tempdir/user/CSPI.2431821386522683006.tmp/5438.tmp isn’t written out.
The underlying exception says "Too many open files", so I wonder whether the
per-process file-handle limit, rather than permissions, is the real issue; the
checks I ran are sketched below.
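In case it’s useful, here is how I checked the directory permissions and the
open-file limits on the compute node. (The ulimit value of 4096 and the
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP value of 1000 below are just illustrative
guesses on my part, not something taken from your documentation.)

# Confirm the temp directory really is writable by me:
ls -ld /my/home/tempdir/user/
touch /my/home/tempdir/user/write_test && rm /my/home/tempdir/user/write_test

# Check the soft and hard per-process open-file limits in the shell that
# launches the Java job:
ulimit -Sn
ulimit -Hn

# Raise the soft limit for this session (only possible up to the hard
# limit without root):
ulimit -n 4096

Would it also make sense to lower MAX_FILE_HANDLES_FOR_READ_ENDS_MAP (echoed
in the log above with its value of 8000) to something below my soft limit,
e.g. MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 on the command line?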
I hope you can help me shed light on this, so that I can get your tool to run
to completion on my data.
(For all 4 strains samtools rmdup runs fine, but for consistency I’d very much
like to get Picard MarkDuplicates working for all 4 strains; the samtools
command I used is sketched below.)
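For reference, the samtools command I ran for each strain was of this form
(the output filename is just illustrative; I used the default paired-end
mode, i.e. no -s flag):

# Remove duplicates from the coordinate-sorted BAM with samtools:
samtools rmdup /my/path/my_fungal_strain.srt.bam \
/my/path/my_fungal_strain.srt.rmdup.bam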
Many thanks in advance,
Åsa