Hi David,

Unfortunately, the more memory you give to this program, the more it tries to 
hold in RAM.  There should be a way to control this but currently there isn't.  
Two things you might try:

1. Reduce the -Xmx value, e.g. try -Xmx2g.  I know this is counterintuitive,
   but as I mentioned above, this will reduce the amount of RAM the program
   decides to use.
2. Add the command-line argument READ_NAME_REGEX=null.  This may cause some
   inflation of the library size estimate if you have optical duplicates,
   because optical duplicate detection will be disabled.  But since your
   stack trace below indicates that you are running out of memory while
   parsing the physical location information, disabling this feature might
   get the program to run.

Let me know how it goes.

-Alec

On Jun 5, 2014, at 9:41 AM, David Langenberger <[email protected]> 
wrote:

> Dear Samtools-help list members,
> 
> I want to run EstimateLibraryComplexity.jar on a 9.8 GB BAM file, but I 
> always get an OutOfMemoryError. I already tried raising -Xmx (up to 60 GB) 
> and still get the error. Does anybody have an idea of how to run 
> EstimateLibraryComplexity on bigger BAM files?
> 
> 
> That's my call and the error message:
> =============================
> 
> $ java -Xmx10g -jar EstimateLibraryComplexity.jar INPUT=file.bam 
> OUTPUT=file.libraryComplexity
> 
> [Wed Jun 04 21:43:08 CEST 2014] picard.sam.EstimateLibraryComplexity 
> INPUT=[file.bam] OUTPUT=file.libraryComplexity    MIN_IDENTICAL_BASES=5 
> MAX_DIFF_RATE=0.03 MIN_MEAN_QUALITY=20 MAX_GROUP_RATIO=500 
> READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* 
> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false 
> VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 
> CREATE_INDEX=false CREATE_MD5_FILE=false
> [Wed Jun 04 21:43:08 CEST 2014] Executing as me@work on Linux 
> 3.6.2-1.fc16.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10; 
> Picard version: 1.114(444810c1de1433d9eca8130be63ccc7fd70a9499_1400593393) 
> JdkDeflater
> INFO    2014-06-04 21:43:08     EstimateLibraryComplexity       Will store 
> 15494157 read pairs in memory before sorting.
> INFO    2014-06-04 21:43:13     EstimateLibraryComplexity       Read     
> 1,000,000 records.  Elapsed time: 00:00:05s.  Time for last 1,000,000:    5s. 
>  Last read position: chr10:38,239,480
> 
> ....
> 
> INFO    2014-06-04 21:53:21     EstimateLibraryComplexity       Read    
> 30,000,000 records.  Elapsed time: 00:10:13s.  Time for last 1,000,000:  
> 183s.  Last read position: chr15:34,522,127
> 
> [Wed Jun 04 22:54:26 CEST 2014] picard.sam.EstimateLibraryComplexity done. 
> Elapsed time: 71.30 minutes.
> Runtime.totalMemory()=5801312256
> To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOfRange(Arrays.java:2694)
>        at java.lang.String.<init>(String.java:203)
>        at java.lang.String.substring(String.java:1913)
>        at htsjdk.samtools.util.StringUtil.split(StringUtil.java:89)
>        at 
> picard.sam.AbstractDuplicateFindingAlgorithm.addLocationInformation(AbstractDuplicateFindingAlgorithm.java:71)
>        at 
> picard.sam.EstimateLibraryComplexity.doWork(EstimateLibraryComplexity.java:256)
>        at 
> picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
>        at 
> picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
>        at 
> picard.sam.EstimateLibraryComplexity.main(EstimateLibraryComplexity.java:217)
> 
> 
> 
> And that's the java version:
> =====================
> 
> $ java -showversion
> java version "1.7.0_07"
> Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
> Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
> 
> 
> I also tried ValidateSamFile.jar:
> ========================
> 
> 
> $ java -jar /scr/k41san/tools/picard/picard-tools-1.114/ValidateSamFile.jar 
> INPUT=file.bam MODE=SUMMARY
> 
> [Thu Jun 05 12:12:17 CEST 2014] picard.sam.ValidateSamFile INPUT=file.bam 
> MODE=SUMMARY    MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true 
> IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO 
> QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 
> MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
> [Thu Jun 05 12:12:17 CEST 2014] Executing as me@work on Linux 
> 3.6.2-1.fc16.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10; 
> Picard version: 1.114(444810c1de1433d9eca8130be63ccc7fd70a9499_1400593393) 
> JdkDeflater
> INFO    2014-06-05 12:13:18     SamFileValidator        Validated Read    
> 10,000,000 records.  Elapsed time: 00:01:00s.  Time for last 10,000,000:   
> 60s.  Last read position: chr11:67,275,063
> INFO    2014-06-05 12:14:36     SamFileValidator        Validated Read    
> 20,000,000 records.  Elapsed time: 00:02:18s.  Time for last 10,000,000:   
> 77s.  Last read position: chr12:112,229,147
> INFO    2014-06-05 12:15:45     SamFileValidator        Validated Read    
> 30,000,000 records.  Elapsed time: 00:03:27s.  Time for last 10,000,000:   
> 69s.  Last read position: chr15:34,522,127
> INFO    2014-06-05 12:18:05     SamFileValidator        Validated Read    
> 40,000,000 records.  Elapsed time: 00:05:47s.  Time for last 10,000,000:  
> 140s.  Last read position: chr16:56,362,603
> INFO    2014-06-05 12:20:07     SamFileValidator        Validated Read    
> 50,000,000 records.  Elapsed time: 00:07:49s.  Time for last 10,000,000:  
> 121s.  Last read position: chr17:65,979,420
> INFO    2014-06-05 12:21:11     SamFileValidator        Validated Read    
> 60,000,000 records.  Elapsed time: 00:08:53s.  Time for last 10,000,000:   
> 64s.  Last read position: chr19:38,049,399
> INFO    2014-06-05 12:27:34     SamFileValidator        Validated Read    
> 70,000,000 records.  Elapsed time: 00:15:16s.  Time for last 10,000,000:  
> 383s.  Last read position: chr1:43,396,405
> INFO    2014-06-05 12:48:18     SamFileValidator        Validated Read    
> 80,000,000 records.  Elapsed time: 00:36:00s.  Time for last 10,000,000: 
> 1,243s.  Last read position: chr1:246,706,542
> 
>>> Still running  2014-06-05 15:37
> 
> 
> I also posted the problem at Biostars (https://www.biostars.org/p/102538/) 
> and SEQanswers (http://seqanswers.com/forums/showthread.php?t=43910).
> 
> 
> Thanks for your help,
> David Langenberger
> _______________________________________________
> Samtools-help mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/samtools-help
