Re: [Samtools-help] Problem when running Picard tools

Alec Wysoker Mon, 28 Jul 2014 06:59:57 -0700

Hi Yibo,

If you have a contig longer than 512MB you will have problems.

Yes, you could split your contig into piece smaller than 512MB, with theunderstanding that reads that straddle the breakpoint will not alignproperly. I think this is just a matter of editing your reference fastaand splitting that sequence into pieces and given each piece a differentname. You'll need to realign your reads with the new reference fasta.


-Alec

On 7/23/14, 4:08 PM, 胡义波 wrote:

Hi Alec,
Following your suggestions, I have tried the two methods but they didnot work with some errors.
One is to set CREATE_INDEX=FALSE when using MarkDuplicates. As aresult, the MarkDuplicates running is ok. Then following GATK snpcalling pipeline for RNA-Seq data, I run SplitNCigarReads but the sameerror occurred again (ERROR MESSAGE: Exception when processingalignment for BAM index HS2:410:C21E6ACXX:4:1109:16160:63599 2/2 99baligned read.). Because I did not create index in the MarkDuplicatesstep, I created index using "samtools index” instead of using PicardTools and the index creation running seems ok. Then I re-runSplitNCigarReads but the same error still occurred (see attached).
The other is to merge mapped bam file with unmapped bam file forTophat2 output. Firstly, I fixed up the unmapped bam file using thescript “fix_tophat_unmapped_reads.py”. Then, I used MergeBamAlignmentto merge but an error occurred (Second read from pair not found inunmapped bam: HS2:410:C21E6ACXX:4:1101:1421:96739,HS2:410:C21E6ACXX:4:1101:1422:64244). (see attached)
But indeed, I checked these two reads and found they were in theunmapped bam file. The following is an example.
The read is found in the unmapped bam file:
HS2:410:C21E6ACXX:4:1101:1421:967391337530976880*=10CCTAGGATATCTTTTATGGTATAGTATGGGTGGAATGGAATTTTATCTGNGTTGGBBBFFFFFFFFFFIIFIIIFIIIIFFIIIIBFFBFFIIIIIIIIFIIII#07FFI

The read is found in the mapped bam file:
HS2:410:C21E6ACXX:4:1101:1421:96739897530976885099M*00TAAAGTGAAAAGCAAAAAATCGAGTTAGTGTAGCTTTATCAACGGAGAATCCTCCTCAAATTCATTCTACTAGTGTACTTCCGATATATGGGATGGCAGBFFFFBBFFFFFFFFFFFFBFFFFFFFFFFIIIIIFIFBIIIIIIIFIFFIFFIFFIIIFBFIIIFIFFFFIIIIFFIIFIIIIIIFFFFFFFFFFBBBAS:i:-81XN:i:0XM:i:15XO:i:0XG:i:0NM:i:15MD:Z:10T2G2G5G8G8T2T28A0A2G2A2A8A2A2T1YT:Z:UUNH:i:1
Because just 2 reads were reported to have an error, I removed thesetwo reads from mapped and unmapped bam files. Then I re-runMergeBamAlignment, the same error still occurred with two new reads.
I do not know how to resolve this problem.
Indeed, I have another question. As you commented previously, "theproblem is that apparently your have a contig that is longer than512MB. The BAM index format cannot handle a contig this size". I amthinking that, even if I merged mapped and unmapped files together andthen rerun all of the steps, I guess that this error for creatingindex still would occur because I am still using the same referencegenome with some contigs longer than 512MB. My question is: is thereany methods to divide long contigs into less than 512MB? I do not knowwhether this idea is feasible.
Thanks again for your help. The attached are running log forSplitNCigarReads and MergeBamAlignment.
Yibo
On Jul 17, 2014, at 11:34, Alec Wysoker <[email protected]<mailto:[email protected]>> wrote:
Hi Yibo,

First of all, FYI this is a stack trace:
Exception in thread "main" net.sf.samtools.SAMException: Exceptionwhen processing alignment for BAM indexHS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read.atnet.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)atnet.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)atnet.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)atnet.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)atnet.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)atnet.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)Caused by: net.sf.samtools.SAMException: Exception creating BAM indexfor record HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read.
    at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
atnet.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
    ... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32770
atnet.sf.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:276)
    at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
    ... 6 more
Regarding your original question, the problem is that apparently yourhave a contig that is longer than 512MB. The BAM index format cannothandle a contig this size. If you don't need a BAM index for yourprocess, then you can pass CREATE_INDEX=False to all the Picardprograms that write a BAM file, and this problem should go away. Ifyou need a BAM index, because you're going to query by genomic locus,then you've got a problem.
Regarding using MergeBamAlignment, I believe it should be able tohandle the multiple alignments produced by Tophat. I've never triedit with STAR, but I would suggest you try it without removingmultiple mappings, then run ValidateSamFile on the BAM produced, andif that succeeds then you can assume it handles the multiple mappingsproduced by the aligner correctly.
-Alec

On 7/17/14, 2:23 PM, 胡义波 wrote:
Hi Alec,
Thank you very much for your reply and suggestion. I am not surewhat the stack trace is, and I only had the running log reportingthe error message (see attached). Yes, for Tophat2 output, there isan unmapped BAM file, but for STAR, there is no unmapped BAM file.Before I try to merge them, I have a question: I am using uniquemapping results from the mapping output. So, should I removemultiple mapping results and then merge with unmapped file, or notremove and merge? Thanks again.
Yibo
On Jul 17, 2014, at 6:11, Alec Wysoker <[email protected]<mailto:[email protected]>> wrote:
Hi Yibo,
It would help if you send the stack trace that should be printedalong with the error. However, the fact that you need to reducethe validation stringency on the various PIcard tools means thatyou are likely to run into various problems. I suggest you changeyour workflow as follows: rather than using the output of youraligner directly, use Picard MergeBamAlignment to combine thealigner output with a valid unmapped BAM. This should eliminateall the Picard validation issues. If you don't have an unmappedBAM, you can create one with Picard FastqToSam.
-Alec

On 7/16/14, 3:18 PM, Yibo wrote:
Dear all,

I am using Picard tools but encountered some problems. I want to use GATK to 
call snp for RNA-seq data. I have used two methods, Tophat2 and STAR, to 
perform RNA-seq mapping. Because I do not have my own reference genome, I used 
the genome of a related species as the reference (different genus). However, 
for the Tophat2 and STAR mapping output (I used only unique mapping output), 
when using Picardtools (SortSam, AddOrReplaceReadGroups or MarkDuplicates), 
some errors occurred.

I am using Java 1.7.0_45 and Picard-tools-1.117 (or 1.77). The error messages 
are as follows:

(1) For STAR output, I successfully used AddOrReplaceReadGroups to add read 
groups and sort by coordinate. However, when using MarkDuplicates, an error 
occurred as follows (running log attached):

"Exception in thread "main" htsjdk.samtools.SAMException: Exception when processing 
alignment for BAM index HS2:410:C21E6ACXX:4:1109:16160:63599 2/2 99b aligned read."

I checked this read and it is a read pair and seems ok. I attached the 
paired-end reads information as follows:

HS2:410:C21E6ACXX:4:1109:16160:63599    163     1       536940431       255     
9S90M   =       536940464       99      
CAGAAAATGAACTGTTTGGAGAAAGATGCCTGCAGAAACCTACATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGG
     
BBBFFFFFFFFFFIFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFF<BBFFFFFFFFFBFFFFFFFFF
  RG:Z:LANE11A    NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1       nM:i:10 AS:i:134
HS2:410:C21E6ACXX:4:1109:16160:63599    83      1       536940464       255     
66M33S  =       536940431       -99     
CATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGGGGCTGAACACAATTATTTTGAATGTAAACTCTTATGCCAAAG
     
FFFFFFFBFFFFFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFIIIIIIIIIIIIIIIIIIFIFFFFFFFFFFBBB
     RG:Z:LANE11A    NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1       nM:i:10 
AS:i:134

Additionally, I used ValidateSamFile tool to check my sam file, and found as 
follows:

Error Type      Count
ERROR:MATE_NOT_FOUND    37
WARNING:MISSING_TAG_NM  48684343

Because only 37 reads is mate_not_found, I deleted these reads from my sam 
file. Then rerunning MarkDuplicates, the same error still occurred.

My scripts are as follows:

java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/AddOrReplaceReadGroups.jar 
INPUT=${path_input}/${samp}_uniquelymapped.sam 
OUTPUT=${path_input}/${samp}_rg.sam SORT_ORDER=coordinate RGID=LANE11A 
RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A

java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/MarkDuplicates.jar 
I=${path_input}/${samp}_rg.bam O=${path_input}/${samp}_dedup.bam 
METRICS_FILE=${path_input}/${samp}_dedupmetrics.txt REMOVE_DUPLICATES=TRUE 
CREATE_INDEX=TRUE ASSUME_SORTED=TRUE VALIDATION_STRINGENCY=SILENT



(2) For Tophat2 output, I used different steps (first use SortSam to sort by 
coordinate, then AddOrReplaceReadGroups to add read groups). However, for these 
two steps, error messages also occurred (running log as attached).

For SortSam tool, 11449 reads were reported "Insert size out of range". Then I changed the argument 
"VALIDATION_STRINGENCY=STRICT" to "LENIENT", the SortSam can ignore these errors and 
finish sorting.

But when running AddOrReplaceReadGroups, a same error occurred (similar to STAR 
running error):

"Exception in thread "main" net.sf.samtools.SAMException: Exception when processing 
alignment for BAM index HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read."

I pasted the read information here:

HS2:410:C21E6ACXX:4:2316:13634:18930    73      1       536905899       50      
32M2D57M4I4M    *       0       0       
GTAAATTTTTGTAAATGTTCAATGAGGTGCTGAAACATGTATGTTCTTTAGCTTTCCCATTTAGAAGACAACATAAATCTTAATTTTTCCACTAATT
       
B<BFFFFFFFFFFFF0<BFFIIIF0BFBFFFIIIIIBFIBFFFBFFFIFFBFFFFFFFFFFIIBFFIIIFIIIIIFIFFFFFFFFBFFFBBBFFFBF
 AS:i:-126       XN:i:0  XM:i:20 XO:i:2  XG:i:6  NM:i:26 
MD:Z:2C1G10A3T0G2A0T0A3A0A1^AA3T9G4A1A0G2T24T1A0A5G2    YT:Z:UU NH:i:1.

My scripts are as follows:

java -Xmx3G -jar /u/local/apps/picard-tools/1.77/SortSam.jar 
INPUT=${path_input}/${samp}_uniquelymapped.sam 
OUTPUT=${path_input}/${samp}_unique_sorted.sam SORT_ORDER=coordinate 
CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT

java -Xmx3G -jar /u/local/apps/picard-tools/1.77/AddOrReplaceReadGroups.jar 
INPUT=${path_input}/${samp}_unique_sorted.bam 
OUTPUT=${path_input}/${samp}_rg.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE 
RGID=LANE11A RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A 
VALIDATION_STRINGENCY=LENIENT

In summary, the errors for STAR and Tophat2 seem to be caused by creating BAM 
index for these reported reads. But I do no know why and how to resolve it.

Your comments are very appreciated, and thanks in advance.

Yibo

Department of Ecology and Evolutionary Biology,
University of California, Los Angeles



------------------------------------------------------------------------------
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck
Code Sight - the same software that powers the world's largest code
search on Ohloh, the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds


_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

------------------------------------------------------------------------------
Infragistics Professional
Build stunning WinForms apps today!
Reboot your WinForms applications with our WinForms controls. 
Build a bridge from your legacy apps to the future.
http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

Re: [Samtools-help] Problem when running Picard tools

Reply via email to