Hi Alec,
Thank you for your helpful suggestions. I will try following your suggestions
and see what will happen.
Cheers,
Yibo
On Jul 17, 2014, at 11:34, Alec Wysoker <[email protected]> wrote:
> Hi Yibo,
>
> First of all, FYI this is a stack trace:
>
> Exception in thread "main" net.sf.samtools.SAMException: Exception when
> processing alignment for BAM index HS2:410:C21E6ACXX:4:2316:13634:18930 1/2
> 97b aligned read.
> at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
> at
> net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
> at
> net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
> at
> net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
> at
> net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
> at
> net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
> Caused by: net.sf.samtools.SAMException: Exception creating BAM index for
> record HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read.
> at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
> at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
> ... 5 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 32770
> at
> net.sf.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:276)
> at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
> ... 6 more
>
> Regarding your original question, the problem is that apparently your have a
> contig that is longer than 512MB. The BAM index format cannot handle a
> contig this size. If you don't need a BAM index for your process, then you
> can pass CREATE_INDEX=False to all the Picard programs that write a BAM file,
> and this problem should go away. If you need a BAM index, because you're
> going to query by genomic locus, then you've got a problem.
>
> Regarding using MergeBamAlignment, I believe it should be able to handle the
> multiple alignments produced by Tophat. I've never tried it with STAR, but I
> would suggest you try it without removing multiple mappings, then run
> ValidateSamFile on the BAM produced, and if that succeeds then you can assume
> it handles the multiple mappings produced by the aligner correctly.
>
> -Alec
>
> On 7/17/14, 2:23 PM, Yibo wrote:
>> Hi Alec,
>>
>> Thank you very much for your reply and suggestion. I am not sure what the
>> stack trace is, and I only had the running log reporting the error message
>> (see attached). Yes, for Tophat2 output, there is an unmapped BAM file, but
>> for STAR, there is no unmapped BAM file. Before I try to merge them, I have
>> a question: I am using unique mapping results from the mapping output. So,
>> should I remove multiple mapping results and then merge with unmapped file,
>> or not remove and merge? Thanks again.
>>
>> Yibo
>>
>>
>>
>>
>>
>>
>>
>> On Jul 17, 2014, at 6:11, Alec Wysoker <[email protected]> wrote:
>>
>>> Hi Yibo,
>>>
>>> It would help if you send the stack trace that should be printed along with
>>> the error. However, the fact that you need to reduce the validation
>>> stringency on the various PIcard tools means that you are likely to run
>>> into various problems. I suggest you change your workflow as follows:
>>> rather than using the output of your aligner directly, use Picard
>>> MergeBamAlignment to combine the aligner output with a valid unmapped BAM.
>>> This should eliminate all the Picard validation issues. If you don't have
>>> an unmapped BAM, you can create one with Picard FastqToSam.
>>>
>>> -Alec
>>>
>>> On 7/16/14, 3:18 PM, Yibo wrote:
>>>> Dear all,
>>>>
>>>> I am using Picard tools but encountered some problems. I want to use GATK
>>>> to call snp for RNA-seq data. I have used two methods, Tophat2 and STAR,
>>>> to perform RNA-seq mapping. Because I do not have my own reference genome,
>>>> I used the genome of a related species as the reference (different genus).
>>>> However, for the Tophat2 and STAR mapping output (I used only unique
>>>> mapping output), when using Picardtools (SortSam, AddOrReplaceReadGroups
>>>> or MarkDuplicates), some errors occurred.
>>>>
>>>> I am using Java 1.7.0_45 and Picard-tools-1.117 (or 1.77). The error
>>>> messages are as follows:
>>>>
>>>> (1) For STAR output, I successfully used AddOrReplaceReadGroups to add
>>>> read groups and sort by coordinate. However, when using MarkDuplicates, an
>>>> error occurred as follows (running log attached):
>>>>
>>>> "Exception in thread "main" htsjdk.samtools.SAMException: Exception when
>>>> processing alignment for BAM index HS2:410:C21E6ACXX:4:1109:16160:63599
>>>> 2/2 99b aligned read."
>>>>
>>>> I checked this read and it is a read pair and seems ok. I attached the
>>>> paired-end reads information as follows:
>>>>
>>>> HS2:410:C21E6ACXX:4:1109:16160:63599 163 1 536940431
>>>> 255 9S90M = 536940464 99
>>>> CAGAAAATGAACTGTTTGGAGAAAGATGCCTGCAGAAACCTACATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGG
>>>>
>>>> BBBFFFFFFFFFFIFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFF<BBFFFFFFFFFBFFFFFFFFF
>>>> RG:Z:LANE11A NH:i:1 HI:i:1 jI:B:i,-1 jM:B:c,-1
>>>> nM:i:10 AS:i:134
>>>> HS2:410:C21E6ACXX:4:1109:16160:63599 83 1 536940464
>>>> 255 66M33S = 536940431 -99
>>>> CATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGGGGCTGAACACAATTATTTTGAATGTAAACTCTTATGCCAAAG
>>>>
>>>> FFFFFFFBFFFFFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFIIIIIIIIIIIIIIIIIIFIFFFFFFFFFFBBB
>>>> RG:Z:LANE11A NH:i:1 HI:i:1 jI:B:i,-1 jM:B:c,-1
>>>> nM:i:10 AS:i:134
>>>>
>>>> Additionally, I used ValidateSamFile tool to check my sam file, and found
>>>> as follows:
>>>>
>>>> Error Type Count
>>>> ERROR:MATE_NOT_FOUND 37
>>>> WARNING:MISSING_TAG_NM 48684343
>>>>
>>>> Because only 37 reads is mate_not_found, I deleted these reads from my sam
>>>> file. Then rerunning MarkDuplicates, the same error still occurred.
>>>>
>>>> My scripts are as follows:
>>>>
>>>> java -Xmx4G -jar
>>>> /u/home/y/ybhu/picard-tools-1.117/AddOrReplaceReadGroups.jar
>>>> INPUT=${path_input}/${samp}_uniquelymapped.sam
>>>> OUTPUT=${path_input}/${samp}_rg.sam SORT_ORDER=coordinate RGID=LANE11A
>>>> RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A
>>>>
>>>> java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/MarkDuplicates.jar
>>>> I=${path_input}/${samp}_rg.bam O=${path_input}/${samp}_dedup.bam
>>>> METRICS_FILE=${path_input}/${samp}_dedupmetrics.txt REMOVE_DUPLICATES=TRUE
>>>> CREATE_INDEX=TRUE ASSUME_SORTED=TRUE VALIDATION_STRINGENCY=SILENT
>>>>
>>>>
>>>>
>>>> (2) For Tophat2 output, I used different steps (first use SortSam to sort
>>>> by coordinate, then AddOrReplaceReadGroups to add read groups). However,
>>>> for these two steps, error messages also occurred (running log as
>>>> attached).
>>>>
>>>> For SortSam tool, 11449 reads were reported "Insert size out of range".
>>>> Then I changed the argument "VALIDATION_STRINGENCY=STRICT" to "LENIENT",
>>>> the SortSam can ignore these errors and finish sorting.
>>>>
>>>> But when running AddOrReplaceReadGroups, a same error occurred (similar to
>>>> STAR running error):
>>>>
>>>> "Exception in thread "main" net.sf.samtools.SAMException: Exception when
>>>> processing alignment for BAM index HS2:410:C21E6ACXX:4:2316:13634:18930
>>>> 1/2 97b aligned read."
>>>>
>>>> I pasted the read information here:
>>>>
>>>> HS2:410:C21E6ACXX:4:2316:13634:18930 73 1 536905899
>>>> 50 32M2D57M4I4M * 0 0
>>>> GTAAATTTTTGTAAATGTTCAATGAGGTGCTGAAACATGTATGTTCTTTAGCTTTCCCATTTAGAAGACAACATAAATCTTAATTTTTCCACTAATT
>>>>
>>>> B<BFFFFFFFFFFFF0<BFFIIIF0BFBFFFIIIIIBFIBFFFBFFFIFFBFFFFFFFFFFIIBFFIIIFIIIIIFIFFFFFFFFBFFFBBBFFFBF
>>>> AS:i:-126 XN:i:0 XM:i:20 XO:i:2 XG:i:6 NM:i:26
>>>> MD:Z:2C1G10A3T0G2A0T0A3A0A1^AA3T9G4A1A0G2T24T1A0A5G2 YT:Z:UU NH:i:1.
>>>>
>>>> My scripts are as follows:
>>>>
>>>> java -Xmx3G -jar /u/local/apps/picard-tools/1.77/SortSam.jar
>>>> INPUT=${path_input}/${samp}_uniquelymapped.sam
>>>> OUTPUT=${path_input}/${samp}_unique_sorted.sam SORT_ORDER=coordinate
>>>> CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT
>>>>
>>>> java -Xmx3G -jar
>>>> /u/local/apps/picard-tools/1.77/AddOrReplaceReadGroups.jar
>>>> INPUT=${path_input}/${samp}_unique_sorted.bam
>>>> OUTPUT=${path_input}/${samp}_rg.bam SORT_ORDER=coordinate
>>>> CREATE_INDEX=TRUE RGID=LANE11A RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A
>>>> VALIDATION_STRINGENCY=LENIENT
>>>>
>>>> In summary, the errors for STAR and Tophat2 seem to be caused by creating
>>>> BAM index for these reported reads. But I do no know why and how to
>>>> resolve it.
>>>>
>>>> Your comments are very appreciated, and thanks in advance.
>>>>
>>>> Yibo
>>>>
>>>> Department of Ecology and Evolutionary Biology,
>>>> University of California, Los Angeles
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Want fast and easy access to all the code in your enterprise? Index and
>>>> search up to 200,000 lines of code with a free copy of Black Duck
>>>> Code Sight - the same software that powers the world's largest code
>>>> search on Ohloh, the Black Duck Open Hub! Try it now.
>>>> http://p.sf.net/sfu/bds
>>>>
>>>>
>>>> _______________________________________________
>>>> Samtools-help mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>>
>>
>
------------------------------------------------------------------------------
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck
Code Sight - the same software that powers the world's largest code
search on Ohloh, the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help