Dear all,

I am having problems with Picard tools. I want to use GATK to call SNPs from 
RNA-seq data, and I mapped the reads with two methods, Tophat2 and STAR. 
Because I do not have a reference genome for my own species, I used the genome 
of a related species (from a different genus) as the reference. However, when I 
run Picard tools (SortSam, AddOrReplaceReadGroups or MarkDuplicates) on the 
Tophat2 and STAR output (uniquely mapped reads only), errors occur.

I am using Java 1.7.0_45 with Picard tools 1.117 (for the STAR output) and 1.77 
(for the Tophat2 output). The error messages are as follows:

(1) For the STAR output, I successfully used AddOrReplaceReadGroups to add read 
groups and sort by coordinate. However, MarkDuplicates then failed with the 
following error (running log attached):

"Exception in thread "main" htsjdk.samtools.SAMException: Exception when 
processing alignment for BAM index HS2:410:C21E6ACXX:4:1109:16160:63599 2/2 99b 
aligned read."

I checked this read; it belongs to a read pair and looks fine to me. The two 
records of the pair are:

HS2:410:C21E6ACXX:4:1109:16160:63599    163     1       536940431       255     
9S90M   =       536940464       99      
CAGAAAATGAACTGTTTGGAGAAAGATGCCTGCAGAAACCTACATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGG
     
BBBFFFFFFFFFFIFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFF<BBFFFFFFFFFBFFFFFFFFF
     RG:Z:LANE11A    NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1       nM:i:10 
AS:i:134
HS2:410:C21E6ACXX:4:1109:16160:63599    83      1       536940464       255     
66M33S  =       536940431       -99     
CATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGGGGCTGAACACAATTATTTTGAATGTAAACTCTTATGCCAAAG
     
FFFFFFFBFFFFFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFIIIIIIIIIIIIIIIIIIFIFFFFFFFFFFBBB
     RG:Z:LANE11A    NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1       nM:i:10 
AS:i:134
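
To double-check the pair, the FLAG fields (163 and 83) can be decoded bit by 
bit. Below is a minimal Python sketch; the bit meanings are those defined in 
the SAM specification, while the function name decode_flag and the short labels 
are my own choice for illustration.

```python
# Bit values follow the SAM specification; the labels are illustrative only.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (163, 83):
        print(flag, decode_flag(flag))
    # 163 ['paired', 'proper_pair', 'mate_reverse', 'read2']
    # 83 ['paired', 'proper_pair', 'reverse', 'read1']
```

Both records decode as paired and properly paired, with opposite strands and 
complementary read1/read2 bits, which is what I would expect for a normal pair.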

Additionally, I checked my SAM file with the ValidateSamFile tool, which 
reported the following:

Error Type      Count
ERROR:MATE_NOT_FOUND    37
WARNING:MISSING_TAG_NM  48684343

Because only 37 reads were reported as MATE_NOT_FOUND, I deleted them from my 
SAM file. When I reran MarkDuplicates, the same error still occurred.
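
For reference, reads of this kind can be listed with a short Python sketch like 
the one below. It is only an approximation of what ValidateSamFile checks: it 
reports every paired read name for which only one end is present among the 
primary records, including reads whose mate is flagged as unmapped. The 
function name names_missing_mate is hypothetical, and the input is assumed to 
be tab-separated SAM text.

```python
from collections import defaultdict


def names_missing_mate(sam_lines):
    """Return sorted read names that are flagged paired but lack one end.

    Header lines, unpaired reads, and secondary/supplementary records
    are ignored. This approximates, but is not, Picard's own check.
    """
    ends_seen = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        if not flag & 0x1:
            continue  # read is not paired
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        if flag & 0x40:
            ends_seen[name].add(1)
        if flag & 0x80:
            ends_seen[name].add(2)
    return sorted(n for n, ends in ends_seen.items() if ends != {1, 2})


if __name__ == "__main__":
    import sys

    # Usage (file name is a placeholder): python find_orphans.py input.sam
    with open(sys.argv[1]) as handle:
        for read_name in names_missing_mate(handle):
            print(read_name)
```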

My scripts are as follows:

java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/AddOrReplaceReadGroups.jar \
    INPUT=${path_input}/${samp}_uniquelymapped.sam \
    OUTPUT=${path_input}/${samp}_rg.sam SORT_ORDER=coordinate RGID=LANE11A \
    RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A

java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/MarkDuplicates.jar \
    I=${path_input}/${samp}_rg.bam O=${path_input}/${samp}_dedup.bam \
    METRICS_FILE=${path_input}/${samp}_dedupmetrics.txt REMOVE_DUPLICATES=TRUE \
    CREATE_INDEX=TRUE ASSUME_SORTED=TRUE VALIDATION_STRINGENCY=SILENT



(2) For the Tophat2 output, I used different steps (first SortSam to sort by 
coordinate, then AddOrReplaceReadGroups to add read groups). Both steps 
produced error messages (running log attached).

SortSam reported "Insert size out of range" for 11449 reads. After I changed 
the argument "VALIDATION_STRINGENCY=STRICT" to "LENIENT", SortSam ignored these 
errors and finished sorting.

But when I ran AddOrReplaceReadGroups, an error of the same kind occurred 
(similar to the one for the STAR output):

"Exception in thread "main" net.sf.samtools.SAMException: Exception when 
processing alignment for BAM index HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b 
aligned read."

I pasted the read information here:

HS2:410:C21E6ACXX:4:2316:13634:18930    73      1       536905899       50      
32M2D57M4I4M    *       0       0       
GTAAATTTTTGTAAATGTTCAATGAGGTGCTGAAACATGTATGTTCTTTAGCTTTCCCATTTAGAAGACAACATAAATCTTAATTTTTCCACTAATT
       
B<BFFFFFFFFFFFF0<BFFIIIF0BFBFFFIIIIIBFIBFFFBFFFIFFBFFFFFFFFFFIIBFFIIIFIIIIIFIFFFFFFFFBFFFBBBFFFBF
       AS:i:-126       XN:i:0  XM:i:20 XO:i:2  XG:i:6  NM:i:26 
MD:Z:2C1G10A3T0G2A0T0A3A0A1^AA3T9G4A1A0G2T24T1A0A5G2    YT:Z:UU NH:i:1.

My scripts are as follows:

java -Xmx3G -jar /u/local/apps/picard-tools/1.77/SortSam.jar \
    INPUT=${path_input}/${samp}_uniquelymapped.sam \
    OUTPUT=${path_input}/${samp}_unique_sorted.sam SORT_ORDER=coordinate \
    CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT

java -Xmx3G -jar /u/local/apps/picard-tools/1.77/AddOrReplaceReadGroups.jar \
    INPUT=${path_input}/${samp}_unique_sorted.bam \
    OUTPUT=${path_input}/${samp}_rg.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE \
    RGID=LANE11A RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A \
    VALIDATION_STRINGENCY=LENIENT

In summary, the errors for both STAR and Tophat2 seem to occur while the BAM 
index is being created for the reported reads, but I do not know why this 
happens or how to resolve it.
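
One possible lead, stated as an assumption rather than a confirmed diagnosis: 
both reported reads lie beyond position 2^29 (536,870,912) on reference 
sequence 1, and as far as I understand the standard BAM index (BAI) format 
cannot represent coordinates above that value. The Python sketch below computes 
the reference end of an alignment from POS and CIGAR and compares it with that 
limit. The names BAI_MAX_POS, reference_end and exceeds_bai_limit are my own, 
not part of Picard or samtools.

```python
import re

# Assumed upper coordinate limit of the BAI index format (2^29 bases).
BAI_MAX_POS = 2 ** 29

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def reference_end(pos, cigar):
    """Return the 1-based, inclusive reference end of an alignment."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1


def exceeds_bai_limit(pos, cigar):
    """True if the alignment extends past the assumed BAI coordinate limit."""
    return reference_end(pos, cigar) > BAI_MAX_POS


if __name__ == "__main__":
    # POS and CIGAR of the reads named in the two error messages above.
    for pos, cigar in [(536940431, "9S90M"), (536905899, "32M2D57M4I4M")]:
        print(pos, cigar, reference_end(pos, cigar), exceeds_bai_limit(pos, cigar))
```

For the two reads named in the error messages, the check comes out true in 
both cases.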

Any comments would be much appreciated. Thanks in advance.

Yibo

Department of Ecology and Evolutionary Biology,
University of California, Los Angeles

Attachment: running_log_for_STAR
Description: Binary data

Attachment: running_log_for_Tophat2
Description: Binary data

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help