Can anyone from the samtools group comment on this? We wound up using 
FixBAMFile, but it sounds like there is a bug in the bin calculation. 
According to the error message itself, the bin depends only on alignment 
start and end, so it shouldn't change when RG tags are added to the reads.
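
For reference, here is a minimal sketch of the bin calculation, transcribed 
into Python from the reg2bin() function in the SAM specification and applied 
to the read pair quoted below (POS 5881857; one mate is 100M, the other is 
unmapped with no CIGAR). The helper name bin_for_record and the "zero-length" 
variant are illustrative assumptions. Nothing in this thread establishes how 
bamaddrg actually computes the bin. The current SAM specification treats an 
unmapped but placed read as having alignment length one.

```python
def reg2bin(beg: int, end: int) -> int:
    """UCSC-style bin for a 0-based, half-open interval [beg, end).

    Transcribed from reg2bin() in the SAM specification.
    """
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


def bin_for_record(pos_1based: int, ref_len: int) -> int:
    """Hypothetical helper: bin for a record given its 1-based POS and the
    reference length implied by its CIGAR (0 when there is no CIGAR).

    A zero-length alignment is treated as length one, as the current SAM
    specification describes for unmapped reads.
    """
    beg = pos_1based - 1
    return reg2bin(beg, beg + max(ref_len, 1))


# Mapped mate from the quoted records: POS 5881857, CIGAR 100M.
print(bin_for_record(5881857, 100))   # 5040
# Unmapped mate placed at the same POS, no CIGAR (length treated as one).
print(bin_for_record(5881857, 0))     # 5040
# Assumption for illustration only: a tool that passes an empty interval
# (end == beg) to reg2bin gets a different bin at this position.
print(reg2bin(5881856, 5881856))      # 629
```

POS 5881857 is 0-based 5881856 = 359 * 16384, the first base of a 16 kb bin, 
so an interval treated as empty falls into the parent 128 kb bin. That would 
be one way for the bin to differ while start and end stay the same, but it is 
only a hypothesis here.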

On Aug 20, 2014, at 3:34 PM, Mark Ebbert <[email protected]> wrote:

> Hi,
> 
> I posted a similar question on Biostars and realized I should have come here 
> to begin with. We received ~1000 whole-genome BAMs whose reads lacked the RG 
> tag (the @RG line was present in the header, though). We used 'bamaddrg' to 
> add RG tags to the reads and now get the following error when we run 
> Picard's MarkDuplicates:
> 
> Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation 
> error: ERROR: Record 1642900, Read name HS2000-1005_167:8:1103:3541:88508, 
> bin field of BAM record does not equal value computed based on alignment 
> start and end, and length of sequence to which read is aligned
>         at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
>         at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
>         at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
>         at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
>         at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
>         at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
>         at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:413)
>         at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
>         at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
>         at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)
> 
> With 'VALIDATION_STRINGENCY=LENIENT', MarkDuplicates continues past the 
> error but still prints it for a large number of reads (I didn't count how 
> many). Another thread on the mailing list 
> (http://sourceforge.net/p/samtools/mailman/message/31853465/) says this is 
> "bad" and that the following command fixes it: 'java -classpath 
> sam-1.99.jar net.sf.samtools.FixBAMFile test.bam fixed.bam'
> 
> We have a lot of large BAMs, though, so fixing each one this way is costly. 
> 
> Questions:
> 1. The error states that the bin is calculated based on alignment start and 
> end. These values did not change! So why would the calculated bin change?
> 2. Is there a more manageable way to avoid the incorrect bins when adding 
> RG tags to the reads?
> 
> Here is the same read pair before and after adding the RG tag, for 
> comparison:
> 
> ### BEGIN ###
> HS2000-1005_167:8:1103:3541:88508     73      chr1    5881857 254     100M    *       0       0       CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG    @?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C    BC:Z:0  XD:Z:100        SM:i:500        AS:i:0
> HS2000-1005_167:8:1103:3541:88508     133     chr1    5881857 0       *       =       5881857 0       GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG    ####################################################################################################    BC:Z:0
> 
> HS2000-1005_167:8:1103:3541:88508     73      chr1    5881857 254     100M    *       0       0       CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG    @?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C    BC:Z:0  XD:Z:100        SM:i:500        AS:i:0  RG:Z:MYGROUP
> HS2000-1005_167:8:1103:3541:88508     133     chr1    5881857 0       *       =       5881857 0       GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG    ####################################################################################################    BC:Z:0  RG:Z:MYGROUP
> ### END ###
> 
> Thanks!
> 
> 
> 
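
Since the quoted message mentions ~1000 BAMs and FixBAMFile is the workaround 
that was used, here is a rough sketch of wrapping the quoted command in a 
batch loop. The jar name and class come from the command quoted above; the 
function names, the directory layout, and the ".fixed.bam" suffix are 
hypothetical. It defaults to a dry run that only builds the command lines.

```python
import subprocess
from pathlib import Path
from typing import List

# Taken from the command quoted in the message.
JAR = "sam-1.99.jar"
MAIN_CLASS = "net.sf.samtools.FixBAMFile"


def build_fix_command(src: Path, dst: Path, jar: str = JAR) -> List[str]:
    """Build the FixBAMFile command line for one input/output pair."""
    return ["java", "-classpath", jar, MAIN_CLASS, str(src), str(dst)]


def fix_directory(bam_dir: Path, out_dir: Path,
                  dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one FixBAMFile command per *.bam file.

    The output naming scheme (<name>.fixed.bam) is an assumption made for
    this sketch. With dry_run=True nothing is executed or created.
    """
    commands = []
    for src in sorted(bam_dir.glob("*.bam")):
        dst = out_dir / (src.stem + ".fixed.bam")
        cmd = build_fix_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True stops the batch on the first failing file.
            subprocess.run(cmd, check=True)
    return commands
```

Calling fix_directory(Path("bams"), Path("fixed")) returns the command lines 
for inspection. Passing dry_run=False runs them one after another, so for 
this many files the list would more likely be handed to a cluster scheduler.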

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help