Is there anyone from the samtools group who can comment on this? We wound up
using FixBAMFile, but it sounds like there's a bug in the bin calculation. The
bin shouldn't change just because RG tags were added to the reads (according
to the error message itself, it depends only on the alignment start and end).
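A minimal Python sketch of the bin calculation, assuming the reg2bin() function and the "treat an unmapped read as length one" convention described in the SAM/BAM specification. The helper names `reference_length` and `expected_bin` are hypothetical, not htsjdk or samtools APIs. It recomputes the expected bin for both records of the example pair from FLAG, POS and CIGAR only, which is why an added RG:Z tag should not affect the value.

```python
import re

# reg2bin() as given in the SAM/BAM specification (UCSC binning scheme).
# beg is 0-based and end is exclusive: the interval is [beg, end).
def reg2bin(beg: int, end: int) -> int:
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_length(cigar: str) -> int:
    # Hypothetical helper: only M, D, N, = and X consume reference bases.
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")

def expected_bin(flag: int, pos: int, cigar: str) -> int:
    # Hypothetical helper: expected bin from SAM FLAG, 1-based POS and CIGAR.
    beg = pos - 1  # BAM stores 0-based positions, so POS 0 becomes -1
    unmapped = bool(flag & 0x4) or cigar == "*"
    # Unmapped (or CIGAR-less) records are treated as covering one base.
    length = 1 if unmapped else reference_length(cigar)
    return reg2bin(beg, beg + length)

# The two records of the example pair; optional tags (BC, XD, SM, AS, RG)
# are never inputs to the calculation.
mapped_bin = expected_bin(73, 5881857, "100M")
mate_bin = expected_bin(133, 5881857, "*")
print(mapped_bin, mate_bin)  # 5040 5040
```

Since the expected bin depends only on the reference interval, a mismatch that appears after adding RG tags suggests the rewriting step computed or copied the bin field differently, rather than that the tag itself matters.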
On Aug 20, 2014, at 3:34 PM, Mark Ebbert <[email protected]> wrote:
> Hi,
>
> I posted a similar question on Biostars and realized I should have come here
> to begin with. We received ~1000 whole-genome BAMs that didn't have the RG
> tag on the reads (the @RG line existed in the header, though). We used
> 'bamaddrg' to add RG tags to the reads and are now getting the following
> error when we run Picard's MarkDuplicates:
>
> Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation
> error: ERROR: Record 1642900, Read name HS2000-1005_167:8:1103:3541:88508,
> bin field of BAM record does not equal value computed based on alignment
> start and end, and length of sequence to which read is aligned
>         at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
>         at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
>         at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
>         at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
>         at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
>         at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
>         at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:413)
>         at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
>         at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
>         at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)
>
> With 'VALIDATION_STRINGENCY=LENIENT', MarkDuplicates ignores the error, but
> it still reports it for a large number of reads (I didn't count how many).
> Another thread on the mailing list
> (http://sourceforge.net/p/samtools/mailman/message/31853465/) says this is
> "bad" and that the following command fixes it:
>
>     java -classpath sam-1.99.jar net.sf.samtools.FixBAMFile test.bam fixed.bam
>
> We have so many large BAMs, though.
>
> Questions:
> 1. The error states that the bin is calculated based on alignment start and
> end. These values did not change! So why would the calculated bin change?
> 2. Is there a more manageable way to add RG tags to the reads without ending
> up with incorrect bins?
>
> Here is the same read pair before and after adding the RG tag, for comparison:
>
> ### BEGIN ###
> HS2000-1005_167:8:1103:3541:88508	73	chr1	5881857	254	100M	*	0	0	CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG	@?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C	BC:Z:0	XD:Z:100	SM:i:500	AS:i:0
> HS2000-1005_167:8:1103:3541:88508	133	chr1	5881857	0	*	=	5881857	0	GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG	####################################################################################################	BC:Z:0
>
> HS2000-1005_167:8:1103:3541:88508	73	chr1	5881857	254	100M	*	0	0	CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG	@?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C	BC:Z:0	XD:Z:100	SM:i:500	AS:i:0	RG:Z:MYGROUP
> HS2000-1005_167:8:1103:3541:88508	133	chr1	5881857	0	*	=	5881857	0	GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG	####################################################################################################	BC:Z:0	RG:Z:MYGROUP
> ### END ###
>
> Thanks!
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help