On Tue, 3 Jan 2017, Holbrook J. wrote:

Dear Samtools help members

Happy 2017!  I would be very grateful for your help:

I am trying to manipulate .bam files created by ernebs5 
(http://erne.sourceforge.net) aligning against hg19.
I am running samtools 1.3.1
I have ~1.5 million singleton and ~100 million paired-end reads from each sample 
(the singletons are reads whose mate failed QC).
For the singleton alignments, I was able to use Samtools to sort, index, 
filter, index again and calculate coverage relative to my .bed file.
However, I cannot sort my paired-end read alignments.
I am running:
samtools sort -T /dev/shm/jostemp -@ 8 -m 4G -o sample1b_paired_sorted.bam Sample1b_unmasked.bam

I get an error message that starts:
[bam_sort_core] merging from 16 files……
[E::trans_tbl_add_sq] @SQ SN (chr1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr10) found in binary header but not text header.
…..
There is a line for every chromosome name in my file.

I did a diff for the headers between my singleton .bam file and my paired end 
.bam file and there was no difference for the @SQ lines.

I suspect the difference is that your singleton file is much smaller than the paired-end one. If a file is small enough to fit in memory, samtools sorts it in a single chunk and never runs the merge step, which is where the header check happens. Did you see a "merging from ... files" message for the singletons?

There is enough space in the /dev/shm/jostemp/ path for the temp files and my 
HPC administrator says my resource usage is well under limits.
The flagstat for the input .bam file looks fine to my inexperienced eye and it 
converts fine to a .sam file which again looks OK.

Do the @SQ lines it's complaining about actually exist in your text header? You can check with:

samtools view -H Sample1b_unmasked.bam | grep '^@SQ'
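
If you want to see both headers side by side, the sketch below may help. It reads the start of a BAM file directly, following the layout in the SAM/BAM specification (magic, header text, then the binary reference list), and reports reference names that appear in the binary list but have no @SQ SN entry in the text header. The function names are made up for this illustration and are not part of samtools; it also assumes the file can be decompressed with Python's gzip module, which works for BGZF because it is a series of gzip members.

```python
import gzip
import struct


def read_bam_header(path):
    """Return (text_header, binary_reference_names) from a BAM file."""
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        # l_text, then the plain-text SAM header
        (l_text,) = struct.unpack("<i", fh.read(4))
        text = fh.read(l_text).decode("utf-8", "replace").rstrip("\x00")
        # n_ref, then one (l_name, name, l_ref) record per reference
        (n_ref,) = struct.unpack("<i", fh.read(4))
        names = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            names.append(fh.read(l_name).rstrip(b"\x00").decode())
            fh.read(4)  # l_ref (reference length), not needed here
        return text, names


def text_sq_names(text):
    """Collect the SN values of all @SQ lines in the text header."""
    names = []
    for line in text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_from_text(path):
    """Names present in the binary reference list but not in any @SQ line."""
    text, binary_names = read_bam_header(path)
    in_text = set(text_sq_names(text))
    return [name for name in binary_names if name not in in_text]
```

Calling missing_from_text("Sample1b_unmasked.bam") would return an empty list if the two headers agree, and a list such as chr1, chr10, ... if the text header lacks the @SQ lines.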

I have posted this problem on BioStars: 
https://www.biostars.org/p/228119/#229683
I've also looked at this issue, which seems similar but not the same: 
https://github.com/samtools/samtools/issues/548

Are you sure you're running samtools 1.3.1? The message about "found in binary header but not text header" was removed when issue #548 was fixed, so samtools 1.3.1 shouldn't be able to print the message above. Instead it should silently fix the problem for you.
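
One way to confirm which samtools a job actually picks up is to run the check from inside the same environment (for example, the same HPC job script). The following is only a sketch: the helper names are invented for this example, and it assumes the binary accepts `--version` and prints a first line of the form "samtools 1.3.1", as the 1.x releases do. Older 0.1.x builds do not print that line, so the version comes back as None.

```python
import re
import shutil
import subprocess


def parse_samtools_version(output):
    """Extract the version from `samtools --version` output, or None."""
    match = re.search(r"^samtools\s+(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def installed_samtools():
    """Return (path, version) for the first samtools found on PATH."""
    path = shutil.which("samtools")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True)
    return path, parse_samtools_version(result.stdout)
```

If installed_samtools() reports a path or version other than the one you expect, an older copy is probably earlier on your PATH.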

Ideally, erne should be putting the @SQ lines in the headers itself. If it's not, you might want to get in touch with its developers and ask them to add this feature.

Rob Davies              [email protected]
The Sanger Institute    http://www.sanger.ac.uk/
Hinxton, Cambs.,        Tel. +44 (1223) 834244
CB10 1SA, U.K.          Fax. +44 (1223) 494919


--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
