On Tue, 3 Jan 2017, Holbrook J. wrote:
Dear Samtools help members
Happy 2017! I would be very grateful for your help:
I am trying to manipulate .bam files created by ernebs5
(http://erne.sourceforge.net) aligning against hg19.
I am running samtools 1.3.1
I have about 1.5 million singleton and about 100 million paired-end reads from
each sample (the singletons are reads whose mate failed QC).
For the singleton alignments, I was able to use Samtools to sort, index,
filter, index again and calculate coverage relative to my .bed file.
However, I cannot sort my paired-end read alignments.
I am running:
samtools sort -T /dev/shm/jostemp -@ 8 -m 4G -o sample1b_paired_sorted.bam Sample1b_unmasked.bam
I get an error message that starts:
[bam_sort_core] merging from 16 files……
[E::trans_tbl_add_sq] @SQ SN (chr1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr10) found in binary header but not text header.
…..
There is a line for every chromosome name in my file.
I did a diff for the headers between my singleton .bam file and my paired end
.bam file and there was no difference for the @SQ lines.
I suspect the difference is that your singleton file is much smaller than
the paired-end one. If it's small enough to fit in memory, samtools will
sort it all in a single chunk and won't run the merge step (which is where
the header check happens). Did you see a "merging from ... files" message
for the singletons?
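One way to check is to keep the stderr output of each sort run and search it for that message. A minimal sketch, assuming the log was saved with `2> some.log`; the helper name `ran_merge_step` and the file names in the usage comment are hypothetical:

```shell
# Hypothetical helper: succeeds if a saved samtools sort log (read from
# stdin) contains the "merging from ... files" message, i.e. the merge
# step ran for that sort.
ran_merge_step() {
    grep -q 'merging from'
}

# Usage (assumed file names):
#   samtools sort -o singletons_sorted.bam singletons.bam 2> singletons_sort.log
#   ran_merge_step < singletons_sort.log && echo "merge step ran"
```

If the singleton log has no such line but the paired-end log does, that would be consistent with the header check only being reached for the larger file.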
There is enough space in the /dev/shm/jostemp/ path for the temp files and my
HPC administrator says my resource usage is well under limits.
The flagstat for the input .bam file looks fine to my inexperienced eye and it
converts fine to a .sam file which again looks OK.
Do the @SQ lines it's complaining about actually exist in your text
header? You can check with:
samtools view -H Sample1b_unmasked.bam | grep '^@SQ'
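To compare against the names in the error messages, it can help to pull out just the SN values. A rough sketch; the helper name `sq_names` is made up for illustration, and it only assumes the usual tab-separated SAM header layout:

```shell
# Hypothetical helper: print the SN: value of every @SQ line in SAM header
# text read from stdin, one name per line.
sq_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Usage:
#   samtools view -H Sample1b_unmasked.bam | sq_names
```

The printed names can then be checked against chr1, chr10 and the others listed in the errors.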
I have posted this problem on BioStars:
https://www.biostars.org/p/228119/#229683
I’ve also looked at this issue, which seems similar but is not the same:
https://github.com/samtools/samtools/issues/548
Are you sure you're running samtools 1.3.1? The message about "found in
binary header but not text header" was removed when issue #548 was fixed,
so samtools 1.3.1 shouldn't be able to print the message above. Instead
it should silently fix the problem for you.
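On a cluster with several installed versions, it is worth confirming which binary is actually first on your PATH. A minimal sketch, assuming a `sort` that supports `-V` (for example GNU coreutils); the helper names `samtools_version` and `version_at_least` are hypothetical:

```shell
# Hypothetical helper: extract the version from the first line of
# `samtools --version` output read from stdin, e.g. "samtools 1.3.1".
samtools_version() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes sort supports -V (version sort).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   command -v samtools                          # which binary is on PATH
#   v=$(samtools --version | samtools_version)
#   version_at_least "$v" 1.3.1 || echo "older than 1.3.1: $v"
```

If the job script loads a different module or PATH than your login shell, run the same check inside the job.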
Ideally, erne should be putting the @SQ lines in the headers itself. If
it's not, you might want to get in touch with its developers and ask them
to add this feature.
Rob Davies [email protected]
The Sanger Institute http://www.sanger.ac.uk/
Hinxton, Cambs., Tel. +44 (1223) 834244
CB10 1SA, U.K. Fax. +44 (1223) 494919