Howdy,
I am experiencing a strange problem with samtools view, regarding file
truncation, and I'm hoping someone here can verify my conclusion that
my problem is an error in my file system. I am using samtools-0.1.19
on some variety of x86_64 Linux.
What I'm trying to do is take a 36G BAM file, do some filtering, and
output a new BAM file. The filtering program operates on SAM, so I
have a short pipeline that looks like this:
samtools view -h input.bam | filtering_program | samtools view -bS - > output.bam
After about nine hours, the pipeline halts with this report:
[main_samview] truncated file
Unfortunately, since I am running samtools view twice in my command,
this error message does little to inform me about what's happening.
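(One way to attribute the message to a specific stage without patching
samtools would be to capture each stage's exit status and stderr
separately. The following is only a minimal Python sketch of that idea:
run_pipeline is a hypothetical helper name, not part of samtools, and
the stage list in the trailing comment simply mirrors the pipeline
above, with "filtering_program" as a placeholder.)

```python
import subprocess
import tempfile


def run_pipeline(stages, output_path):
    """Run `stages` (a list of argv lists) as a shell-style pipe.

    The last stage's stdout goes to `output_path`. Returns one
    (returncode, stderr_text) tuple per stage, in pipeline order.
    """
    procs, logs = [], []
    prev_stdout = None
    with open(output_path, "wb") as out:
        for i, argv in enumerate(stages):
            is_last = i == len(stages) - 1
            log = tempfile.TemporaryFile()  # per-stage stderr capture
            proc = subprocess.Popen(
                argv,
                stdin=prev_stdout,
                stdout=out if is_last else subprocess.PIPE,
                stderr=log,
            )
            if prev_stdout is not None:
                # Drop the parent's copy of the read end so the upstream
                # stage sees a closed pipe if this stage exits early.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
            logs.append(log)

        results = []
        for proc, log in zip(procs, logs):
            proc.wait()
            log.seek(0)
            results.append((proc.returncode, log.read().decode(errors="replace")))
            log.close()
    return results


# Hypothetical usage, mirroring the pipeline in this message:
# stages = [
#     ["samtools", "view", "-h", "input.bam"],
#     ["filtering_program"],                 # placeholder name
#     ["samtools", "view", "-bS", "-"],
# ]
# for argv, (code, err) in zip(stages, run_pipeline(stages, "output.bam")):
#     print(argv[0], "exit status", code, "stderr:", err.strip())
```

With this, a "[main_samview] truncated file" line would show up in the
stderr text of exactly one stage, next to that stage's exit status.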
So I modified the source to report more detail and ran the pipeline
again. After another nine hours it failed a second time, and this time
I could see that it failed while reading BAM input, specifically while
reading the 4-byte header of a BAM record. Since only the first
samtools view reads BAM (the second one reads SAM), I believe this
rules out both the filtering program and a lack of disk space.
Thinking that perhaps my BAM file was indeed truncated or otherwise
corrupted, I ran this command:
samtools view alignments/input.bam | wc -l
That runs to completion and gives a plausible count (about 266M
records), so the BAM file seems to be OK.
Some other tests with smaller files reveal that samtools detects
truncated BAM files when it opens them. In bam_header_read() it seeks
to the end of the file and checks that there is a proper end-of-file
record. If there is not, it writes a warning (and proceeds). My job's
output doesn't contain this warning, which is further evidence of a
good BAM file.
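The end-of-file check can also be reproduced outside samtools. The
sketch below assumes the 28-byte empty BGZF block that the SAM/BAM
specification defines as the end-of-file marker; has_bam_eof_marker is
a hypothetical helper name, and I have not verified that it matches the
0.1.19 code path byte for byte. It only inspects the tail of the file,
so it says nothing about damage in the middle.

```python
# 28-byte empty BGZF block used as the BAM end-of-file marker
# (as given in the SAM/BAM format specification).
BAM_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bam_eof_marker(path):
    """Return True if the file at `path` ends with the BGZF EOF block."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the file size
        size = fh.tell()
        if size < len(BAM_EOF):
            return False
        fh.seek(size - len(BAM_EOF))
        return fh.read(len(BAM_EOF)) == BAM_EOF


# Hypothetical usage:
# print(has_bam_eof_marker("alignments/input.bam"))
```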
The filtering program periodically reports how many input records it
has seen. The number of records varied a lot between the two nine-hour
failures: 144M and 194M.
So now I am trying to figure out what could be going wrong that would
fit this evidence. The only thing that comes to mind is some kind of
transient file system error, where it is unable to provide the file
data to the program at some point in time.
Or is there some system resource that both instances of samtools view
are fighting over, such as a temporary file they are both trying to
write? In other words, is it legitimate to run two instances of
samtools view in one command?
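One way to probe the transient file system hypothesis independently of
samtools would be to read the input file several times and compare the
byte counts and checksums across passes. The sketch below is only an
illustration: read_pass and reads_are_stable are hypothetical helper
names, and on a machine with enough RAM the later passes may be served
from the page cache rather than from the file system, which would hide
exactly the kind of error I suspect.

```python
import hashlib
import os


def read_pass(path, chunk_size=1 << 20):
    """Read the whole file once; return (bytes_read, sha256_hexdigest)."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # empty read means end of file (or a short read)
                break
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()


def reads_are_stable(path, passes=3):
    """True if every pass reads the full file size and the same digest."""
    expected_size = os.path.getsize(path)
    results = [read_pass(path) for _ in range(passes)]
    sizes_ok = all(size == expected_size for size, _ in results)
    digests_ok = len({digest for _, digest in results}) == 1
    return sizes_ok and digests_ok


# Hypothetical usage:
# print(reads_are_stable("alignments/input.bam", passes=3))
```

A pass that returns fewer bytes than the size reported by the file
system, or a digest that differs between passes, would point at the
storage layer rather than at samtools.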
Is there some flaw in my logic above, in my interpretation of the
evidence?
Thanks for any help,
Bob H