Thanks for the clarification, John! I also realized that I misspelt your 
surname, apologies.  So I can safely continue to use my small program that 
removes EOF blocks from bgzipped BED archives (without headers).  

Does anyone know whether this bug in htslib will be fixed?

-Pär


-----Original Message-----
From: John Marshall [mailto:[email protected]] 
Sent: 5 October 2015 09:48
To: Pär Larsson <[email protected]>
Cc: [email protected]
Subject: Re: [Samtools-help] bgzip removal of EOF blocks?

On 3 Oct 2015, at 23:52, Pär Larsson <[email protected]> wrote:
> Sorry to bother you with a question relating to an older discussion 
> (http://sourceforge.net/p/samtools/mailman/message/34109200/). John Marshall 
> expressed concerns that removal of the 28 byte EOF block from bgzip archives 
> would be unreliable for catting archives together using the 'cat' command.

Actually I expanded on concerns with *not* removing the 28 byte EOF block.  In 
principle you shouldn't even need to remove it, but bugs in current versions of 
tools mean that you do, as Stathis had found.  Removing the 28 bytes enables 
you to produce a catted file that looks identical to one that was written all 
at once, which will be fine.
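
As a rough illustration of the above, here is a minimal Python sketch (not 
the program discussed in this thread, and not part of htslib).  It assumes 
each input is a complete BGZF file held in memory, and the names BGZF_EOF, 
strip_eof and cat_bgzf are made up for this example.  The 28 marker bytes 
are the empty EOF block given in the SAM/BGZF specification.  Writing one 
final marker at the end is a choice made here so that the result resembles 
a file written all at once.

```python
# The 28-byte BGZF EOF marker: an empty gzip member with a "BC" extra field.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"   # gzip header, FEXTRA set, XLEN=6
    "4243" "0200" "1b00"                  # 'B','C' subfield, BSIZE=27
    "0300" "00000000" "00000000"          # empty deflate block, CRC32, ISIZE
)


def strip_eof(data: bytes) -> bytes:
    """Return data without a trailing EOF marker, if one is present."""
    if data.endswith(BGZF_EOF):
        return data[: -len(BGZF_EOF)]
    return data


def cat_bgzf(parts):
    """Concatenate BGZF byte strings, keeping a single EOF marker at the end.

    Assumption: headers in the second and later inputs have already been
    dealt with; this function only handles the EOF blocks.
    """
    body = b"".join(strip_eof(p) for p in parts)
    return body + BGZF_EOF
```

For large archives one would stream the files instead of reading them 
whole, but the check on the last 28 bytes stays the same.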

> Seems to work when I try (using bgzipped bed files) although tabix indices 
> and archive sizes become different.  It would save time if it could be done 
> this way so I'm just curious if anyone might know when and how it could fail. 
> Silently?

The other obvious issue with catting these files together is that you need to 
make sure that headers of the second and subsequent files don't cause trouble, 
as they will now be embedded in the concatenated file rather than at the 
beginning.  Stathis avoided this by removing headers from all but the first 
input VCF file.  Embedded headers may be acceptable in BED files depending on 
the tools you're using, so you may be fine -- but in general it's something to 
consider carefully.

    John

