@cblake,

Thank you for all of the detailed comparisons, and thanks for pointing out the 
counter bug. I did catch that yesterday, but it didn't make a meaningful 
difference for the timing. I plan to update it in my original post.

Thanks for pointing me towards `Zstd`. That's remarkable, and something the 
genomics/bioinformatics field will greatly benefit from. We generate a lot of 
data using very verbose file formats. The [.sam/bam 
files](https://samtools.github.io/hts-specs/SAMv1.pdf) store raw sequencing 
data. A `bam` file is a `BGZF`-compressed binary version of the `sam` file, 
which is supposed to allow fast random access. I wonder how `Zstd` compares to 
`BGZF` in both compression ratio and random access?
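
For anyone unfamiliar with it, my understanding is that `BGZF` gets its random access by compressing the data as a series of small, independent gzip blocks, so a reader can jump to one block and decompress only that. Below is a rough Python sketch of that idea only. The names `compress_blocks` and `read_at` and the in-memory index are my own assumptions for illustration; this does not produce real `BGZF` (which, as far as I know, records each block's size in a gzip extra field and uses virtual file offsets).

```python
import gzip

BLOCK_SIZE = 64 * 1024  # assumed block size for this sketch


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress data as independent gzip members and build a block index.

    Returns (blob, index), where index holds one tuple per block:
    (uncompressed_offset, compressed_offset, compressed_length).
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        member = gzip.compress(data[start:start + block_size])
        index.append((start, len(blob), len(member)))
        blob += member
    return bytes(blob), index


def read_at(blob: bytes, index, offset: int, length: int,
            block_size: int = BLOCK_SIZE) -> bytes:
    """Read `length` uncompressed bytes starting at `offset`,
    decompressing only the blocks that overlap the requested range."""
    out = bytearray()
    first = offset // block_size  # first block that can contain `offset`
    for u_off, c_off, c_len in index[first:]:
        if u_off >= offset + length:
            break  # past the requested range
        block = gzip.decompress(blob[c_off:c_off + c_len])
        lo = max(offset - u_off, 0)
        hi = min(offset + length - u_off, len(block))
        out += block[lo:hi]
    return bytes(out)
```

Because each block is a complete gzip member, the concatenated output is still a valid multi-member gzip stream, which is why ordinary `gzip` tools can read these files from start to finish.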

Also, looking at the `Zstd` GitHub page, I noticed it is capable of encoding 
and decoding `.gz` files. I have never delved into different compression 
formats, but is the `.gz` format itself inherently different from `.zst`? That 
is, could `Zstd` generate a `.gz` file at a high compression level that can also be 
decoded by `gzip`? I'm guessing there must be an inherent difference. In either 
case, I might start pressing for our field to use `Zstd`. As you mentioned, 
de-/compressing certain files can be painfully slow.
