Have you seen the comment at the top of bmz.c?

/**
 * An effective/efficient block compressor for input containing long common
 * strings (e.g. web pages from a website)
 *
 * cf. Bentley & McIlroy, "Data Compression Using Long Common Strings", 1999
 * cf. BMDiff & Zippy mentioned in the Bigtable paper
 */

The B&M paper is available online if you search for it. By default, BMZ
is essentially the BM algorithm followed by LZO, but the library is
flexible enough to allow other combinations.
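
For readers unfamiliar with the paper, the core idea of the BM stage can be
sketched as follows. This is a minimal illustration, not code from bmz.c:
the names bm_first_match and fingerprint, the block size BLOCK, and the
table size SLOTS are all hypothetical, and the fingerprint is recomputed at
every position for brevity where a tuned implementation would roll it. The
sketch indexes only block-aligned chunks that lie fully behind the scan
position, looks up the fingerprint of the current window, verifies a
candidate with memcmp, and extends the match forward. The second stage
(LZO in the default BMZ setup) is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 8u      /* hypothetical block size b */
#define SLOTS 4096u   /* hypothetical table size, a power of two */

/* Fingerprint of BLOCK bytes: a plain polynomial hash, wrapping mod 2^32. */
static uint32_t fingerprint(const unsigned char *p) {
    uint32_t h = 0;
    for (size_t i = 0; i < BLOCK; i++)
        h = h * 257u + p[i];
    return h;
}

/* Find the first position whose BLOCK bytes repeat an earlier block-aligned
 * chunk. On success returns 1 and reports the earlier offset (src), the
 * current offset (dst), and the match length extended forward (len). */
static int bm_first_match(const unsigned char *in, size_t n,
                          size_t *src, size_t *dst, size_t *len) {
    size_t table[SLOTS];  /* holds offset + 1; 0 means empty */
    memset(table, 0, sizeof table);

    for (size_t i = 0; i + BLOCK <= n; i++) {
        /* Index the aligned block that has just fallen behind the window. */
        if (i >= BLOCK && i % BLOCK == 0) {
            size_t b = i - BLOCK;
            table[fingerprint(in + b) & (SLOTS - 1u)] = b + 1;
        }

        size_t cand = table[fingerprint(in + i) & (SLOTS - 1u)];
        if (cand != 0 && memcmp(in + cand - 1, in + i, BLOCK) == 0) {
            size_t s = cand - 1, l = BLOCK;
            /* Extend the verified match as far forward as it goes. */
            while (i + l < n && in[s + l] == in[i + l])
                l++;
            *src = s;
            *dst = i;
            *len = l;
            return 1;
        }
    }
    return 0;
}
```

A full encoder would emit the literal bytes before dst plus a (src, len)
reference, continue scanning after the match, and hand the result to the
second-stage compressor.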

On Mar 14, 4:02 pm, Mateusz Berezecki <[email protected]> wrote:
> I've been trying to figure out what kind of compression algorithm is
> BMZ and failed. So could someone please give me some references or
> pointers to literature (can be online) to the BMZ algorithm
> explanation, etc?
>
> The second thought I had was to ask if LZMA was considered for
> compression? What was the original criterion for selecting supported
> compression algorithms?

The main criterion is encode/decode throughput on typical commit log
and cellstore blocks (the default compressed block size is 64KB, about
100-200KB raw). LZMA (much slower than bzip2, which is much slower
than gzip, which is much slower than bmz and lzo) and bzip2 were
considered too slow, and their compression-ratio advantage is not that
big for relatively small blocks, since both LZMA and bzip2 benefit
from large (many MBs) buffers. Of course, you're welcome to experiment
with other compression options (I hope our BlockCompressionCodec API
is easy enough for you to extend :)

My BM implementation is experimental (though it seems stable enough in
random tests) and hardly tuned, apart from avoiding modulo in the
Rabin-Karp hash table lookups. I think profiling and tuning would make
it a lot faster (it's already about 4-5x faster than gzip on various
inputs).
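
The modulo-avoidance trick mentioned above can be illustrated with a short
sketch. This is not the code from bmz.c; the function names (rk_hash,
rk_pow, rk_roll, rk_slot) and the constants are hypothetical. The
assumption is that the hash table size is a power of two, so the slot index
can be taken with a bit mask instead of the much slower `%` operator, while
the rolling hash itself relies on unsigned 32-bit wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters, not taken from bmz.c. */
#define TABLE_BITS 16u
#define TABLE_SIZE (1u << TABLE_BITS)  /* power of two */
#define TABLE_MASK (TABLE_SIZE - 1u)
#define HASH_BASE  257u

/* Polynomial hash of the first len bytes; arithmetic wraps mod 2^32. */
static uint32_t rk_hash(const unsigned char *p, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * HASH_BASE + p[i];
    return h;
}

/* HASH_BASE^(len-1): the weight of the oldest byte in the window. */
static uint32_t rk_pow(size_t len) {
    uint32_t m = 1;
    for (size_t i = 1; i < len; i++)
        m *= HASH_BASE;
    return m;
}

/* Slide the window by one byte: remove `out`, append `in`. */
static uint32_t rk_roll(uint32_t h, uint32_t msb,
                        unsigned char out, unsigned char in) {
    return (h - out * msb) * HASH_BASE + in;
}

/* Table slot via mask; equivalent to h % TABLE_SIZE for a power-of-two size. */
static uint32_t rk_slot(uint32_t h) {
    return h & TABLE_MASK;
}
```

With a window of 4 bytes, for example, `rk_roll(rk_hash(s, 4), rk_pow(4),
s[0], s[4])` gives the same value as `rk_hash(s + 1, 4)`, so each step of
the scan costs a couple of multiplications and one mask rather than a
rehash and a division.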

__Luke
