[ https://issues.apache.org/jira/browse/HADOOP-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875792#action_12875792 ]
Luke Lu commented on HADOOP-5793:
---------------------------------

The bmz C API is designed to allow combining two different algorithms efficiently (minimizing copies and allocations), so it would be nice to allow native compositions like bm_pack/unpack + zlib/lzma/zpaq/fastlz without having to cross the JNI chasm. You also probably want to set bmz_set_out/die_proc to override the logging and error-handling behavior. Note that the bmz code is experimental (there is a lot of alternative and benchmarking code in it), and I have quite a few tweaks queued up to clean up and cut down the code size while improving speed. Any feedback or bug reports so far regarding the C code itself?

> High speed compression algorithm like BMDiff
> --------------------------------------------
>
>                 Key: HADOOP-5793
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5793
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: elhoim gibor
>            Assignee: Michele (@pirroh) Catasta
>            Priority: Minor
>
> Add a high-speed compression algorithm like BMDiff.
> It gives speeds of ~100 MB/s for writes and ~1000 MB/s for reads, compressing
> 2.1 billion web pages from 45.1 TB down to 4.2 TB.
> Reference:
> http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437
> 2005 Jeff Dean talk about Google architecture - around 46:00.
> http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=755678
> A reference implementation exists in HyperTable.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
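For readers unfamiliar with the bm_pack/unpack stage being composed above: it is based on the Bentley-McIlroy "long common strings" idea, where repeated blocks are replaced by back-references before a general-purpose compressor (zlib/lzma/etc.) runs over the residue. The toy sketch below illustrates only that idea; the op format ('L' literal run, 'C' copy), block size, and function names are all hypothetical and are not the actual bmz API or wire format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { B = 8, TABLE = 256 };   /* toy block size and hash-table size */

/* FNV-1a hash of one B-byte block, reduced to a small table index. */
static unsigned hash_block(const uint8_t *p) {
    unsigned h = 2166136261u;
    for (int i = 0; i < B; i++) h = (h ^ p[i]) * 16777619u;
    return h % TABLE;
}

/* Emit pending literal bytes in[from..to) as 'L' <len> <bytes> runs. */
static size_t flush_lit(const uint8_t *in, size_t from, size_t to,
                        uint8_t *out, size_t o) {
    while (from < to) {
        size_t len = (to - from > 255) ? 255 : to - from;
        out[o++] = 'L';
        out[o++] = (uint8_t)len;
        memcpy(out + o, in + from, len);
        o += len;
        from += len;
    }
    return o;
}

/* Bentley-McIlroy-style pass: remember one prior position per block hash
 * and replace a repeated block with a 'C' <offset> <len> copy op. */
static size_t toy_pack(const uint8_t *in, size_t n, uint8_t *out) {
    int pos[TABLE];
    size_t i = 0, lit = 0, o = 0;
    for (int t = 0; t < TABLE; t++) pos[t] = -1;
    while (i + B <= n) {
        unsigned h = hash_block(in + i);
        int m = pos[h];
        pos[h] = (int)i;
        if (m >= 0 && i - (size_t)m <= 255 && memcmp(in + m, in + i, B) == 0) {
            o = flush_lit(in, lit, i, out, o);   /* flush literals, then copy */
            out[o++] = 'C';
            out[o++] = (uint8_t)(i - (size_t)m); /* back-reference offset */
            out[o++] = B;
            i += B;
            lit = i;
        } else {
            i++;
        }
    }
    return flush_lit(in, lit, n, out, o);        /* trailing literals */
}

static size_t toy_unpack(const uint8_t *in, size_t n, uint8_t *out) {
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t op = in[i++];
        if (op == 'L') {
            size_t len = in[i++];
            memcpy(out + o, in + i, len);
            i += len;
            o += len;
        } else {  /* 'C': byte-wise copy so overlapping references work */
            size_t off = in[i++], len = in[i++];
            for (size_t k = 0; k < len; k++) out[o + k] = out[o + k - off];
            o += len;
        }
    }
    return o;
}
```

In the real pipeline the output of this pre-pass would then be handed to zlib/lzma/etc. in the same native buffer, which is the copy/allocation saving the comment refers to.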