[ https://issues.apache.org/jira/browse/HADOOP-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874576#action_12874576 ]
Michele Catasta commented on HADOOP-5793:
-----------------------------------------

You can find a preliminary patch for BMDiff + LZO at http://github.com/pirroh/hadoop-gpl-compression/tree/branch-0.1
I opened an issue on hadoop-gpl-compression as well: http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=27
Questions and feedback are more than welcome.

> High speed compression algorithm like BMDiff
> --------------------------------------------
>
>                 Key: HADOOP-5793
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5793
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: elhoim gibor
>            Priority: Minor
>
> Add a high-speed compression algorithm like BMDiff.
> It gives speeds of ~100 MB/s for writes and ~1000 MB/s for reads, compressing 2.1 billion web pages from 45.1 TB down to 4.2 TB.
> References:
> http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437
> 2005 Jeff Dean talk about Google architecture - around 46:00.
> http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=755678
> A reference implementation exists in Hypertable.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
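For readers unfamiliar with the technique: BMDiff is based on the Bentley-McIlroy "long common strings" scheme, which indexes a fingerprint of every B-th block of the input so that long repeats can be replaced by back-references. The sketch below is only illustrative Python, not the Hypertable or Google implementation; the block size `B` and the op encoding are arbitrary choices, and a real implementation would use rolling Rabin fingerprints rather than raw byte strings as dictionary keys.

```python
# Illustrative sketch of the Bentley-McIlroy "long common strings" idea
# behind BMDiff. NOT the Hypertable/Google code; B and the op encoding
# are arbitrary. A real implementation uses rolling Rabin fingerprints
# instead of raw byte strings as dictionary keys.

B = 8  # fingerprint block size (illustrative choice)

def bm_compress(data: bytes):
    """Encode data as a list of ops: ('lit', bytes) or ('copy', offset, length)."""
    table = {}       # block bytes -> earliest position seen
    ops = []
    lit_start = 0    # start of the pending literal run
    i = 0
    while i + B <= len(data):
        block = data[i:i + B]
        j = table.get(block)
        if j is not None:
            # Found an earlier occurrence; extend the match forward.
            length = B
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if i > lit_start:
                ops.append(('lit', data[lit_start:i]))
            ops.append(('copy', j, length))
            i += length
            lit_start = i
        else:
            if i % B == 0:  # Bentley-McIlroy: index only every B-th block
                table[block] = i
            i += 1
    if lit_start < len(data):
        ops.append(('lit', data[lit_start:]))
    return ops

def bm_decompress(ops):
    out = bytearray()
    for op in ops:
        if op[0] == 'lit':
            out += op[1]
        else:
            # ('copy', offset, length); the copy may overlap the write
            # head (periodic data), so copy byte by byte like LZ77.
            _, off, length = op
            for _ in range(length):
                out.append(out[off])
                off += 1
    return bytes(out)
```

Indexing only every B-th block (instead of every position) is what keeps the write path fast at the cost of possibly missing some short repeats, which matches the speed-over-ratio trade-off this issue asks for; round-tripping e.g. `b"the quick brown fox " * 50` yields one short literal plus a single long copy op.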