[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131940#comment-13131940 ]
Todd Lipcon commented on HBASE-4608:
------------------------------------

One quick sketch of how this might work:

{code}
interface CompressionDictionary {
  public byte[] getEntry(int idx);
  public int findEntry(byte[] data);  // returns -1 if data is not in the dictionary
  public int addEntry(byte[] data);
}
{code}

While writing, start each HLog with an empty CompressionDictionary:

{code}
void writeString(byte[] data) {
  int dictIdx = dict.findEntry(data);
  if (dictIdx == -1) {
    // not in dict: marker byte, then the full string
    writeByte(0x00);
    WritableUtils.writeString(data); // current implementation
    dict.addEntry(data); // keep the writer's dictionary in step with the reader's
  } else {
    // in dict: 4-byte reference with the high bit set
    writeInt((1 << 31) | dictIdx);
  }
}
{code}

While reading:

{code}
byte[] readString(in) {
  in.mark(4);
  byte firstbyte = in.readByte();
  if ((firstbyte & 0x80) != 0) {
    // high bit set: rewind and read the whole 4-byte dictionary reference
    in.reset();
    int dictidx = in.readInt() & ~(1 << 31);
    return dict.getEntry(dictidx);
  } else {
    assert firstbyte == 0;
    byte[] ret = WritableUtils.readString();
    dict.addEntry(ret);
    return ret;
  }
}
{code}

Then the dictionary could be implemented as a fixed-size associative hash... maybe a cuckoo hash or something exotic (they're on my mind since reading the SILT paper last week).

> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends
> across different datanodes. We can speed up this process by compressing the
> HLog. The current plan involves using a dictionary to compress table name, region
> id, cf name, and possibly other bits of repeated data. Also, the HLog format may
> be changed in other ways to produce a smaller HLog.
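For anyone who wants to try the write/read scheme from the comment end to end, here is a minimal, self-contained sketch. It is only an illustration under several assumptions: `SimpleDictionary` and `DictCodec` are hypothetical names, the dictionary is an unbounded list plus hash map rather than the fixed-size hash suggested above, and a plain length-prefixed byte array stands in for `WritableUtils.writeString` so the example has no Hadoop dependency. The reader also consumes the remaining three bytes of a reference directly instead of using mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical unbounded dictionary with the same three operations as the sketch. */
class SimpleDictionary {
    private final List<byte[]> entries = new ArrayList<>();
    // ByteBuffer compares by content, so it can serve as a map key for byte arrays.
    private final Map<ByteBuffer, Integer> index = new HashMap<>();

    byte[] getEntry(int idx) { return entries.get(idx); }

    int findEntry(byte[] data) {
        Integer idx = index.get(ByteBuffer.wrap(data));
        return idx == null ? -1 : idx;
    }

    int addEntry(byte[] data) {
        byte[] copy = data.clone();
        entries.add(copy);
        index.put(ByteBuffer.wrap(copy), entries.size() - 1);
        return entries.size() - 1;
    }
}

/** Hypothetical codec: 0x00 marker + literal, or a 4-byte reference with the high bit set. */
class DictCodec {
    static void write(DataOutputStream out, SimpleDictionary dict, byte[] data) throws IOException {
        int idx = dict.findEntry(data);
        if (idx == -1) {
            out.writeByte(0x00);          // marker: literal follows
            out.writeInt(data.length);    // assumption: length-prefixed stand-in for writeString
            out.write(data);
            dict.addEntry(data);          // writer and reader must add in the same order
        } else {
            out.writeInt((1 << 31) | idx);
        }
    }

    static byte[] read(DataInputStream in, SimpleDictionary dict) throws IOException {
        int first = in.readUnsignedByte();
        if ((first & 0x80) != 0) {
            // Dictionary reference: rebuild the 31-bit index from the big-endian int.
            int idx = ((first & 0x7F) << 24) | (in.readUnsignedByte() << 16)
                    | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
            return dict.getEntry(idx);
        }
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        dict.addEntry(data);
        return data;
    }

    /** Encodes all values against a fresh dictionary, as at the start of a new HLog. */
    static byte[] encodeAll(byte[]... values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            SimpleDictionary dict = new SimpleDictionary();
            for (byte[] v : values) write(out, dict, v);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes count values, rebuilding the dictionary from the stream itself. */
    static List<byte[]> decodeAll(byte[] encoded, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            SimpleDictionary dict = new SimpleDictionary();
            List<byte[]> result = new ArrayList<>();
            for (int i = 0; i < count; i++) result.add(read(in, dict));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this encoding, writing the same 9-byte table name three times takes 14 + 4 + 4 = 22 bytes instead of 42, and the reader recovers all three values without the dictionary ever being stored in the log.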