[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212850#comment-13212850 ]
Li Pi commented on HBASE-4608:
------------------------------
@Kannan - here's a quick overview of 4608:
When writing the HLog, the writer checks a set of dictionaries for the key, cf,
qualifier, tablename, and regionname. If an item is already in its dictionary,
the writer emits the index instead of the item. If the item is not in the
dictionary, it is written uncompressed and added to the dictionary.
When reading from the HLog, it works in the opposite manner. When it encounters
an uncompressed item, it adds it to the dictionary. If it encounters an index,
it just fetches what it needs from the dictionary.
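The write and read paths above can be sketched for a single field as follows.
This is only an illustration, not the code in the attached patches: the class
name, the LITERAL/INDEXED marker bytes, the length prefix, and the use of
String instead of byte[] are all assumptions, and eviction is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the HBASE-4608 classes or on-disk format. */
final class FieldDictionarySketch {
    static final byte LITERAL = 0; // assumed marker: length-prefixed raw bytes follow
    static final byte INDEXED = 1; // assumed marker: a 2-byte dictionary index follows

    // Unbounded for brevity, so it only works for fewer than 32768 distinct items.
    private final Map<String, Short> indexOf = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    /** Writer side: emit the index on a hit, else the literal, then remember it. */
    void write(DataOutputStream out, String item) throws IOException {
        Short idx = indexOf.get(item);
        if (idx != null) {
            out.writeByte(INDEXED);
            out.writeShort(idx);
        } else {
            byte[] raw = item.getBytes(StandardCharsets.UTF_8);
            out.writeByte(LITERAL);
            out.writeShort(raw.length);
            out.write(raw);
            add(item);
        }
    }

    /** Reader side: a literal is added to the dictionary, an index is looked up. */
    String read(DataInputStream in) throws IOException {
        if (in.readByte() == INDEXED) {
            return entries.get(in.readShort());
        }
        byte[] raw = new byte[in.readUnsignedShort()];
        in.readFully(raw);
        String item = new String(raw, StandardCharsets.UTF_8);
        add(item);
        return item;
    }

    // Both sides assign indices in arrival order, which keeps them in sync.
    private void add(String item) {
        indexOf.put(item, (short) entries.size());
        entries.add(item);
    }

    /** Encodes a sequence of values of one field with a fresh dictionary. */
    static byte[] encode(List<String> items) {
        FieldDictionarySketch dict = new FieldDictionarySketch();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (String item : items) {
                dict.write(out, item);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Decodes the bytes produced by encode, rebuilding the dictionary on the fly. */
    static List<String> decode(byte[] data) {
        FieldDictionarySketch dict = new FieldDictionarySketch();
        List<String> items = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                items.add(dict.read(in));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return items;
    }
}
```

Calling FieldDictionarySketch.decode(FieldDictionarySketch.encode(items)) on a
list that repeats the same table name should return the original list, with
every repeat after the first costing three bytes in this assumed layout.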
The dictionary itself is a simple LRU dictionary that, by default, uses 2 bytes
(a short) per index. There is a separate dictionary for each field
(e.g. one for tablenames, one for regionnames...).
The dictionary merely has to be consistent: given the same items in the same
order, it should always assign them the same indices and always evict in
exactly the same fashion.
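A minimal sketch of a dictionary with that property is shown below. It is not
the implementation from the patches: the class name, the NOT_FOUND sentinel,
and the choice to hand the evicted entry's index to the new item are
assumptions made for illustration. The point it tries to show is that a
lookup by index on the reader must refresh recency exactly as a lookup by item
does on the writer, otherwise the two sides would evict different entries.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a deterministic, fixed-capacity LRU dictionary. */
final class LruDictionarySketch {
    static final short NOT_FOUND = -1;

    private final String[] slots; // index -> item
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Short> indexOf =
            new LinkedHashMap<>(16, 0.75f, true);

    LruDictionarySketch(int capacity) {
        if (capacity < 1 || capacity > Short.MAX_VALUE) {
            throw new IllegalArgumentException("capacity must fit in a short");
        }
        slots = new String[capacity];
    }

    /** Writer-side lookup; a hit marks the entry as most recently used. */
    short findIndex(String item) {
        Short idx = indexOf.get(item); // get() counts as an access
        return idx == null ? NOT_FOUND : idx;
    }

    /** Reader-side lookup; refreshes recency the same way findIndex does. */
    String get(short index) {
        String item = slots[index];
        if (item == null) {
            throw new IllegalStateException("no entry at index " + index);
        }
        indexOf.get(item); // keep recency in step with the writer
        return item;
    }

    /** Adds a missing item; when full, evicts the LRU entry and reuses its index. */
    short add(String item) {
        short idx;
        if (indexOf.size() < slots.length) {
            idx = (short) indexOf.size();
        } else {
            Iterator<Map.Entry<String, Short>> it = indexOf.entrySet().iterator();
            idx = it.next().getValue(); // first entry is the least recently used
            it.remove();
        }
        slots[idx] = item;
        indexOf.put(item, idx);
        return idx;
    }

    /** Feeds the same stream through a writer and a reader dictionary. */
    static List<String> roundTrip(List<String> items, int capacity) {
        LruDictionarySketch writer = new LruDictionarySketch(capacity);
        LruDictionarySketch reader = new LruDictionarySketch(capacity);
        List<String> decoded = new ArrayList<>();
        for (String item : items) {
            short idx = writer.findIndex(item);
            if (idx == NOT_FOUND) {
                writer.add(item);
                reader.add(item); // the literal itself travels in the log
                decoded.add(item);
            } else {
                decoded.add(reader.get(idx)); // only the index travels
            }
        }
        return decoded;
    }
}
```

With a capacity of 2 and the sequence a, b, a, c, b, the hit on "a" makes "b"
the eviction victim when "c" arrives, on both sides, so roundTrip returns the
input unchanged.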
This seems to work fairly well, and it noticeably cuts down our write sizes on
the vast majority of workloads.
> HLog Compression
> ----------------
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
> Issue Type: New Feature
> Reporter: Li Pi
> Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt,
> 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends
> across different datanodes. We can speed up this process by compressing the
> HLog. The current plan involves using a dictionary to compress the table name,
> region id, cf name, and possibly other bits of repeated data. The HLog format
> may also be changed in other ways to produce a smaller HLog.