[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212850#comment-13212850
 ] 

Li Pi commented on HBASE-4608:
------------------------------

@Kannan - here's a quick overview of 4608:

When writing the HLog, the writer checks a set of dictionaries for the key, cf, 
qualifier, tablename, and regionname. If an item is already in its dictionary, 
the writer emits the index instead of the item. If the item is not in the 
dictionary, the writer emits the item itself and adds it to the dictionary.

Reading from the HLog works the opposite way. When the reader encounters an 
uncompressed item, it adds it to the dictionary. When it encounters an index, 
it fetches the corresponding item from the dictionary.
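
As a rough sketch of the write and read paths described above, for a single 
field such as the tablename. This is not the code from the patch: the function 
names and the ("raw", ...) / ("idx", ...) token format are made up for 
illustration, and eviction is left out entirely.

```python
def compress(items, dictionary):
    """Writer side: emit an index for known items, the raw item otherwise."""
    tokens = []
    for item in items:
        if item in dictionary:
            tokens.append(("idx", dictionary[item]))
        else:
            # First sighting: write the item itself and remember it.
            dictionary[item] = len(dictionary)
            tokens.append(("raw", item))
    return tokens


def decompress(tokens, entries):
    """Reader side: rebuild the same dictionary while reading."""
    items = []
    for kind, value in tokens:
        if kind == "raw":
            # Uncompressed item: add it so later indices can refer to it.
            entries.append(value)
            items.append(value)
        else:
            items.append(entries[value])
    return items


# Hypothetical tablenames seen across consecutive log entries.
tablenames = [b"usertable", b"usertable", b"metrics", b"usertable"]
encoded = compress(tablenames, {})
decoded = decompress(encoded, [])
```

Only the first occurrence of each name is written in full. Every later 
occurrence is replaced by its index, and the reader recovers the original 
sequence without the dictionary ever being written to the log.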

The dictionary itself is a simple LRU dictionary that, by default, uses 2 bytes 
(a short) per index. There is a separate dictionary for each field 
(e.g. one for tablenames, one for regionnames...). 

The only requirement is that the dictionary be consistent: given the same items 
in the same order, it must always assign them the same indices and always evict 
in exactly the same fashion.
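
To illustrate the consistency requirement, here is a minimal sketch of an LRU 
dictionary that evicts deterministically and reuses the evicted slot's index. 
It is an assumption-laden toy, not the class from the patch: the names 
LRUDictionary, find, get, add, encode and decode are hypothetical, and the 
default capacity of 1 << 16 is just one bound that keeps indices within 2 bytes.

```python
from collections import OrderedDict


class LRUDictionary:
    """Toy LRU dictionary; writer and reader each keep their own instance."""

    def __init__(self, capacity=1 << 16):  # assumed bound: indices fit in 2 bytes
        self.capacity = capacity
        self.by_entry = OrderedDict()  # entry -> index, least recently used first
        self.by_index = {}             # index -> entry

    def find(self, entry):
        """Writer-side lookup; a hit counts as a use."""
        idx = self.by_entry.get(entry)
        if idx is not None:
            self.by_entry.move_to_end(entry)
        return idx

    def get(self, idx):
        """Reader-side lookup; counts as a use, mirroring find()."""
        entry = self.by_index[idx]
        self.by_entry.move_to_end(entry)
        return entry

    def add(self, entry):
        if len(self.by_entry) >= self.capacity:
            # Evict the least recently used entry and reuse its index.
            _, idx = self.by_entry.popitem(last=False)
        else:
            idx = len(self.by_entry)
        self.by_entry[entry] = idx
        self.by_index[idx] = entry
        return idx


def encode(items, d):
    tokens = []
    for item in items:
        idx = d.find(item)
        if idx is None:
            d.add(item)
            tokens.append(("raw", item))
        else:
            tokens.append(("idx", idx))
    return tokens


def decode(tokens, d):
    items = []
    for kind, value in tokens:
        if kind == "raw":
            d.add(value)
            items.append(value)
        else:
            items.append(d.get(value))
    return items


# A capacity of 2 forces evictions on this short, made-up sequence.
names = [b"a", b"b", b"a", b"c", b"b", b"a"]
tokens = encode(names, LRUDictionary(capacity=2))
decoded = decode(tokens, LRUDictionary(capacity=2))
```

Because both sides perform the same adds and the same "uses" in the same 
order, they evict the same entries and hand out the same indices, so the 
reader stays in step with the writer even after evictions.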


This seems to work fairly well - and noticeably cuts down our write sizes on 
the vast majority of workloads.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 
> 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
