[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-4608:
-------------------------
Attachment: 4608v23.txt
Renamed the method enableCompression to setCompressionContext everywhere it appears.
Gave every instance of the compression context the same name rather than a new name at each use.
Removed unused 'compression' flag data members, or made them locals where only a single method used them.
Removed the define of TRUE and the repeated ENABLE_WAL_COMPRESSION key from SequenceFileLogReader; neither is needed any more.
Rather than having the SequenceFile metadata code sprinkled over both the reader and the writer, do it all in the writer and have the reader use the writer's methods.
Added a global WAL type as metadata.
Added a compression type to metadata.
Renamed the method WALCompressionEnabled to isWALCompressionEnabled.
Added some small tests to TestLRUDictionary, plus a new TestCompressor that taught me how this stuff works.
Added documentation to methods whose behavior surprised me; e.g. addEntry will happily add a new entry even when the dictionary already holds that entry.
Miscellaneous cleanup.
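A minimal sketch of the "writer owns the metadata" arrangement described above, so the reader never hard-codes keys of its own. It uses a plain Java Map in place of the SequenceFile metadata object. The key names and values (wal.type, hlog, compression.type, dictionary) and the class name WalMetadataSketch are hypothetical illustrations, not necessarily what the patch writes; only the method name isWALCompressionEnabled comes from the notes above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: every metadata key lives on the writer side; the reader calls
// the writer's static helpers instead of defining keys of its own.
class WalMetadataSketch {
  // Hypothetical keys and values, chosen for illustration.
  static final String WAL_TYPE_KEY = "wal.type";
  static final String WAL_TYPE_VALUE = "hlog";
  static final String COMPRESSION_TYPE_KEY = "compression.type";
  static final String DICTIONARY_COMPRESSION = "dictionary";

  /** Writer side: builds the metadata stored in the log file header. */
  static Map<String, String> createMetadata(boolean compress) {
    Map<String, String> meta = new TreeMap<>();
    meta.put(WAL_TYPE_KEY, WAL_TYPE_VALUE);
    if (compress) {
      meta.put(COMPRESSION_TYPE_KEY, DICTIONARY_COMPRESSION);
    }
    return Collections.unmodifiableMap(meta);
  }

  /** Lives on the writer, but the reader calls it to interpret a header it has read. */
  static boolean isWALCompressionEnabled(Map<String, String> meta) {
    return DICTIONARY_COMPRESSION.equals(meta.get(COMPRESSION_TYPE_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> header = createMetadata(true);
    System.out.println(header); // {compression.type=dictionary, wal.type=hlog}
    System.out.println(isWALCompressionEnabled(header)); // true
    System.out.println(isWALCompressionEnabled(createMetadata(false))); // false
  }
}
```

Keeping the keys and the interpretation in one class means a format change (say, a new compression type) touches only the writer.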
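The addEntry surprise noted above is easier to see in code. The following is a hypothetical, simplified stand-in for an LRU dictionary, not HBase's LRUDictionary class: findEntry looks the bytes up and adds them only on a miss, while addEntry inserts unconditionally, so calling it on bytes that are already present stores a second copy in a second slot. The class name, the NOT_IN_DICTIONARY value of -1 and the linear-scan lookup are assumptions made for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified LRU dictionary (not HBase's LRUDictionary).
class LruDictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // index -> stored bytes; access order, so iteration starts at the least recently used.
  private final LinkedHashMap<Short, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);
  private short nextIndex = 0;

  LruDictionarySketch(int capacity) {
    this.capacity = capacity;
  }

  /** Returns the index of data if present; otherwise adds it and returns NOT_IN_DICTIONARY. */
  short findEntry(byte[] data) {
    Short found = null;
    for (Map.Entry<Short, byte[]> e : entries.entrySet()) { // linear scan, for brevity
      if (Arrays.equals(e.getValue(), data)) {
        found = e.getKey();
        break;
      }
    }
    if (found != null) {
      entries.get(found); // touch: marks the entry most recently used
      return found;
    }
    addEntry(data);
    return NOT_IN_DICTIONARY;
  }

  /** Adds data unconditionally, even when an equal entry is already stored. */
  short addEntry(byte[] data) {
    short idx;
    if (entries.size() < capacity) {
      idx = nextIndex++;
    } else {
      idx = entries.keySet().iterator().next(); // evict the least recently used entry
      entries.remove(idx);
    }
    entries.put(idx, data.clone());
    return idx;
  }

  byte[] getEntry(short idx) {
    return entries.get(idx); // also counts as a use
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    LruDictionarySketch dict = new LruDictionarySketch(4);
    byte[] table = "usertable".getBytes(StandardCharsets.UTF_8);
    System.out.println(dict.findEntry(table)); // -1: miss, so it is added at index 0
    System.out.println(dict.findEntry(table)); // 0: hit
    System.out.println(dict.addEntry(table));  // 1: same bytes stored again in a new slot
    System.out.println(dict.size());           // 2
  }
}
```

A caller that wants "add only if absent" semantics should go through findEntry; addEntry is only safe once the caller already knows the bytes are missing.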
I ran this compression over one of our production logs and it more than halved its size; see below. I then decompressed the result and recompressed it, and got the same compressed size back.
{code}
-rwxrwxrwx 1 stack staff 28540761 Mar 13 16:47 sv4r25s8%3A60020.1331661889339.out.out.out
-rwxrwxrwx 1 stack staff 64945799 Mar 13 16:45 sv4r25s8%3A60020.1331661889339.out.out
-rwxrwxrwx 1 stack staff 28540761 Mar 13 16:44 sv4r25s8%3A60020.1331661889339.out
-rw-r--r-- 1 stack staff 64928728 Mar 13 16:25 sv4r25s8%3A60020.1331661889339
{code}
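The arithmetic behind the listing, as a small check. The byte counts are copied from the ls output; WalSizeCheck is only an illustrative name. The compressed log is about 44% of the original, so the size was a bit more than halved, and the recompressed copy is exactly the same size as the first compressed copy. The decompressed copy comes out 17,071 bytes larger than the original log; the listing alone does not say why.

```java
import java.util.Locale;

// Byte counts copied from the ls listing above.
class WalSizeCheck {
  static final long ORIGINAL = 64_928_728L;     // sv4r25s8%3A60020.1331661889339
  static final long COMPRESSED = 28_540_761L;   // .out
  static final long DECOMPRESSED = 64_945_799L; // .out.out
  static final long RECOMPRESSED = 28_540_761L; // .out.out.out

  static double ratio(long part, long whole) {
    return (double) part / whole;
  }

  public static void main(String[] args) {
    System.out.printf(Locale.ROOT, "compressed/original = %.3f%n",
        ratio(COMPRESSED, ORIGINAL)); // compressed/original = 0.440
    System.out.println(COMPRESSED == RECOMPRESSED); // true
    System.out.println(DECOMPRESSED - ORIGINAL); // 17071
  }
}
```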
Will run more of our production logs through the compressor this evening to see
if I can turn up bugs.
> HLog Compression
> ----------------
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
> Issue Type: New Feature
> Reporter: Li Pi
> Assignee: stack
> Fix For: 0.94.0
>
> Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt,
> 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt,
> 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends
> across different datanodes. We can speed up this process by compressing the
> HLog. Current plan involves using a dictionary to compress table name, region
> id, cf name, and possibly other bits of repeated data. Also, HLog format may
> be changed in other ways to produce a smaller HLog.
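A sketch of the dictionary idea in the description: a field such as the table name, region id or cf name is written in full the first time it appears and as a short dictionary index after that. The writer and the reader each build their own dictionary in the same order, so in this sketch no dictionary has to be stored in the log. The tag bytes, the 4-byte length prefix, the unbounded (no eviction) dictionary and the DictFieldCodec name are all assumptions for illustration; this is not the on-disk format HBase uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of dictionary compression for repeated fields; not HBase's on-disk format.
class DictFieldCodec {
  private static final byte INDEX = 0;   // hypothetical tag: a 2-byte dictionary index follows
  private static final byte LITERAL = 1; // hypothetical tag: a 4-byte length and the raw bytes follow

  // No eviction, for brevity; a real WAL dictionary would be bounded (e.g. LRU).
  private final Map<String, Short> toIndex = new HashMap<>();
  private final List<byte[]> fromIndex = new ArrayList<>();

  private static String key(byte[] b) {
    return new String(b, StandardCharsets.ISO_8859_1); // one char per byte, so lossless
  }

  // Writer and reader call this in the same order, so their dictionaries stay identical.
  private void add(byte[] b) {
    if (fromIndex.size() < Short.MAX_VALUE) {
      toIndex.put(key(b), (short) fromIndex.size());
      fromIndex.add(b.clone());
    }
  }

  void writeField(DataOutputStream out, byte[] field) throws IOException {
    Short idx = toIndex.get(key(field));
    if (idx != null) {
      out.writeByte(INDEX);
      out.writeShort(idx);
    } else {
      out.writeByte(LITERAL);
      out.writeInt(field.length);
      out.write(field);
      add(field);
    }
  }

  byte[] readField(DataInputStream in) throws IOException {
    byte tag = in.readByte();
    if (tag == INDEX) {
      return fromIndex.get(in.readShort());
    }
    if (tag != LITERAL) {
      throw new IOException("unknown tag " + tag);
    }
    byte[] field = new byte[in.readInt()];
    in.readFully(field);
    add(field);
    return field;
  }

  static byte[] encodeAll(List<byte[]> fields) {
    DictFieldCodec writer = new DictFieldCodec();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (byte[] f : fields) {
        writer.writeField(out, f);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static List<byte[]> decodeAll(byte[] data, int count) {
    DictFieldCodec reader = new DictFieldCodec(); // starts empty, rebuilt while reading
    List<byte[]> result = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      for (int i = 0; i < count; i++) {
        result.add(reader.readField(in));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    List<byte[]> fields = new ArrayList<>();
    int raw = 0;
    for (int i = 0; i < 100; i++) { // 100 edits to one table, region and column family
      for (String s : new String[] {"usertable", "region-a", "cf"}) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        fields.add(b);
        raw += b.length;
      }
    }
    byte[] encoded = encodeAll(fields);
    List<byte[]> decoded = decodeAll(encoded, fields.size());
    boolean same = true;
    for (int i = 0; i < fields.size(); i++) {
      same &= Arrays.equals(fields.get(i), decoded.get(i));
    }
    System.out.println(raw + " raw bytes -> " + encoded.length + " encoded, round trip: " + same);
    // 1900 raw bytes -> 925 encoded, round trip: true
  }
}
```

With 100 edits to the same table, region and column family, the 1,900 raw bytes of these three fields encode to 925 bytes: only the first occurrence of each field pays for the literal, and every later one costs 3 bytes.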