[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v23.txt

Renamed method enableCompression to setCompressionContext everywhere.

Gave every compression context instance the same name rather than a new 
name at each use.

Removed unused 'compression' data member flags, and made them local variables 
where only a single method used them.

Removed the define of TRUE and the repeated ENABLE_WAL_COMPRESSION key from
SequenceFileLogReader.  Neither is needed any longer.

Rather than have the sequencefile metadata-making code sprinkled over the 
reader and writer, do it all in the writer and have the reader use the 
writer's methods.

Added a global WAL type as metadata.

Added a compression type to metadata.

Renamed method WALCompressionEnabled as isWALCompressionEnabled.

Added some small tests to TestLRUDictionary and a new TestCompressor; writing 
them taught me how this stuff works.  Added documentation to methods whose 
behavior surprised me; e.g. addEntry will happily add a new entry even though 
the dictionary already has that entry.

Miscellaneous cleanup.

I ran this compression on one of our production logs and it more than halved 
its size (64928728 bytes down to 28540761, about 44% of the original).  See 
below.  I then decompressed and recompressed, and got the same compressed size 
back.

{code}
-rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:47 sv4r25s8%3A60020.1331661889339.out.out.out
-rwxrwxrwx   1 stack  staff  64945799 Mar 13 16:45 sv4r25s8%3A60020.1331661889339.out.out
-rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:44 sv4r25s8%3A60020.1331661889339.out
-rw-r--r--   1 stack  staff  64928728 Mar 13 16:25 sv4r25s8%3A60020.1331661889339
{code}

Will run more of our production logs through the compressor this evening to see 
if I can turn up bugs.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
> 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
> 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.
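
For readers new to the idea, here is a rough sketch of dictionary compression 
as the description outlines it: a repeated value (table name, region id, cf 
name) is written in full the first time and as a two-byte index afterwards. 
This is not the code in the attached patch.  The class DictionarySketch, its 
methods, the marker byte and the on-wire layout are all made up for 
illustration, and it does no LRU eviction; it simply stops adding entries once 
it is full.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not HBase's dictionary or compressor classes.
final class DictionarySketch {
    // Marker byte meaning "a full value follows". Indices stay below 0x7fff,
    // so the high byte of an index can never collide with this marker.
    static final byte NOT_IN_DICTIONARY = -1;
    static final int MAX_ENTRIES = Short.MAX_VALUE;

    private final Map<String, Short> indexByValue = new HashMap<>();
    private final List<String> valueByIndex = new ArrayList<>();

    // Writer and reader both call this in the same order, so they stay in sync.
    private void add(String value) {
        if (valueByIndex.size() >= MAX_ENTRIES) {
            return; // full: no eviction in this sketch
        }
        indexByValue.put(value, (short) valueByIndex.size());
        valueByIndex.add(value);
    }

    // Assumes each value encodes to fewer than 65536 bytes of UTF-8.
    void write(DataOutputStream out, String value) throws IOException {
        Short index = indexByValue.get(value);
        if (index != null) {
            out.writeShort(index); // seen before: two bytes instead of the value
            return;
        }
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        out.writeByte(NOT_IN_DICTIONARY);
        out.writeShort(raw.length);
        out.write(raw);
        add(value);
    }

    String read(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first == NOT_IN_DICTIONARY) {
            byte[] raw = new byte[in.readUnsignedShort()];
            in.readFully(raw);
            String value = new String(raw, StandardCharsets.UTF_8);
            add(value);
            return value;
        }
        // Otherwise 'first' is the high byte of a two-byte dictionary index.
        int index = ((first & 0xff) << 8) | in.readUnsignedByte();
        return valueByIndex.get(index);
    }

    static byte[] encode(List<String> values) throws IOException {
        DictionarySketch dictionary = new DictionarySketch();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String value : values) {
            dictionary.write(out, value);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] data, int count) throws IOException {
        DictionarySketch dictionary = new DictionarySketch();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            values.add(dictionary.read(in));
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Made-up field values standing in for table name, region id and cf name.
        List<String> fields = Arrays.asList(
            "usertable", "region-1", "cf", "usertable", "region-1", "cf");
        byte[] encoded = encode(fields);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("round trip ok: " + decode(encoded, fields.size()).equals(fields));
    }
}
```

The reader rebuilds the same dictionary from the stream itself, so no 
dictionary has to be shipped alongside the log.  The catch is that entries 
must be read in the order they were written.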

        
