[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212813#comment-13212813 ]
[email protected] commented on HBASE-4608:
------------------------------------------------------
bq. On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq. > I tried the compression tool on a log created by YCSB in "load" mode
bq. > with the standard dataset. Since the values are fairly large here (100
bq. > bytes), it didn't get a huge compression ratio: from about 64MB down to
bq. > 52MB (~20%). But still not bad. I looked at the resulting data using xxd,
bq. > and it looks like there are still a number of places where we could use
bq. > variable-length integers instead of fixed-length ones. I wrote a quick C
bq. > program to count the number of 0x00 bytes in the log and found about 3MB
bq. > worth (~5%). Since the actual table data is all human-readable text in
bq. > this case, I think all of the 0x00s should be able to be compressed away.
bq. >
bq. > I also tested on a YCSB workload where each row has 1000 columns of 4
bq. > bytes each (similar to an indexing workload), and the compression ratio
bq. > was 60% (64MB down to 25MB), with another 4.2MB of 0x00 bytes that could
bq. > probably be removed.
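The zero-byte check above was a quick C program. Since the rest of this thread is Java, here is a minimal Java sketch of the same measurement; the class and method names are hypothetical and not part of the patch. It streams a file, counts 0x00 bytes, and prints them as a fraction of the file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: counts 0x00 bytes in a file (e.g. an HLog) and reports
// them as a share of the total size. Not part of the HBASE-4608 patch.
final class ZeroByteCounter {

  /** Counts 0x00 bytes among the first {@code len} bytes of {@code buf}. */
  static long countZeros(byte[] buf, int len) {
    long zeros = 0;
    for (int i = 0; i < len; i++) {
      if (buf[i] == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 1) {
      System.err.println("usage: ZeroByteCounter <file>");
      System.exit(1);
    }
    byte[] buf = new byte[64 * 1024];
    long total = 0;
    long zeros = 0;
    // Read in fixed-size chunks so large logs are never loaded fully into memory.
    try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
      int n;
      while ((n = in.read(buf)) != -1) {
        total += n;
        zeros += countZeros(buf, n);
      }
    }
    double pct = total == 0 ? 0.0 : 100.0 * zeros / total;
    System.out.printf("%d of %d bytes are 0x00 (%.1f%%)%n", zeros, total, pct);
  }
}
```

For reference, 3MB of 0x00 bytes in a 64MB log is 3/64 ≈ 4.7%, which matches the "~5%" figure quoted above.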
Checked it out. It looks like, in the YCSB workloads, the 0x00 bytes are
actually indexes pointing to the 0th entry of the dictionary.
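If those 0x00 bytes are dictionary indexes, their width determines how many zeros end up in the log. The minimal sketch below compares a fixed 2-byte big-endian index with a generic unsigned LEB128-style varint. Both layouts are assumptions for illustration: the thread does not say how the patch writes indexes, and the varint here is not Hadoop's `WritableUtils` vint format.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical comparison of two ways to write a dictionary index; neither is
// claimed to be the patch's actual on-disk format.
final class IndexEncodingDemo {

  /** Fixed width: big-endian 2 bytes, as DataOutput.writeShort lays them out. Assumes idx fits in 16 bits. */
  static byte[] fixedWidth(int idx) {
    return new byte[] {(byte) (idx >>> 8), (byte) idx};
  }

  /** Variable width: unsigned LEB128, 7 bits per byte, low bits first. */
  static byte[] varint(int idx) {
    if (idx < 0) {
      throw new IllegalArgumentException("negative index: " + idx);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((idx & ~0x7F) != 0) {
      out.write((idx & 0x7F) | 0x80); // high bit set: more bytes follow
      idx >>>= 7;
    }
    out.write(idx); // final byte, high bit clear
    return out.toByteArray();
  }

  static int countZeros(byte[] bytes) {
    int zeros = 0;
    for (byte b : bytes) {
      if (b == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) {
    for (int idx : new int[] {0, 1, 127, 128, 300}) {
      byte[] f = fixedWidth(idx);
      byte[] v = varint(idx);
      System.out.printf("index %3d: fixed %d bytes (%d zero), varint %d bytes (%d zero)%n",
          idx, f.length, countZeros(f), v.length, countZeros(v));
    }
  }
}
```

Under the fixed-width layout every index below 256 carries a 0x00 high byte, and index 0 is two zero bytes. The varint still writes index 0 as a single 0x00, so it would shrink these bytes rather than eliminate them.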
bq. On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 52
bq. > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line52>
bq. >
bq. > Invert the order of these || clauses; otherwise you get an out-of-bounds
bq. > exception just running the tool with no arguments.
Fixed.
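The fix relies on `||` short-circuiting left to right: the length check must come first so that `args[0]` is never read when no arguments are given. The actual lines at Compressor.java line 52 are not quoted in this thread, so the following is a hypothetical reconstruction; the `-c`/`-d` flags and the three-argument form are assumptions for illustration.

```java
// Hypothetical argument check for a compress/decompress tool. Only the order
// of the || operands matters; the flags and arity are invented.
final class ArgCheckDemo {

  // Broken order: args[0] is read before the length is checked, so running
  // with no arguments throws ArrayIndexOutOfBoundsException.
  static boolean isBadUsageBroken(String[] args) {
    return !(args[0].equals("-c") || args[0].equals("-d")) || args.length != 3;
  }

  // Fixed order: when the length check is true, || stops there and the
  // args[0] comparison is never evaluated.
  static boolean isBadUsage(String[] args) {
    return args.length != 3 || !(args[0].equals("-c") || args[0].equals("-d"));
  }

  public static void main(String[] args) {
    if (isBadUsage(args)) {
      System.err.println("usage: ArgCheckDemo -c|-d <input> <output>");
      System.exit(1);
    }
    System.out.println("mode=" + args[0] + " in=" + args[1] + " out=" + args[2]);
  }
}
```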
bq. On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 86-88
bq. > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line86>
bq. >
bq. > This code doesn't work properly. Here's what you want to do:
bq. >
bq. > Configuration conf = new Configuration();
bq. > FileSystem fs = path.getFileSystem(conf);
bq. >
Fixed.
- Li
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
-----------------------------------------------------------
On 2012-02-15 04:57:45, Li Pi wrote:
bq.
bq. (Updated 2012-02-15 04:57:45)
bq.
bq.
bq. Review request for hbase, Eli Collins and Todd Lipcon.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. HLog compression. Includes unit tests and a command-line tool for
bq. compressing and decompressing logs.
bq.
bq.
bq. This addresses bug HBASE-4608.
bq. https://issues.apache.org/jira/browse/HBase-4608
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.   src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/2740/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Li
bq.
bq.
> HLog Compression
> ----------------
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
> Issue Type: New Feature
> Reporter: Li Pi
> Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt,
> 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck for HBase write speed is replicating the WAL appends
> across different datanodes. We can speed up this process by compressing the
> HLog. The current plan involves using a dictionary to compress the table name,
> region ID, column family (cf) name, and possibly other repeated data. The HLog
> format may also be changed in other ways to produce a smaller HLog.
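The dictionary idea in the description can be sketched as follows. This is a minimal sketch under stated assumptions: an LRU eviction policy (the diff adds an `LRUDictionary.java`, but its contents are not shown here) and an invented byte layout in which a hit writes a 2-byte index and a miss writes -1 followed by the literal via `DataOutput.writeUTF`. All class and method names are hypothetical. The key property is that the reader rebuilds exactly the writer's dictionary state, so the dictionary itself never has to be stored in the log.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical LRU dictionary for repeated WAL fields (table name, region ID,
// column family). Not the patch's LRUDictionary; the byte layout is invented.
final class LruFieldDictionary {
  static final short NOT_IN_DICTIONARY = -1;

  private final String[] byIndex;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<String, Short> byValue = new LinkedHashMap<>(16, 0.75f, true);

  LruFieldDictionary(int capacity) {
    if (capacity <= 0 || capacity > Short.MAX_VALUE) throw new IllegalArgumentException("capacity");
    byIndex = new String[capacity];
  }

  /** Writer side: returns the index of value, or adds it and returns NOT_IN_DICTIONARY. */
  short findEntry(String value) {
    Short idx = byValue.get(value); // get() also marks the entry as recently used
    if (idx != null) return idx;
    addEntry(value);
    return NOT_IN_DICTIONARY;
  }

  /** Reader side: resolves an index and marks it recently used, mirroring findEntry. */
  String getEntry(short idx) {
    String value = byIndex[idx];
    byValue.get(value);
    return value;
  }

  /** Adds value; when full, evicts the least recently used entry and reuses its index. */
  void addEntry(String value) {
    short idx;
    if (byValue.size() < byIndex.length) {
      idx = (short) byValue.size();
    } else {
      Iterator<Short> eldest = byValue.values().iterator();
      idx = eldest.next();
      eldest.remove();
    }
    byIndex[idx] = value;
    byValue.put(value, idx);
  }

  /** Encodes values with a fresh dictionary: an index on a hit, else -1 plus the literal. */
  static byte[] encode(int capacity, String... values) {
    LruFieldDictionary dict = new LruFieldDictionary(capacity);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String value : values) {
        short idx = dict.findEntry(value);
        out.writeShort(idx);
        if (idx == NOT_IN_DICTIONARY) {
          out.writeUTF(value);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Decodes count values, rebuilding the dictionary state the writer had. */
  static String[] decode(int capacity, byte[] data, int count) {
    LruFieldDictionary dict = new LruFieldDictionary(capacity);
    String[] result = new String[count];
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      for (int i = 0; i < count; i++) {
        short idx = in.readShort();
        if (idx == NOT_IN_DICTIONARY) {
          result[i] = in.readUTF();
          dict.addEntry(result[i]);
        } else {
          result[i] = dict.getEntry(idx);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    String[] fields = {"usertable", "cf", "usertable", "cf"};
    byte[] encoded = encode(16, fields);
    System.out.println(encoded.length + " bytes: " + String.join(",", decode(16, encoded, fields.length)));
  }
}
```

With this layout, `encode(16, "usertable", "cf", "usertable", "cf")` produces 23 bytes: two literals (2 + 2 + 9 and 2 + 2 + 2 bytes) followed by two 2-byte index references. Reusing the evicted entry's slot keeps every index below the capacity, so it always fits in a short.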