[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231009#comment-13231009 ]

Lars Francke commented on HBASE-4608:
-------------------------------------

This seems to be missing documentation, no? Shouldn't the hbase.regionserver.wal.enablecompression key at least be in hbase-default.xml?

HLog Compression

Key: HBASE-4608
URL: https://issues.apache.org/jira/browse/HBASE-4608
Project: HBase
Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
Fix For: 0.94.0
Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt

The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
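The dictionary idea in the issue summary (replace repeated table names, region ids and column family names with short references) can be sketched roughly as follows. This is a toy illustration only, not the code from the attached patches: the class name ToyDictionaryCodec, the marker bytes and the wire layout are all invented here, and the dictionary grows without bound instead of evicting entries the way an LRU dictionary would.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy codec: first sighting is written in full, repeats as a 2-byte index. */
final class ToyDictionaryCodec {
    private static final byte LITERAL = 0;   // marker + length + bytes
    private static final byte REFERENCE = 1; // marker + dictionary index

    static byte[] encode(List<String> fields) {
        Map<String, Integer> seen = new HashMap<>();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : fields) {
                Integer index = seen.get(field);
                if (index == null) {
                    byte[] raw = field.getBytes(StandardCharsets.UTF_8);
                    out.writeByte(LITERAL);
                    out.writeShort(raw.length);
                    out.write(raw);
                    seen.put(field, seen.size()); // next free index
                } else {
                    out.writeByte(REFERENCE);
                    out.writeShort(index);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] encoded, int fieldCount) {
        List<String> dictionary = new ArrayList<>(); // rebuilt in the same order as on write
        List<String> fields = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            for (int i = 0; i < fieldCount; i++) {
                if (in.readByte() == LITERAL) {
                    byte[] raw = new byte[in.readUnsignedShort()];
                    in.readFully(raw);
                    String field = new String(raw, StandardCharsets.UTF_8);
                    dictionary.add(field);
                    fields.add(field);
                } else {
                    fields.add(dictionary.get(in.readUnsignedShort()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
            "usertable", "region-0001", "cf",
            "usertable", "region-0001", "cf");
        byte[] encoded = encode(fields);
        System.out.println(decode(encoded, fields.size()).equals(fields));
    }
}
```

Because the reader rebuilds the dictionary in the same order the writer filled it, no dictionary has to be stored separately; the savings come entirely from fields that repeat across edits.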
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230360#comment-13230360 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

w.r.t. Lars' comment: https://issues.apache.org/jira/browse/HBASE-4608?focusedCommentId=13229010&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229010

I think it makes sense. How about introducing an enum CompressionType with values of NONE and DICTIONARY?
HConstants.ENABLE_WAL_COMPRESSION would be replaced by another String: hbase.regionserver.wal.compressiontype
If hbase.regionserver.wal.compressiontype doesn't appear in conf, CompressionType.NONE is assumed.
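A minimal sketch of the proposal above, assuming the enum and key names from the comment. Nothing here is taken from the committed patch: the fromConf helper is hypothetical, and a plain java.util.Map stands in for the Hadoop Configuration object so the example runs without HBase on the classpath.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

/** Sketch of the proposed enum; the lookup helper below is an assumption, not HBase API. */
enum CompressionType {
    NONE,
    DICTIONARY;

    /** Key named in the proposal. */
    static final String CONF_KEY = "hbase.regionserver.wal.compressiontype";

    /**
     * Resolves the compression type from a configuration map.
     * A missing or blank key means NONE, as the proposal suggests.
     * An unknown value makes valueOf throw IllegalArgumentException.
     */
    static CompressionType fromConf(Map<String, String> conf) {
        String raw = conf.get(CONF_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return NONE;
        }
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fromConf(Collections.<String, String>emptyMap()));
        System.out.println(fromConf(Collections.singletonMap(CONF_KEY, "dictionary")));
    }
}
```

Compared with a boolean enable flag, an enum-valued key leaves room for further compression schemes later without adding another configuration key.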
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230449#comment-13230449 ]

stack commented on HBASE-4608:
------------------------------

I'll commit v30 then. Thanks all for reviews, etc.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230456#comment-13230456 ]

stack commented on HBASE-4608:
------------------------------

Now this is in, does that mean we can cut a 0.94RC0?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230490#comment-13230490 ]

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

Yeah! And, yes, time for an RC.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230493#comment-13230493 ]

Hudson commented on HBASE-4608:
-------------------------------

Integrated in HBase-0.94 #32 (See [https://builds.apache.org/job/HBase-0.94/32/])
HBASE-4608 HLog Compression (Revision 1301167)

Result = SUCCESS
stack :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230500#comment-13230500 ]

Li Pi commented on HBASE-4608:
------------------------------

Woohoo! It's in!
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230538#comment-13230538 ]

Hudson commented on HBASE-4608:
-------------------------------

Integrated in HBase-TRUNK #2683 (See [https://builds.apache.org/job/HBase-TRUNK/2683/])
HBASE-4608 HLog Compression (Revision 1301165)

Result = FAILURE
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230914#comment-13230914 ]

Hudson commented on HBASE-4608:
-------------------------------

Integrated in HBase-TRUNK-security #139 (See [https://builds.apache.org/job/HBase-TRUNK-security/139/])
HBASE-4608 HLog Compression (Revision 1301165)

Result = FAILURE
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229022#comment-13229022 ]

stack commented on HBASE-4608:
------------------------------

@Lars Generalizing the compression done here is out of scope for this issue. The patch was not written that way from the get-go. The reviews done up to v22 or so made no mention of supporting other compression types. I'd suggest we do it in another issue if and when it's wanted. Let me put v27 up on rb.

bq. I forget, do we also SNAPPY/LZO/GZ compress the HLogs?

We don't do this because these compression algorithms work in blocks of 32k or so. If a block is not tied off properly at the end, we could lose up to 32k of edits.
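The point about block-oriented compressors can be illustrated with a small, hedged experiment. It uses the JDK's Deflater as a stand-in for SNAPPY/LZO/GZ (an assumption made here; buffer sizes and behaviour differ between codecs) and the hypothetical class name UnflushedBlockDemo. One small edit is written through the compressor, the writer then "crashes" before the stream is finished, and the bytes that reached the sink are decompressed to see how much of the edit is recoverable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

/** Hypothetical demo: how much of an edit survives if the compressed stream is never finished. */
final class UnflushedBlockDemo {

    /** Returns the number of bytes of {@code edit} recoverable from what reached the sink. */
    static int recoverableBytes(byte[] edit) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stands in for the log file
        Deflater deflater = new Deflater();
        Inflater inflater = new Inflater();
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(disk, deflater);
            out.write(edit);
            // Simulated crash: neither finish() nor close() is ever called on out.

            inflater.setInput(disk.toByteArray());
            byte[] buffer = new byte[edit.length + 64];
            return inflater.inflate(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            deflater.end();
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] edit = "row-0001/cf:qualifier=value".getBytes(StandardCharsets.UTF_8);
        System.out.println("written=" + edit.length + " recoverable=" + recoverableBytes(edit));
    }
}
```

The edit was accepted by the writer, yet fewer bytes than were written can be read back, because the compressor is still holding input it has not emitted. A write-ahead log needs every acknowledged edit to be readable after a crash, which is the concern raised in the comment.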
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229024#comment-13229024 ]

stack commented on HBASE-4608:
------------------------------

Lars, I can't upload a patch to someone else's issue. Made a new rb at https://reviews.apache.org/r/4328/
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229027#comment-13229027 ]

jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/
-----------------------------------------------------------

Review request for hbase.

Summary
-------

See issue

This addresses bug hbase-4608.
https://issues.apache.org/jira/browse/hbase-4608

Diffs
-----

src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

Diff: https://reviews.apache.org/r/4328/diff

Testing
-------

Thanks,

Michael
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229032#comment-13229032 ]

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

@Stack: fair enough. Let's get this one done. +1 on generalization only when needed and in another jira.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229051#comment-13229051 ]

Todd Lipcon commented on HBASE-4608:
------------------------------------

btw, +1 on this new patch after you've double-checked with your logs and run it through the full suite. Lars, did you want to take a look tomorrow before it's committed?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229054#comment-13229054 ]

stack commented on HBASE-4608:
------------------------------

I like your changes Todd. Nice fixup. Lars, let me post v28 for you up on rb.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229055#comment-13229055 ]

jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/
-----------------------------------------------------------

(Updated 2012-03-14 07:34:58.002687)

Review request for hbase.

Changes
-------

Uploading v28 for lars to take a looksee

Summary
-------

See issue

This addresses bug hbase-4608.
https://issues.apache.org/jira/browse/hbase-4608

Diffs (updated)
---------------

src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

Diff: https://reviews.apache.org/r/4328/diff

Testing
-------

Thanks,

Michael
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229064#comment-13229064 ]

stack commented on HBASE-4608:
------------------------------

I reran compress, decompress, compress cycle over my 40 odd random WALs from prod and seems fine w/ v28. Sizes look right. No errors.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229069#comment-13229069 ]

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12518303/hbase-4608-v28.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.coprocessor.TestClassLoading
    org.apache.hadoop.hbase.client.TestAdmin
    org.apache.hadoop.hbase.mapreduce.TestImportTsv
    org.apache.hadoop.hbase.mapred.TestTableMapReduce
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//console

This message is automatically generated.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229080#comment-13229080 ]

Li Pi commented on HBASE-4608:
------------------------------

@Stack nvm, just read upwards. That's in line with the other results from Todd and me.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229137#comment-13229137 ]

jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12894

    Introducing enum is a good idea.
    I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12895

    How about passing compressionContext and the type of field we're reading to Compressor.readCompressed()?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
https://reviews.apache.org/r/4328/#comment12891

    Hiding LRUDictionary.class is desirable.
    Shall we pass this.getMetadata() to the CompressionContext ctor, where the selection of compression type is made?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
https://reviews.apache.org/r/4328/#comment12892

    We introduced compression type in Metadata; how about allowing the user to specify the compression type using conf?
    Default is dictionary compression.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
https://reviews.apache.org/r/4328/#comment12893

    Hiding LRUDictionary.class is desirable.
    How about passing conf to the CompressionContext ctor?

- Ted
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229248#comment-13229248 ]

jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53
bq. https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53
bq.
bq. Introducing enum is a good idea.
bq. I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.

HLogKey does not need to know about 'type' of compression.

bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 306
bq. https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line306
bq.
bq. How about passing compressionContext and type of field we're reading to Compressor.readCompressed() ?

Generalization is out of scope.

bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189
bq. https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189
bq.
bq. Hiding LRUDictionary.class is desirable.
bq. Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ?

Out of scope.

bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 110
bq. https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line110
bq.
bq. We introduced compression type in Metadata, how about allowing user to specify compression type using conf ?
bq. Default is dictionary compression.

Customization is out of scope. A "How about..." should come with attendant justification. You can justify generalization of this compression in a new jira.

bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 139
bq. https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line139
bq.
bq. Hiding LRUDictionary.class is desirable.
bq. How about passing conf to CompressionContext ctor ?

The generalization that would require hiding the type of compression being done is out of scope.

This is not a software project that fellas are working on for casual amusement. A new facility should be justified by real-world needs.

This feature is experimental. It could help w/ our WAL writes. It may not. We need to get a basic facility into a release so we can try it. If it proves its worth, we can spend more time down this avenue.

- Michael

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229263#comment-13229263 ]

Li Pi commented on HBASE-4608:
------------------------------

+1 from here. Agree w/ Stack. Compression can be generalized later. We can just bump up the version in that case. Right now, this works, passes tests, and provides a very substantial improvement in certain cases. (See Stack's workload).
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229277#comment-13229277 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. HLogKey does not need to know about 'type' of compression.

I agree. But see this code:
{code}
+      Compressor.writeCompressed(this.encodedRegionName, 0,
+          this.encodedRegionName.length, out,
+          compressionContext.regionDict);
{code}
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229281#comment-13229281 ]

Li Pi commented on HBASE-4608:
------------------------------

That code is just writing the output for the regionname, using the regiondict. I guess if the dictionary behavior were to change, it could be problematic. But when we have more than 1 dictionary, we can deal with it then.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229292#comment-13229292 ]

Li Pi commented on HBASE-4608:
------------------------------

Not that we'd compress a random value well at all anyways.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229291#comment-13229291 ]

Li Pi commented on HBASE-4608:
------------------------------

Also, figured out why Ted's benchmarks differed from the rest of ours.

The PE tool tests with random writes to a million rows; each row has a single column whose value is 1000 randomly generated bytes.

This is pretty difficult to compress. The number of rows means that rownames won't fit in the dictionary, and we don't compress values yet.
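The row-key half of that explanation can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is not HBase's LRUDictionary, the class and method names are hypothetical, java.util.LinkedHashMap in access order stands in for a bounded LRU dictionary, and the capacity and key counts are made-up numbers chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a bounded LRU dictionary; not the patch's code.
final class LruHitRateSketch {

  /** Bounded LRU map built on LinkedHashMap's access-order mode. */
  static final class Lru extends LinkedHashMap<Integer, Boolean> {
    private final int capacity;

    Lru(int capacity) {
      super(16, 0.75f, true); // true = order entries by most recent access
      this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
      return size() > capacity; // evict the least recently used entry
    }
  }

  /**
   * Fraction of writes whose key is already in the dictionary when keys are
   * drawn uniformly at random from distinctKeys possible values.
   */
  static double hitRate(int capacity, int distinctKeys, int writes, long seed) {
    Lru dict = new Lru(capacity);
    Random rnd = new Random(seed);
    int hits = 0;
    for (int i = 0; i < writes; i++) {
      int key = rnd.nextInt(distinctKeys);
      if (dict.get(key) != null) {
        hits++; // a dictionary reference would be written instead of the key
      } else {
        dict.put(key, Boolean.TRUE); // miss: literal written, key remembered
      }
    }
    return (double) hits / writes;
  }

  public static void main(String[] args) {
    // Few distinct keys (think table or column family names): almost all hits.
    System.out.println(hitRate(1024, 100, 100_000, 42L));
    // A million random row keys against a small dictionary: almost no hits.
    System.out.println(hitRate(1024, 1_000_000, 100_000, 42L));
  }
}
```

With uniformly random keys the steady-state hit rate is roughly capacity divided by the number of distinct keys, so a dictionary much smaller than the key space saves very little on row names.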
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229294#comment-13229294 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. we don't compress values yet.

Looks like we have something to do in V2 :-)
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229420#comment-13229420 ]

jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951

Ship it!

Some comments and nits inside.
Some extraneous whitespace (can be fixed at commit).

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12915

    Nit: Comment here that the status byte is the higher order byte of the dict entry.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12916

    I assume we're entirely sure that a dictionary will never have 2^15 entries.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12914

    Nit: The naming convention is a bit strange. This one is called uncompress... whereas the method returning a new byte[] is called readCompressed.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
https://reviews.apache.org/r/4328/#comment12917

    Have a constructor that takes a compression context too? It seems like once anything has been written to the HLog this should be immutable.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12919

    COMPRESSED is a bit of a strange name. It happens to be a version of the WAL that supports compression, but it is not necessarily compressed.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12920

    ugly whitespace :)

src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12921

    I think I had that question for Li Pi... How much memory do we expect this dictionary to take worst case?
    I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12922

    I'll trust you folks that a PriorityQueue would not work here.

- Lars
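Lars's two Compressor.java remarks in the review above (the status byte being the high-order byte of the dictionary entry, and dictionaries staying under 2^15 entries) fit together in a scheme like the one sketched below. This is a reconstruction from the review comments, not the code in the patch: the class, the append-only dictionary, the 4-byte length prefix and every name here are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and on-disk layout are assumptions.
final class StatusByteSketch {
  // Marker for "literal follows". As a byte it is 0xFF, which can never be the
  // high-order byte of a non-negative short index (those are 0x00..0x7F).
  static final byte NOT_IN_DICTIONARY = -1;

  /** Append-only dictionary; a real one would evict entries (for example LRU). */
  static final class SimpleDictionary {
    private final Map<String, Short> indexByValue = new HashMap<>();
    private final List<byte[]> valueByIndex = new ArrayList<>();

    private static String key(byte[] value) {
      // ISO-8859-1 maps each byte to exactly one char, so the key is lossless.
      return new String(value, StandardCharsets.ISO_8859_1);
    }

    short find(byte[] value) {
      Short idx = indexByValue.get(key(value));
      return idx == null ? (short) NOT_IN_DICTIONARY : idx.shortValue();
    }

    void add(byte[] value) {
      if (valueByIndex.size() > Short.MAX_VALUE) {
        throw new IllegalStateException("dictionary full: index must stay below 2^15");
      }
      indexByValue.put(key(value), (short) valueByIndex.size());
      valueByIndex.add(value.clone());
    }

    byte[] get(short index) {
      return valueByIndex.get(index);
    }
  }

  static void write(byte[] value, DataOutputStream out, SimpleDictionary dict)
      throws IOException {
    short idx = dict.find(value);
    if (idx == NOT_IN_DICTIONARY) {
      out.writeByte(NOT_IN_DICTIONARY); // status byte: literal follows
      out.writeInt(value.length);
      out.write(value);
      dict.add(value);
    } else {
      out.writeShort(idx); // first byte doubles as the status byte
    }
  }

  static byte[] read(DataInputStream in, SimpleDictionary dict) throws IOException {
    byte status = in.readByte();
    if (status == NOT_IN_DICTIONARY) {
      byte[] value = new byte[in.readInt()];
      in.readFully(value);
      dict.add(value); // reader rebuilds the same dictionary as the writer
      return value;
    }
    // The status byte is the high-order byte of the index; read the low byte.
    short idx = (short) ((status << 8) | (in.readByte() & 0xFF));
    return dict.get(idx);
  }

  static byte[] encode(String... fields) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    SimpleDictionary dict = new SimpleDictionary();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String f : fields) {
        write(f.getBytes(StandardCharsets.UTF_8), out, dict);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static List<String> decode(byte[] data, int count) {
    SimpleDictionary dict = new SimpleDictionary();
    List<String> fields = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      for (int i = 0; i < count; i++) {
        fields.add(new String(read(in, dict), StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return fields;
  }
}
```

In this sketch a repeated field costs 2 bytes instead of 1 + 4 + length, and the marker stays unambiguous only while every index is a non-negative short, which is why the 2^15 bound matters.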
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229432#comment-13229432 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I just thought we should encapsulate LRUDictionary in CompressionContext:
{code}
+    boolean compression = reader.isWALCompressionEnabled();
+    if (compression) {
+      try {
+        if (compressionContext == null) {
+          compressionContext = new CompressionContext(LRUDictionary.class);
{code}
In my opinion CompressionContext shouldn't just be a holder of multiple dictionaries.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229448#comment-13229448 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
--
bq. On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32
bq. https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32
bq.
bq. I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?
bq. I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

65536 * 5 (Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.

If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.

- Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---

On 2012-03-14 07:34:58, Michael Stack wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4328/
bq. ---
bq.
bq. (Updated 2012-03-14 07:34:58)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq. See issue
bq.
bq. This addresses bug hbase-4608.
bq. https://issues.apache.org/jira/browse/hbase-4608
bq.
bq. Diffs
bq. -
bq. src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
bq. src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4328/diff
bq.
bq. Testing
bq. ---
bq.
bq. Thanks,
bq.
bq. Michael
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229454#comment-13229454 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
--
bq. On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108
bq. https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line108
bq.
bq. I assume we're entirely sure that a dictionary will never have 2^15 entries.

It'll start evicting once it hits its max size, which is currently 2 ^ 15.

- Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---
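To illustrate the eviction behavior Li describes, here is a minimal sketch of a bounded dictionary that maps byte strings to short indices and drops the least recently used entry once it is full. This is not the LRUDictionary from the patch: the class name, the findOrAdd and contains methods, and the choice to reuse the evicted index are all assumptions made for illustration, and the sketch leans on java.util.LinkedHashMap in access order rather than whatever structure the patch uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only; NOT the LRUDictionary from the patch. All names are hypothetical.
final class LruDictionarySketch {
    private final int capacity;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<ByteBuffer, Short> entries =
        new LinkedHashMap<>(16, 0.75f, true);

    LruDictionarySketch(int capacity) {
        // Indices are shorts, so at most 2^15 entries (0..32767) can be addressed.
        if (capacity < 1 || capacity > (1 << 15)) {
            throw new IllegalArgumentException("capacity must be in [1, 2^15]");
        }
        this.capacity = capacity;
    }

    /** Returns the index for data, adding it (and evicting the LRU entry if full) when absent. */
    short findOrAdd(byte[] data) {
        ByteBuffer key = ByteBuffer.wrap(data.clone());
        Short existing = entries.get(key); // a hit also marks the entry as recently used
        if (existing != null) {
            return existing;
        }
        short index;
        if (entries.size() >= capacity) {
            // Full: drop the least recently used entry and reuse its index.
            Iterator<Map.Entry<ByteBuffer, Short>> it = entries.entrySet().iterator();
            index = it.next().getValue();
            it.remove();
        } else {
            index = (short) entries.size();
        }
        entries.put(key, index);
        return index;
    }

    /** Membership check that does not change the recency order. */
    boolean contains(byte[] data) {
        return entries.containsKey(ByteBuffer.wrap(data));
    }

    public static void main(String[] args) {
        LruDictionarySketch dict = new LruDictionarySketch(2);
        for (String name : new String[] {"cf1", "cf2", "cf1", "cf3"}) {
            short index = dict.findOrAdd(name.getBytes(StandardCharsets.UTF_8));
            System.out.println(name + " -> " + index);
        }
        System.out.println("cf2 still present: "
            + dict.contains("cf2".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With a capacity of 2, the fourth insertion evicts "cf2" rather than "cf1", because "cf1" was looked up more recently. A reader and a writer that replay the same sequence of operations would stay in sync only if both evict in the same order, which is presumably why the bound is fixed.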
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229453#comment-13229453 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
--
bq. On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32
bq. https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32
bq.
bq. I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?
bq. I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.
bq.
bq. Li Pi wrote:
bq. 65536 * 5 (Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.
bq.
bq. If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.

Actually halve those amounts, 2^15, not 2^16.

- Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---
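The back-of-envelope estimate in this exchange can be written out as a short calculation. The sketch below is illustrative only: the class and method names are hypothetical, and it counts raw entry bytes, ignoring any per-entry object overhead the real dictionary would add. It uses 1000 bytes for the "1kb" case, which is an assumption about how the 320 megabyte figure was rounded.

```java
// Illustrative only: reproduces the estimate from this thread.
// The class and method names are hypothetical, not part of HBase.
final class DictionaryMemoryEstimate {
    /** Worst-case bytes = entries per dictionary * number of dictionaries * bytes per entry. */
    static long worstCaseBytes(int entriesPerDictionary, int dictionaries, int bytesPerEntry) {
        return (long) entriesPerDictionary * dictionaries * bytesPerEntry;
    }

    public static void main(String[] args) {
        int dictionaries = 5; // region name, row key, CF, column qualifier, table
        int[] capacities = {1 << 16, 1 << 15}; // first estimate, then the corrected 2^15
        int[] entrySizes = {100, 1000};        // "big names" and the "silly" case
        for (int capacity : capacities) {
            for (int size : entrySizes) {
                System.out.printf("%d entries x %d dictionaries x %d bytes = %d bytes%n",
                    capacity, dictionaries, size,
                    worstCaseBytes(capacity, dictionaries, size));
            }
        }
    }
}
```

At 2^15 entries the 100-byte case comes to 16384000 bytes, half of the 32768000 bytes first quoted, which matches Li's correction.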
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229663#comment-13229663 ]
stack commented on HBASE-4608:
--
Li asked me to lzma some of my logs from the wild. I did. W/ lzma --best, it compresses down to 12% of size.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229678#comment-13229678 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
--
bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53
bq. https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53
bq.
bq. Introducing enum is a good idea.
bq. I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.
bq.
bq. Michael Stack wrote:
bq. HLogKey does not need to know about 'type' of compression. Adding comments around the versions to give some context on why enums are named so.

bq. On 2012-03-14 11:46:10, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189
bq. https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189
bq.
bq. Hiding LRUDictionary.class is desirable.
bq. Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ?
bq.
bq. Michael Stack wrote:
bq. Out of scope.

Yeah, adding a factory to choose between different compression context types when we have only one compression type available is out of scope for this issue.

- Michael

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
---
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229771#comment-13229771 ]
stack commented on HBASE-4608:
--
Here are some WALs comparing dictionary compression w/ patch v29 vs lzma, and then the dictionary compressed file itself lzma'd (Todd's request). LZMA'ing the dictionary compressed file makes it smaller than the lzma'd original. lzma'ing the compressed file makes it 1/4 the size of the dictionary compressed file (roughly). I didn't get a chance to lzo it.
{code}
-rw-r--r-- 1 stack staff 64589199 Mar 13 20:24 sv4r21s12%3A60020.1331685637452
-rwxrwxrwx 1 stack staff 28906432 Mar 14 15:34 sv4r21s12%3A60020.1331685637452.compressed
-rw-r--r-- 1 stack staff  7417213 Mar 14 16:25 sv4r21s12%3A60020.1331685637452.compressed.lzma
-rw-r--r-- 1 stack staff  8511618 Mar 14 16:24 sv4r21s12%3A60020.1331685637452.lzma
-rw-r--r-- 1 stack staff 63755620 Mar 13 20:24 sv4r21s12%3A60020.1331687005652
-rwxrwxrwx 1 stack staff 28804928 Mar 14 15:34 sv4r21s12%3A60020.1331687005652.compressed
-rw-r--r-- 1 stack staff  6866107 Mar 14 16:28 sv4r21s12%3A60020.1331687005652.compressed.lzma
-rw-r--r-- 1 stack staff  8328771 Mar 14 16:27 sv4r21s12%3A60020.1331687005652.lzma
-rw-r--r-- 1 stack staff 63755688 Mar 13 20:24 sv4r21s12%3A60020.1331688224458
-rwxrwxrwx 1 stack staff 27701052 Mar 14 15:34 sv4r21s12%3A60020.1331688224458.compressed
-rw-r--r-- 1 stack staff  6614637 Mar 14 16:31 sv4r21s12%3A60020.1331688224458.compressed.lzma
-rw-r--r-- 1 stack staff  8462991 Mar 14 16:31 sv4r21s12%3A60020.1331688224458.lzma
-rw-r--r-- 1 stack staff 64024836 Mar 13 20:24 sv4r21s12%3A60020.1331689518188
-rwxrwxrwx 1 stack staff 28851435 Mar 14 15:34 sv4r21s12%3A60020.1331689518188.compressed
-rw-r--r-- 1 stack staff  6677112 Mar 14 16:35 sv4r21s12%3A60020.1331689518188.compressed.lzma
-rw-r--r-- 1 stack staff  8158847 Mar 14 16:34 sv4r21s12%3A60020.1331689518188.lzma
-rw-r--r-- 1 stack staff 63757131 Mar 13 20:24 sv4r21s12%3A60020.1331690608900
-rwxrwxrwx 1 stack staff 28201506 Mar 14 15:34 sv4r21s12%3A60020.1331690608900.compressed
-rw-r--r-- 1 stack staff  6941982 Mar 14 16:38 sv4r21s12%3A60020.1331690608900.compressed.lzma
-rw-r--r-- 1 stack staff  8513895 Mar 14 16:37 sv4r21s12%3A60020.1331690608900.lzma
-rw-r--r-- 1 stack staff 63754114 Mar 13 20:24 sv4r21s12%3A60020.1331691711502
-rwxrwxrwx 1 stack staff 28318314 Mar 14 15:34 sv4r21s12%3A60020.1331691711502.compressed
-rw-r--r-- 1 stack staff  7392701 Mar 14 16:42 sv4r21s12%3A60020.1331691711502.compressed.lzma
-rw-r--r-- 1 stack staff  9136798 Mar 14 16:41 sv4r21s12%3A60020.1331691711502.lzma
-rw-r--r-- 1 stack staff 63756667 Mar 13 20:24 sv4r21s12%3A60020.1331692886725
-rwxrwxrwx 1 stack staff 28309792 Mar 14 15:34 sv4r21s12%3A60020.1331692886725.compressed
-rw-r--r-- 1 stack staff  7139965 Mar 14 16:44 sv4r21s12%3A60020.1331692886725.compressed.lzma
-rw-r--r-- 1 stack staff  8968155 Mar 14 16:43 sv4r21s12%3A60020.1331692886725.lzma
-rw-r--r-- 1 stack staff 63755003 Mar 13 20:24 sv4r21s12%3A60020.1331694049033
-rwxrwxrwx 1 stack staff 28127053 Mar 14 15:35 sv4r21s12%3A60020.1331694049033.compressed
-rw-r--r-- 1 stack staff  6498486 Mar 14 16:45 sv4r21s12%3A60020.1331694049033.compressed.lzma
-rw-r--r-- 1 stack staff  8175618 Mar 14 16:45 sv4r21s12%3A60020.1331694049033.lzma
-rw-r--r-- 1 stack staff 23441144 Mar 13 20:24 sv4r21s12%3A60020.1331695045194
-rwxrwxrwx 1 stack staff 10561645 Mar 14 15:35 sv4r21s12%3A60020.1331695045194.compressed
-rw-r--r-- 1 stack staff  2922204 Mar 14 16:46 sv4r21s12%3A60020.1331695045194.compressed.lzma
-rw-r--r-- 1 stack staff  3228837 Mar 14 16:46 sv4r21s12%3A60020.1331695045194.lzma
{code}
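For readers who want the ratios behind the listing above, here is a small sketch that works them out for the first WAL only (sv4r21s12%3A60020.1331685637452). The byte counts are copied from the listing; the class and method names are hypothetical and exist only for this illustration.

```java
// Illustrative arithmetic on the first WAL in the listing.
// Class and method names are hypothetical.
final class WalCompressionRatios {
    /** Size of a compressed file as a fraction of a baseline size. */
    static double ratio(long compressedBytes, long baselineBytes) {
        return (double) compressedBytes / baselineBytes;
    }

    public static void main(String[] args) {
        long original = 64589199L;      // uncompressed WAL
        long dictionary = 28906432L;    // .compressed (dictionary compression, patch v29)
        long dictionaryLzma = 7417213L; // .compressed.lzma
        long lzma = 8511618L;           // .lzma

        System.out.printf("dictionary / original:        %.1f%%%n",
            100 * ratio(dictionary, original));
        System.out.printf("lzma / original:              %.1f%%%n",
            100 * ratio(lzma, original));
        System.out.printf("dictionary+lzma / original:   %.1f%%%n",
            100 * ratio(dictionaryLzma, original));
        System.out.printf("dictionary+lzma / dictionary: %.1f%%%n",
            100 * ratio(dictionaryLzma, dictionary));
    }
}
```

For this file, dictionary compression alone leaves a little under half of the original bytes, and lzma on top of the dictionary output brings that to roughly a quarter of the dictionary-compressed size, in line with the figures quoted in the comment.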
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229773#comment-13229773 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
--
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
---

src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
https://reviews.apache.org/r/4328/#comment12946
IllegalArgumentException is not needed here.
I removed it, compiled and ran TestCompressor - it passed.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12947
A closing ) should be placed either on this line or on line 109.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12948
Should read 'byte of index to the ...'

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12949
Should read 'an array of bytes'

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12950
Please add javadoc for offset and length.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
https://reviews.apache.org/r/4328/#comment12958
Should we label this class @InterfaceAudience.Private ?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12951
I don't quite get what the second sentence is supposed to convey ?
It seems to be same as first sentence.

This version is the minimum version that supports compression.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12952
A (slightly) long line.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12954
Long line.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12955
Can we remove 'silly' here ?
Some user may actually reach this size.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
https://reviews.apache.org/r/4328/#comment12956
'initiate' is used to start an action or message. 'initialize' should be used here.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
https://reviews.apache.org/r/4328/#comment12957
Setting reader to null would be desirable after the close() call.

- Ted

On 2012-03-14 22:26:34, Michael Stack wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4328/
bq. ---
bq.
bq. (Updated 2012-03-14 22:26:34)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq. See issue
bq.
bq. This addresses bug hbase-4608.
bq. https://issues.apache.org/jira/browse/hbase-4608
bq.
bq. Diffs
bq. -
bq. src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
bq. src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
bq.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229776#comment-13229776 ]
Lars Hofhansl commented on HBASE-4608:
--
I'm still +1 :)

The lzma numbers are interesting. Maybe a nice (future) solution would be to dictionary compress the HLog while writing, and then when the log is rolled compress it with lzma (since we know the file won't change any more we can compress it wholesale).

This raises the next question: What portion of the WAL storage do the current WALs represent?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229826#comment-13229826 ]
Hadoop QA commented on HBASE-4608:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12518388/4608v29.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//console

This message is automatically generated.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229871#comment-13229871 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
--
bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 164
bq. https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line164
bq.
bq. Please add javadoc for offset and length.

Are you joking? On a protected method with parameter names such as these that follow a byte array argument?

bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 36
bq. https://reviews.apache.org/r/4328/diff/3/?file=92427#file92427line36
bq.
bq. IllegalArgumentException is not needed here.
bq. I removed it, compiled and ran TestCompressor - it passed.

Removed.

bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108
bq. https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line108
bq.
bq. A closing ) should be placed either on this line or on line 109.

done

bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 143
bq. https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line143
bq.
bq. Should read 'byte of index to the ...'

done

bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 55
bq. https://reviews.apache.org/r/4328/diff/3/?file=92431#file92431line55
bq.
bq. I don't quite get what the second sentence is supposed to convey ?
bq. It seems to be same as first sentence.
bq.
bq. This version is the minimum version that supports compression.

Leaving as is. The second sentence is to emphasize that only the dictionary compression was introduced in version -2.

bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 33
bq. https://reviews.apache.org/r/4328/diff/3/?file=92433#file92433line33
bq.
bq. Can we remove 'silly' here ?
bq. Some user may actually reach this size.

Then they are being silly.

bq. On 2012-03-14 23:54:37, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 202
bq. https://reviews.apache.org/r/4328/diff/3/?file=92434#file92434line202
bq.
bq. Setting reader to null would be desirable after the close() call.

Done

- Michael

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
---
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229877#comment-13229877 ]

stack commented on HBASE-4608:

@Ted Would suggest that in future you not piecemeal your reviews. Bulk them up. When review comes in in dribs and drabs, the whole process takes way longer.

@Lars What portion of the WAL storage do the current WALs represent? Do you mean, how much of our footprint is comprised of WAL logs? Not sure. I thought the intent of this issue was to speed syncs because there'd be fewer bytes to shuttle across the datanode replica pipeline.

I'm not wondering if this patch is worth adding? If compressible stuff is only shrinking by half, is that a big enough win? What do you lot think? LZMA is not viable because it takes forever compressing, though it's turning SU WALs into 11-14% of original size. Let me try adding lzo numbers, but we wouldn't want to use lzo anyway because we could lose a bunch of edits off the end if the compression block was not closed off (that's my understanding; I could be wrong).

Li, what happens if we cut the end off a dictionary-compressed file? Will we be able to read up to the last byte or word or so?

Good stuff.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229878#comment-13229878 ] jirapos...@reviews.apache.org commented on HBASE-4608: -- bq. On 2012-03-14 23:54:37, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java, line 28 bq. https://reviews.apache.org/r/4328/diff/3/?file=92429#file92429line28 bq. bq. Should we label this class @InterfaceAudience.Private ? Unless a class is public, it doesn't need an interface audience annotation - Todd --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4328/#review5972 --- On 2012-03-14 22:26:34, Michael Stack wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4328/ bq. --- bq. bq. (Updated 2012-03-14 22:26:34) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. See issue bq. bq. bq. This addresses bug hbase-4608. bq. https://issues.apache.org/jira/browse/hbase-4608 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f bq. 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4328/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Michael bq. bq. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229884#comment-13229884 ]

Zhihong Yu commented on HBASE-4608:

w.r.t. adding javadoc for offset and length of writeCompressed(), I searched our code base for '@param offset ' and found 48 occurrences.

I like this snippet from HFileReaderV2.java:
{code}
 * @param key key byte array
 * @param offset key offset in the key byte array
 * @param length key length
{code}

Even an empty javadoc is better than missing parameter:
{code}
 * @param offset
{code}
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229890#comment-13229890 ]

Hadoop QA commented on HBASE-4608:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12518416/4608v30.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.client.TestMetaScanner

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//console

This message is automatically generated.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229893#comment-13229893 ] Lars Hofhansl commented on HBASE-4608: -- bq. I'm not wondering if this patch is worth adding? If compressible stuff is only shrinking by half, is that big enough win? What do you lot thing? LZMA is not viable because it takes for ever compressing though its turning SU WALs into 11-14% original size. You mean you are *now* wondering? :) IMHO: The WAL is probably the greatest source of synchronous IO that we generate, cutting this in half seems quite valuable (maybe this will be less valuable in the future if/when HDFS can do parallel replication instead of chaining - but it is now). I agree that none of the block based compression schemes would be good options... Was merely curious about HLog archiving, which is quite unrelated to this issue. +1, let's commit this. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229913#comment-13229913 ] Li Pi commented on HBASE-4608: -- If a dictionary file gets cut up, you'll be able to read all the way to the end. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229921#comment-13229921 ]

Li Pi commented on HBASE-4608:

On the other compression options: I looked into those. Plugging into LZMA was the first thing I thought about doing; performance rules that one out, though.

There are other optimizations we can make, such as modifying the dictionary to take frequency into account, assigning the highest-probability entries to the lowest numbers, and then using vints rather than 2 bytes for everything. Note that we shouldn't expect to beat LZMA, because we compress neither the values nor the SequenceFile overhead. On some workloads those overheads might be substantial, although I haven't checked.

This is actually pretty close to the challenge posed by caching, in that we want to keep the entries most likely to be repeated in our dictionary and evict the rest. I used LRU because LRU was simple and, as with caching, pretty much anything results in a substantial performance increase over nothing.

I'm pretty happy with cutting the WAL size in half on optimal workloads, though as always, it's nice to work towards future performance goals. I have other ideas, but they involve changing the HLog substantially in order to be more compact. In that case, we might end up abandoning the Hadoop SequenceFile format altogether, and this thing becomes a bit more complex.
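To make the LRU dictionary idea above concrete, here is a minimal sketch, assuming string entries and a 2-byte index. The class and method names (`LruDictionarySketch`, `findOrAdd`, `add`, `get`) are hypothetical and are not taken from the patch's LRUDictionary; the point is only that a writer and a reader who apply the same operations in the same order evict the same entries and so stay in step.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU dictionary; a sketch, not the LRUDictionary from the patch. */
final class LruDictionarySketch {
  /** Returned by the writer when the entry had to be sent as a literal. */
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<String, Short> indexByEntry =
      new LinkedHashMap<>(16, 0.75f, true);
  private final String[] entryByIndex;

  LruDictionarySketch(int capacity) {
    if (capacity <= 0 || capacity > Short.MAX_VALUE) {
      throw new IllegalArgumentException("capacity out of range: " + capacity);
    }
    this.capacity = capacity;
    this.entryByIndex = new String[capacity];
  }

  /** Writer side: returns the index on a hit, or adds the entry and reports a miss. */
  short findOrAdd(String entry) {
    Short index = indexByEntry.get(entry); // a hit also refreshes recency
    if (index != null) {
      return index;
    }
    add(entry);
    return NOT_IN_DICTIONARY;
  }

  /** Reader side, literal received: store it, reusing the LRU slot when full. */
  short add(String entry) {
    Short existing = indexByEntry.get(entry);
    if (existing != null) {
      return existing;
    }
    short index;
    if (indexByEntry.size() < capacity) {
      index = (short) indexByEntry.size();
    } else {
      Iterator<Map.Entry<String, Short>> it = indexByEntry.entrySet().iterator();
      index = it.next().getValue(); // slot of the least recently used entry
      it.remove();
    }
    indexByEntry.put(entry, index);
    entryByIndex[index] = entry;
    return index;
  }

  /** Reader side, index received: look the entry up and refresh its recency. */
  String get(short index) {
    String entry = entryByIndex[index];
    if (entry == null) {
      throw new IllegalArgumentException("no entry at index " + index);
    }
    indexByEntry.get(entry); // mirror the recency update the writer performed
    return entry;
  }
}
```

On a miss the writer would emit the literal bytes and the reader would call `add` with them; on a hit only the short index travels. A frequency-aware variant with vint indices, as suggested in the comment, would change how indices are assigned but not this synchronisation requirement.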
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228842#comment-13228842 ]

Zhihong Yu commented on HBASE-4608:

In isWALCompressionEnabled():
{code}
+if (txt == null || Integer.parseInt(txt.toString()) < VERSION) return false;
{code}
What would happen when we have a newer version for WAL_VERSION_KEY?

Looks like the following check should suffice for isWALCompressionEnabled():
{code}
+txt = metadata.get(WAL_COMPRESSION_TYPE_KEY);
+return txt != null && txt.equals(DICTIONARY_COMPRESSION_TYPE);
{code}
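As a rough illustration of the check under discussion, the sketch below tests the file's version first and its compression type second. It uses a plain `Map<String, String>` in place of the SequenceFile metadata, and the key names, values, and the `COMPRESSION_VERSION` constant are assumptions made for this example rather than the patch's actual definitions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for the reader-side check; not the code from the patch. */
final class WalCompressionCheck {
  // Hypothetical key names and values, chosen only for this sketch.
  static final String WAL_VERSION_KEY = "version";
  static final String WAL_COMPRESSION_TYPE_KEY = "compression.type";
  static final String DICTIONARY_COMPRESSION_TYPE = "dictionary";
  // Assumed minimum WAL version that can carry compressed entries.
  static final int COMPRESSION_VERSION = 1;

  static boolean isWALCompressionEnabled(Map<String, String> metadata) {
    String version = metadata.get(WAL_VERSION_KEY);
    if (version == null) {
      return false; // files written before versioning cannot be compressed
    }
    try {
      if (Integer.parseInt(version) < COMPRESSION_VERSION) {
        return false; // too old to support compression at all
      }
    } catch (NumberFormatException e) {
      return false; // unreadable version: treat as uncompressed
    }
    // A newer version still passes the test above; the type decides the rest.
    return DICTIONARY_COMPRESSION_TYPE.equals(metadata.get(WAL_COMPRESSION_TYPE_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    metadata.put(WAL_VERSION_KEY, "2");
    metadata.put(WAL_COMPRESSION_TYPE_KEY, DICTIONARY_COMPRESSION_TYPE);
    System.out.println(isWALCompressionEnabled(metadata));
  }
}
```

Because the version comparison is a lower bound rather than an equality test, a file with a newer version number is not rejected by the first check, which is the case the question above asks about.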
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228931#comment-13228931 ]

Hadoop QA commented on HBASE-4608:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12518270/4608v23.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//console

This message is automatically generated.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228946#comment-13228946 ]

stack commented on HBASE-4608:

Here's my compressing, decompressing, compressing again, decompressing again, then recompressing a random log file from our front end:
{code}
-rw-r--r--  1 stack  staff  64928728 Mar 13 20:43 sv4r25s8%3A60020.1331661889339
-rwxrwxrwx  1 stack  staff  28540761 Mar 13 20:48 sv4r25s8%3A60020.1331661889339.compressed
-rwxrwxrwx  1 stack  staff  28540761 Mar 13 20:58 sv4r25s8%3A60020.1331661889339.compressed.again
-rwxrwxrwx  1 stack  staff  28540761 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.compressed.again.again
-rwxrwxrwx  1 stack  staff  64945799 Mar 13 20:57 sv4r25s8%3A60020.1331661889339.decompressed
-rwxrwxrwx  1 stack  staff  64945799 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.decompressed.again
{code}
It's 44% of original size.
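The 44% figure can be reproduced from the byte counts in the listing. The short sketch below does that arithmetic; the class and method names are made up for the example, and the only inputs are the sizes shown above.

```java
/** Arithmetic on the file sizes from the listing above; names are illustrative. */
final class WalSizeReport {
  /** Size of the compressed file as a percentage of the original file. */
  static double percentOfOriginal(long compressedBytes, long originalBytes) {
    if (originalBytes <= 0) {
      throw new IllegalArgumentException("originalBytes must be positive");
    }
    return 100.0 * compressedBytes / originalBytes;
  }

  public static void main(String[] args) {
    long original = 64928728L;     // sv4r25s8%3A60020.1331661889339
    long compressed = 28540761L;   // ....compressed
    long decompressed = 64945799L; // ....decompressed
    System.out.printf("compressed is %.1f%% of original%n",
        percentOfOriginal(compressed, original));
    // The decompressed file is slightly larger than the original.
    System.out.println("round-trip growth in bytes: " + (decompressed - original));
  }
}
```

The ratio works out to roughly 43.96%, which rounds to the 44% quoted, and the decompressed file is 17071 bytes larger than the original.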
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228948#comment-13228948 ]

Zhihong Yu commented on HBASE-4608:

The sentence involving COMPRESSION_VERSION was in past tense, but I don't see it in patch v23.

Let me elaborate on my comment @ 14/Mar/12 00:26. As you described, we would use a new constant (COMPRESSION_VERSION) to represent the minimum version that supports dictionary compression. In my opinion, this version corresponds to the major version in my comment @ 13/Mar/12 01:37. Say we later introduce prefix compression: we would then introduce another constant representing the minimum version that supports prefix compression.

I agree that both version and compression type should be checked. However, the order should be to check the compression type first and the version second.

Regards
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228950#comment-13228950 ] Zhihong Yu commented on HBASE-4608: --- I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different from that of sv4r25s8%3A60020.1331661889339 HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228954#comment-13228954 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Please wrap this long line:
{code}
+  public static final String ENABLE_WAL_COMPRESSION = "hbase.regionserver.wal.enablecompression";
{code}

W.r.t. the following code:
{code}
+  static final int VERSION = COMPRESSION_VERSION;
+  static final Text WAL_VERSION = new Text("" + VERSION);
{code}
It implies that WAL_VERSION is the same as COMPRESSION_VERSION.

As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of, e.g., PREFIX_COMPRESSION_VERSION. Then we face a choice: what value would WAL_VERSION carry?

I propose renaming COMPRESSION_VERSION above to DICTIONARY_COMPRESSION_VERSION and decoupling it from WAL_VERSION. In the future, a WAL_VERSION of 2 can carry either dictionary or prefix compression.
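The proposal above can be pictured with a small sketch. It is not code from any attached patch: the class name, the metadata key strings, and the choice of 1 for both versions are assumptions made purely for illustration. It shows the long declaration wrapped, and the WAL version defined on its own rather than as an alias of the compression version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative holder only; not the patch's actual class. */
final class WalConstantsSketch {
  // The long declaration, wrapped so it fits the line-length limit.
  static final String ENABLE_WAL_COMPRESSION =
      "hbase.regionserver.wal.enablecompression";

  // Global WAL file version, given its own literal instead of aliasing
  // a compression constant (the value 1 is an assumption).
  static final int WAL_VERSION = 1;

  // Version of the dictionary scheme alone (name taken from the proposal).
  static final int DICTIONARY_COMPRESSION_VERSION = 1;

  /** Stores the two versions under separate, hypothetical metadata keys. */
  static Map<String, String> fileMetadata() {
    Map<String, String> meta = new LinkedHashMap<>();
    meta.put("WAL_VERSION", Integer.toString(WAL_VERSION));
    meta.put("DICTIONARY_COMPRESSION_VERSION",
        Integer.toString(DICTIONARY_COMPRESSION_VERSION));
    return meta;
  }
}
```

With the two values stored separately, either one could change without forcing a change to the other, which is the decoupling being asked for.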
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228957#comment-13228957 ]

stack commented on HBASE-4608:
------------------------------

bq. I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different from that of sv4r25s8%3A60020.1331661889339

Because it has metadata the original doesn't have. When I compress it, it compresses down to the same size. Notice that the decompressed and decompressed.again are the same size because they both have the new metadata.

bq. The sentence involving COMPRESSION_VERSION was in past tense but I don't see it in patch v23.

Pardon me. Should have uploaded v24. Testing has turned up a minor issue... will upload v25 soon.

bq. In my opinion, this version corresponds to the major version in my comment @ 13/Mar/12 01:37

Nope. This is the global version that introduces compression. No need of major/minor granularity, and in particular major/minor on the compression feature itself. It's overkill.

bq. I agree that both version and compression type should be checked. However, the order should be checking compression type followed by checking version.

Nope. First figure out whether we have a file that does compression. Then figure out what type of compression the file does.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228965#comment-13228965 ]

stack commented on HBASE-4608:
------------------------------

bq. It implies that WAL_VERSION is the same as COMPRESSION_VERSION.

Yes. That's right. The current global version is the version that introduces WAL compression.

bq. As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of PREFIX_COMPRESSION_VERSION

You are conflating WAL version and compression type. They are not the same thing. If we introduce a new compression type only, and if all else is equal -- same API, etc. -- then we don't need to up the global version. We are just adding a new compression type. Either we support it or we don't. If we don't, we'll throw "unsupported compression type" (the dictionary compression type is currently called DICTIONARY_COMPRESSION_TYPE).
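The order of checks argued for in the two comments above (version first, then compression type, with unknown types rejected) could look roughly like the following. This is a sketch under stated assumptions, not the patch's reader code: the metadata key names, the string value of DICTIONARY_COMPRESSION_TYPE, the unchecked exception type, and the treatment of a missing type as "uncompressed" are all hypothetical.

```java
import java.util.Map;

final class WalHeaderCheckSketch {
  // First global WAL version that can contain compressed entries (assumed to be 1).
  static final int COMPRESSION_VERSION = 1;
  // The constant name comes from the thread; its value here is an assumption.
  static final String DICTIONARY_COMPRESSION_TYPE = "DICTIONARY";

  // Hypothetical metadata key names.
  static final String VERSION_KEY = "WAL_VERSION";
  static final String TYPE_KEY = "WAL_COMPRESSION_TYPE";

  /** Returns true if the file's entries should be read with dictionary decompression. */
  static boolean usesDictionaryCompression(Map<String, String> meta) {
    // Step 1: does this file's version support compression at all?
    String version = meta.get(VERSION_KEY);
    if (version == null || Integer.parseInt(version) < COMPRESSION_VERSION) {
      return false;
    }
    // Step 2: which compression type does the file declare?
    String type = meta.get(TYPE_KEY);
    if (type == null) {
      return false; // assumption: no declared type means uncompressed
    }
    if (DICTIONARY_COMPRESSION_TYPE.equals(type)) {
      return true;
    }
    // A type this reader does not know is rejected rather than guessed at.
    throw new UnsupportedOperationException("Unsupported compression type: " + type);
  }
}
```

Under this arrangement a new compression type only adds a branch in step 2; the global version in step 1 is untouched unless the file format itself changes.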
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228970#comment-13228970 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. and if all else is equal -- same API, etc. -- then we don't need to up the global version.

True. But we don't know if the current dictionary compression API is general enough to cover the new compression type.

bq. wal version and compression type. They are not the same thing.

Agreed. But the last paragraph above hinges on the scenario of keeping the same WAL version when a new compression type is added.

Suppose we find a way to improve dictionary compression after the integration of this JIRA. Would the WAL version increase or stay at 1?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228971#comment-13228971 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

In HLogKey.java:
{code}
+   * Enables compression.
+   *
+   * @param tableDict
+   *          dictionary used for compressing table
+   * @param regionDict
+   *          dictionary used for compressing region
+   */
+  public void setCompressionContext(CompressionContext compressionContext) {
{code}
Please adjust the javadoc above: the @param tags no longer match the method's single compressionContext parameter.
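One way the adjusted javadoc might read is sketched below. The two classes are reduced stand-ins written for this note, not the patch's HLogKey or CompressionContext, and the wording of the @param description is a guess based on the dictionaries named in the old tags. The getter exists only so the sketch can be exercised.

```java
/** Placeholder for the patch's CompressionContext; its contents are not shown in the thread. */
final class CompressionContext {
}

/** Reduced stand-in for HLogKey, keeping only the setter under discussion. */
class HLogKeySketch {
  private CompressionContext compressionContext;

  /**
   * Enables compression.
   *
   * @param compressionContext
   *          context holding the dictionaries used for compression
   */
  public void setCompressionContext(CompressionContext compressionContext) {
    this.compressionContext = compressionContext;
  }

  /** Accessor added for this sketch only, so the setter can be exercised. */
  CompressionContext getCompressionContext() {
    return compressionContext;
  }
}
```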
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228972#comment-13228972 ]

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12518291/4608v25.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 9 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:
                  org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//console

This message is automatically generated.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228976#comment-13228976 ]

stack commented on HBASE-4608:
------------------------------

bq. But we don't know if the current dictionary compression API is general enough to cover the new compression type.

Agree that we don't know what the future will bring. Not going to try.

bq. But the last paragraph above hinges on the scenario of keeping the same WAL version when new compression type is added.

Yes, that's one possible scenario. There are others where we need to change the version. Can deal when we get there.

bq. Suppose we find a way to improve dictionary compression after the integration of this JIRA. Would WAL version increase or stay at 1 ?

If the API doesn't change, no need to up the global file version. Could add a new improved dictionary compression type.

If we need to change the API, then we'll need to change the global version. At the same time we might add some other facility that has nought to do w/ compression -- say, we might decide to intersperse markers for when we flush or compact. We'd likely bump the version one point only though. This new version would, say, indicate the WAL was now able to do the extended compression API AND includes flush and compaction markers. We could bump the version once per feature added but that buys us nothing; it's the version we ship that counts, the accumulation of features since the last time we shipped.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228979#comment-13228979 ]

stack commented on HBASE-4608:
------------------------------

I took a random set of 40 logs off our front end, ran a compress/decompress cycle multiple times in a row, and verified that the compressed version always ends up the same size. No errors. I'm seeing compression to a pretty consistent 44-45% of original size:

{code}
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 10561645/23441144|bc -l
.45056013477840501299
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 28127053/63755003|bc -l
.44117405186225150048
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 28309792/63756667|bc -l
.44402873192853697951
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 28318314/63754114|bc -l
.44418018263103773977
...
{code}
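The arithmetic behind the figures above is simply compressed size divided by original size. A minimal Java equivalent of the bc commands, using only the four byte-count pairs quoted in the comment (the class and method names are made up for this note):

```java
import java.util.Locale;

final class CompressionRatioSketch {
  /** Compressed size as a fraction of the original size. */
  static double ratio(long compressedBytes, long originalBytes) {
    return (double) compressedBytes / originalBytes;
  }

  public static void main(String[] args) {
    // {compressed, original} byte counts copied from the bc commands above.
    long[][] samples = {
        {10561645L, 23441144L},
        {28127053L, 63755003L},
        {28309792L, 63756667L},
        {28318314L, 63754114L},
    };
    double sum = 0;
    for (long[] s : samples) {
      double r = ratio(s[0], s[1]);
      sum += r;
      System.out.println(String.format(Locale.ROOT, "%d/%d = %.4f", s[0], s[1], r));
    }
    System.out.println(String.format(Locale.ROOT, "mean = %.4f", sum / samples.length));
  }
}
```

A ratio near 0.44 means the compressed log occupies a little under half the bytes of the original, so roughly 55% fewer bytes have to be replicated across datanodes for these samples.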
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228981#comment-13228981 ]

stack commented on HBASE-4608:
------------------------------

We need a facility in the WAL like the one we have in HFile for printing statistics on the load carried. Our frontend is loads of counters. I've not verified. It should be random enough in table naming and region though, so it should be giving the compression code a bit of exercise.

I'm game for committing this as a first cut if I can get a +1.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228983#comment-13228983 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I feel the dictionary compression implementation is pervasive throughout the patch, e.g.:
{code}
+    boolean compression = reader.isWALCompressionEnabled();
+    if (compression) {
+      try {
+        if (compressionContext == null) {
+          compressionContext = new CompressionContext(LRUDictionary.class);
{code}
While isWALCompressionEnabled() sounds general, LRUDictionary.class is passed to the context. This would make developing a new compression scheme hard.
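To make the concern above concrete, here is one hypothetical way a reader could avoid naming a dictionary class directly: look the implementation up by the compression type recorded in the file. Nothing in this sketch comes from the patch. The interface, the factory, the no-op dictionary, and the type strings are all invented for illustration, and the thread does not settle whether such indirection belongs in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Stand-in for a dictionary abstraction; only a lookup method is modelled. */
interface DictionarySketch {
  short findEntry(byte[] data, int offset, int length);
}

/** Trivial dictionary that never finds anything; a placeholder implementation. */
final class NoOpDictionarySketch implements DictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;

  @Override
  public short findEntry(byte[] data, int offset, int length) {
    return NOT_IN_DICTIONARY;
  }
}

/** Maps a compression-type name to a way of creating its dictionary. */
final class DictionaryFactorySketch {
  private final Map<String, Supplier<DictionarySketch>> byType = new HashMap<>();

  void register(String compressionType, Supplier<DictionarySketch> supplier) {
    byType.put(compressionType, supplier);
  }

  DictionarySketch create(String compressionType) {
    Supplier<DictionarySketch> supplier = byType.get(compressionType);
    if (supplier == null) {
      throw new IllegalArgumentException(
          "Unsupported compression type: " + compressionType);
    }
    return supplier.get();
  }
}
```

A reader built this way would call something like `factory.create(typeFromFileMetadata)` instead of hard-coding one class, at the cost of an extra layer that the first cut of the feature may not need.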
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228985#comment-13228985 ]

stack commented on HBASE-4608:
------------------------------

bq. This would make developing a new compression scheme hard.

Out of scope for this issue.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228991#comment-13228991 ]

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. Out of scope for this issue.

This reminds me of HBASE-4218: from Aug 17th 2011 to Feb 17th 2012, the development took 6 months.
This JIRA doesn't have as many algorithms as HBASE-4218 does, but we should follow a similar goal:

From Jacek @ 17/Aug/11 21:47:
bq. Once we have common interface you would be able to reuse some of my tests and benchmarks.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228996#comment-13228996 ]

stack commented on HBASE-4608:
------------------------------

Can I get a +1 from someone else? It's not a big patch. Should be a quick review. Thanks.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227869#comment-13227869 ]

stack commented on HBASE-4608:
------------------------------

Is HLog versioned? If not, perhaps instead of a HConstants.WAL_COMPRESSION_VER, add a WAL_VERSION metadata field. Then have another for compression type (NONE or this)?

bq. For TestLRUDictionary, please outline the combinations that should be added.

Does it not look bare to you? I'd think that we'd try a paragraph of text going in and out... perhaps test multiple dictionaries in the one file?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227892#comment-13227892 ]

Ted Yu commented on HBASE-4608:
-------------------------------

Introducing WAL_VERSION would imply that we may change aspects of HLog other than compression in the future.
Is there a plan for the above?

Having another compression type is nice but requires making HLogKey persistence pluggable.

I think it would be better to introduce one meta entry instead of two.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227898#comment-13227898 ]

stack commented on HBASE-4608:
------------------------------

In TestLRUDictionary, we test a single entry in essence. We should try it w/ all kinds of rubbish... really long entries, empty entries, null entries, similar entries... a dictionary for 32k worth of stuff... as we'll do in the wild. So I'd think?

A test for the new class KeyValueCompression would be good to have too.

enableCompression is an odd name for this method. Should it be setCompressionContext since that is what it does (you pass null if no compression)... seems odd passing null to 'enableCompression'.

Should the Compression class in the wal package have more javadoc comments explaining the kind of compression it does? Otherwise, it looks like a generic compressor class when in fact it's a one-trick pony?

Should this method, WALCompressionEnabled, be isWALCompressionEnabled?

I like your idea of versioning the WAL.

Patch is coming along nicely. Almost there.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227901#comment-13227901 ]

Ted Yu commented on HBASE-4608:
-------------------------------

bq. try a paragraph of text going in and out

LRUDictionary deals with byte arrays:
{code}
  public short findEntry(byte[] data, int offset, int length) {
{code}
In this regard, piping text into the dictionary is functionally the same as piping the byte[] form of an integer.
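The range of inputs asked for in the review (long, empty, and similar entries) and the byte-array point above can be pictured with a toy example. This is a sketch under loud assumptions: ToyDictionary is a map-backed stand-in written for this note with no LRU eviction, its addEntry method and NOT_IN_DICTIONARY value are invented, and only the findEntry signature is taken from the thread. Null entries are left out because the thread does not say how they should behave.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Map-backed stand-in for a dictionary; it has no LRU eviction. */
final class ToyDictionary {
  static final short NOT_IN_DICTIONARY = -1;
  private final Map<ByteBuffer, Short> index = new HashMap<>();

  short findEntry(byte[] data, int offset, int length) {
    Short idx = index.get(ByteBuffer.wrap(data, offset, length));
    return idx == null ? NOT_IN_DICTIONARY : idx;
  }

  short addEntry(byte[] data, int offset, int length) {
    short existing = findEntry(data, offset, length);
    if (existing != NOT_IN_DICTIONARY) {
      return existing;
    }
    short idx = (short) index.size();
    // Copy the slice so later changes to the caller's array cannot corrupt the key.
    index.put(ByteBuffer.wrap(Arrays.copyOfRange(data, offset, offset + length)), idx);
    return idx;
  }
}

final class DictionaryVarietyCheck {
  static void check(boolean ok, String what) {
    if (!ok) {
      throw new AssertionError(what);
    }
  }

  /** Adds each input, then verifies it is found again under its own index. */
  static int run() {
    byte[] longEntry = new byte[32 * 1024];
    Arrays.fill(longEntry, (byte) 'x');
    byte[][] inputs = {
        new byte[0],                                     // empty entry
        longEntry,                                       // really long entry
        "region-0001".getBytes(StandardCharsets.UTF_8),  // two similar entries
        "region-0002".getBytes(StandardCharsets.UTF_8),
        "a paragraph of text going in and out".getBytes(StandardCharsets.UTF_8),
        ByteBuffer.allocate(4).putInt(42).array(),       // integer in byte[] form
    };
    ToyDictionary dict = new ToyDictionary();
    for (int i = 0; i < inputs.length; i++) {
      byte[] in = inputs[i];
      check(dict.findEntry(in, 0, in.length) == ToyDictionary.NOT_IN_DICTIONARY,
          "unexpected hit for input " + i);
      short idx = dict.addEntry(in, 0, in.length);
      check(idx == i, "unexpected index for input " + i);
      check(dict.findEntry(in, 0, in.length) == idx, "lost input " + i);
    }
    // offset and length select a slice of a larger buffer.
    byte[] padded = "xxregion-0001yy".getBytes(StandardCharsets.UTF_8);
    check(dict.findEntry(padded, 2, 11) == 2, "slice lookup");
    return inputs.length;
  }
}
```

Text and integers do reach the dictionary as the same kind of byte slice, which is the point made above; the value of the extra inputs is that they probe boundaries (zero length, large length, near-duplicates, offsets) that a single entry never touches.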
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227908#comment-13227908 ]

stack commented on HBASE-4608:
------------------------------

It's a test of a single entry only, which is not really exercising much.

bq. Introducing WAL_VERSION would imply that we may change HLog aspect other than compression in the future. Is there plan for the above ?

I've not heard of any. Is that your argument for not adding a version? Because there has been no discussion of change up to this point, we couldn't possibly need to change the format in the future?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227929#comment-13227929 ] stack commented on HBASE-4608:
It's a regular pattern only. Perhaps this does some decent testing? TestWALReplayCompressed?
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227934#comment-13227934 ] stack commented on HBASE-4608:
The tests do not have variety. I think we should add it here rather than wait for the variety to hit out in the field.
bq. If only compression would evolve, I think checking against compression type metadata would be adequate.
The above begins with a conditional, "If".
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227946#comment-13227946 ] Ted Yu commented on HBASE-4608:
I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature. Say we define WAL_VERSION as v2 which has WAL compression capability. We still need to check compression type metadata before applying dictionary compression. In this regard adding WAL_VERSION seems to be redundant.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227954#comment-13227954 ] Ted Yu commented on HBASE-4608:
bq. Should the Compression class in wal package ...
I only see KeyValueCompression.java under wal package. Please elaborate which class should carry more comments.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227961#comment-13227961 ] Ted Yu commented on HBASE-4608:
Uploaded v23 onto review board. After WAL version metadata design is finalized, will add that.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227969#comment-13227969 ] stack commented on HBASE-4608:
bq. I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature.
How does it get in if you don't add it? If you don't want to add it, just don't. I'm not going to +1 this patch though if it adds metadata about a new compression feature w/o introducing a general versioning on the WAL.
bq. Should the Compression class in wal package ...
The compression class in wal is Compressor.java.
I have trouble following your responses to my comments because they come in w/o context and because they are done piecemeal, which means I have to spend way more time than I should have to reviewing your stuff. I'd suggest you save up your comments and submit them in a lump rather than hit submit per comment; you'll use up less internet.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227984#comment-13227984 ] Ted Yu commented on HBASE-4608:
For code-specific review, please use https://reviews.apache.org/r/4185/ where there would be context.
I can add WAL_VERSION as v2 in the metadata. My question is: would HLog v2 be allowed not to compress Log entries?
If desirable, we can discuss in more detail, face to face, on the 27th.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228033#comment-13228033 ] stack commented on HBASE-4608:
bq. I can add WAL_VERSION as v2 in the metadata.
Why not as version 1? The absence of WAL_VERSION can be version zero.
bq. My question is: would HLog v2 be allowed not to compress Log entries ?
Yes. The compress flag would be 'off' (isn't that the default?)
bq. If desirable, we can discuss in more detail, face to face, on the 27th.
Why wait till then? This is the last big one before we can release a 0.94.
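The two defaults proposed here (a missing WAL_VERSION reads as version zero, and a missing compression flag reads as 'off') could be sketched roughly as follows. This is an illustration only: the class name, the metadata key strings, and the use of a plain string map are assumptions made for this note, not the names or types used in the patch.

```java
import java.util.Map;

/** Hypothetical reader-side helpers for WAL file metadata. */
class SketchWalMetadata {
    // Assumed key names; the thread does not give the real ones.
    static final String WAL_VERSION_KEY = "WAL_VERSION";
    static final String COMPRESSION_KEY = "WAL_COMPRESSION";

    /** A file written before versioning existed has no key and reads as version 0. */
    static int walVersion(Map<String, String> metadata) {
        String value = metadata.get(WAL_VERSION_KEY);
        return value == null ? 0 : Integer.parseInt(value);
    }

    /** Compression is off unless the flag is present and equal to "true". */
    static boolean compressionEnabled(Map<String, String> metadata) {
        // Boolean.parseBoolean returns false for null input.
        return Boolean.parseBoolean(metadata.get(COMPRESSION_KEY));
    }
}
```

Under these assumptions a version 1 file with no compression entry is read as uncompressed, which is the "yes" answer given above.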
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228042#comment-13228042 ] Zhihong Yu commented on HBASE-4608:
Since WAL compression may be off for the new HLog file version, we would always consult compression type metadata when reading an HLog file. WAL_VERSION is written but is not needed at the time of reading the HLog.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228051#comment-13228051 ] Lars Hofhansl commented on HBASE-4608:
bq. My question is: would HLog v2 be allowed not to compress Log entries ?
I think the answer is yes. You're right that VERSION is orthogonal to COMPRESSION. I do agree with Stack that while we're adding metadata to HLog we should add a VERSION as well. We should add both VERSION and COMPRESSION metadata. (Maybe that's what you were saying anyway, if so feel free to ignore me).
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228056#comment-13228056 ] Zhihong Yu commented on HBASE-4608:
I think we may enhance WAL compression using dictionary in the future. So for the DICTIONARY compression type, it is desirable to introduce versioning as well. I don't have a strong opinion about WAL_VERSION actually.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228097#comment-13228097 ] Zhihong Yu commented on HBASE-4608:
HLog version decision aside, my feeling about the current implementation is -0.5.
First, the compression ratio is not good, at least for the data written by PE.
Second, HLogKey persistence becomes dependent on the compression implementation. This would make plugging in other compression techniques hard.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228108#comment-13228108 ] Todd Lipcon commented on HBASE-4608:
bq. First, compression ratio is not good - at least for the data written by PE.
I saw ~40% compression on a YCSB load. So some workloads may have good results whereas others didn't. Did you also re-run the test after fixing the bug? Maybe that skewed the results?
bq. Second, HLogKey persistence becomes dependent on the compression implementation. This would make plugging other compression techniques hard.
I agree we should use a metadata field in the log to describe which compression mechanism is being used.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228122#comment-13228122 ] Zhihong Yu commented on HBASE-4608:
We're looking at several metadata fields for version:
1. WAL_VERSION for HLog file
2. compression type for HLog file
3. compression major (minor) version
4. HLogKey version (covered in latest patch)
It would create some confusion w.r.t. the different combinations of the above 4.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228157#comment-13228157 ] stack commented on HBASE-4608:
Stop making this more complicated than it need be Ted.
WAL_VERSION is global version on WAL log.
Adding a type metadata field for compression makes sense. If none, presume uncompressed. You don't need a compression type version. If we change the format, we can do PREFIX_COMPRESSION_V2.
HLogKeys are serialized independent of their container. Don't conflate their versioning w/ the suggested WAL log versioning.
Regarding PE data, it is not amenable to compression. Its keys are very basic. It's likely not a good test for evaluating the viability of this feature.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228158#comment-13228158 ] stack commented on HBASE-4608:
Stop making this more complicated than it need be Ted.
WAL_VERSION is global version on WAL log.
Adding a type metadata field for compression makes sense. If none, presume uncompressed. You don't need a compression type version. If we change the format, we can do PREFIX_COMPRESSION_V2.
HLogKeys are serialized independent of their container. Don't conflate their versioning w/ the suggested WAL log versioning.
Regarding PE data, it is not amenable to compression. Its keys are very basic. It's likely not a good test for evaluating the viability of this feature.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228164#comment-13228164 ] Zhihong Yu commented on HBASE-4608:
Having PREFIX_COMPRESSION_V2 in the future is equivalent to having a compression type version. It may make compression checking verbose: I think checking against one compression type is better than comparing with every PREFIX_COMPRESSION_Vx.
I agree with the observation about PE data.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228166#comment-13228166 ] stack commented on HBASE-4608:
@Ted I think you should resign ownership of this issue. You are just pushing its conclusion further out w/ your continual negotiation and what ifs.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228165#comment-13228165 ] Zhihong Yu commented on HBASE-4608:
bq. Stop making this more complicated than it need be Ted.
It is rare that I saw review comments in such tone: condescending. And the same comment was posted twice.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228173#comment-13228173 ] Zhihong Yu commented on HBASE-4608:
From what can one conclude who owns the issue? The Assignee field?
I do have an opinion on compression type versioning. I would wait for a concrete design to form.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228175#comment-13228175 ] stack commented on HBASE-4608:
Let me have a go at it since Li Pi can't finish it just yet.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228181#comment-13228181 ]

Lars Hofhansl commented on HBASE-4608:

Just my $0.02 here... I think having a compression type + compression version will be hard to grok for newcomers unfamiliar with this area, whereas having a single compression type field is clear. A new version of a compression algorithm is a new type (IMHO). We do not have compression versions for the HFiles, just compression types.

I think with WAL_VERSION and compression type we have enough flexibility (HLogKey version is really unrelated, as it is there for other serialization as well). What do you think, Ted?

Tomorrow I'll do some testing to see what the compression ratio is for a few of our scenarios.
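A minimal sketch of the "single compression type field" idea, for illustration only. The enum name, the NONE and DICTIONARY values, and the configuration key are the ones proposed elsewhere on this issue and may not match what was committed; `WalCompressionType` and `fromConf` are hypothetical names, and a plain `Map` stands in for the real configuration object.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one enum value identifies the compression scheme.
// A revised algorithm would be added as a new value, not as a separate
// version field.
enum WalCompressionType {
    NONE,
    DICTIONARY;

    // Key name as proposed on this issue; an assumption, not a confirmed setting.
    static final String CONF_KEY = "hbase.regionserver.wal.compressiontype";

    /** Returns NONE when the key is absent or blank, otherwise the named type. */
    static WalCompressionType fromConf(Map<String, String> conf) {
        String value = conf.get(CONF_KEY);
        if (value == null || value.trim().isEmpty()) {
            return NONE;
        }
        // Throws IllegalArgumentException for an unknown type name.
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

With this shape, a reader that meets an unknown type name fails fast instead of guessing, which is the behaviour a single type field would have to rely on in place of a per-type version number.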
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228185#comment-13228185 ]

stack commented on HBASE-4608:

bq. It is rare that I saw review comments in such tone: condescending.

Don't be silly. Frustrated, yes. Condescending, no.

bq. And the same comment was posted twice.

Sorry about that. Made a mistake.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228193#comment-13228193 ]

Li Pi commented on HBASE-4608:

Yo, sorry I can't quite work on this. Finals are finished this week, and once that happens, I'll be able to scram. There doesn't seem to be that much left, though I said that about 3 months ago. My bad!

Feel free to do as you please; there's not much left on this, and I'm happy that work is getting done. I won't be offended at all if somebody else wants to try their hand at finishing this.

My thoughts on it were this: WAL_VERSION is used to indicate compression type. This is pretty good, because enabling compression would immediately tell older versions that the version was wrong, while newer versions with compression disabled could function alongside older versions without support for compression.

Also, I had my old benchmarks, and I was getting anywhere from a 20% to a 40% increase on YCSB loads, depending on the testcase. This seemed pretty impressive to me. Not sure if a bug was introduced. I'll run a few more benchmarks later.
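The compatibility argument above can be sketched as a version gate. This is an illustration under stated assumptions, not the patch's code: the class name, method names, and the concrete version numbers 1 and 2 are all hypothetical.

```java
// Hypothetical sketch of using the WAL file version to signal compression.
final class WalVersionCheck {
    // Illustrative numbers only; the real WAL_VERSION values are not given here.
    static final int VERSION_UNCOMPRESSED = 1;
    static final int VERSION_COMPRESSED = 2;

    private WalVersionCheck() {}

    /** Version a writer stamps into the file header. */
    static int versionToWrite(boolean compressionEnabled) {
        return compressionEnabled ? VERSION_COMPRESSED : VERSION_UNCOMPRESSED;
    }

    /** A reader accepts a file only if it understands the file's version. */
    static boolean canRead(int fileVersion, int maxSupportedVersion) {
        return fileVersion >= VERSION_UNCOMPRESSED
            && fileVersion <= maxSupportedVersion;
    }
}
```

Under these assumptions, an older reader (maximum supported version 1) rejects a compressed log at once, while a newer writer with compression disabled still produces version 1 files that older readers accept.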
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228196#comment-13228196 ]

Zhihong Yu commented on HBASE-4608:

From HFileBlock:
{code}
int getMinorVersion() {
  return this.minorVersion;
}
{code}
From HFileReaderV2.java:
{code}
private void validateMinorVersion(Path path, int minorVersion) {
  if (minorVersion < MIN_MINOR_VERSION || minorVersion > MAX_MINOR_VERSION) {
{code}
I think compression type versioning would allow us to perform migration with ease in the future.

PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression type + compression version.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228202#comment-13228202 ]

stack commented on HBASE-4608:

bq. PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression type + compression version.

Ted, you misunderstood. The above was a suggested name for a new compression type, a version two of prefix compression. Your bringing hfile compression versioning in here is an unnecessary complication, IMO. Compression will not have the variety here that it does over in hfile (IMO).

bq. I think compression type versioning would allow us to perform migration with ease in the future.

Not needed. We will have compression types and WAL file global versioning. That should be sufficient to describe future evolutions, IMO.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228216#comment-13228216 ]

Zhihong Yu commented on HBASE-4608:

Since Li Pi has done 90% of the coding, I think this JIRA should bear his name at the time of integration.