[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-16 Thread Lars Francke (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231009#comment-13231009
 ] 

Lars Francke commented on HBASE-4608:
-

This seems to be missing documentation, no?

Shouldn't the hbase.regionserver.wal.enablecompression key at least be in 
hbase-default.xml?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230360#comment-13230360
 ] 

Zhihong Yu commented on HBASE-4608:
---

w.r.t. Lars' comment: 
https://issues.apache.org/jira/browse/HBASE-4608?focusedCommentId=13229010page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229010

I think it makes sense.
How about introducing an enum CompressionType with values of NONE and 
DICTIONARY ?
HConstants.ENABLE_WAL_COMPRESSION would be replaced by another String: 
hbase.regionserver.wal.compressiontype
If hbase.regionserver.wal.compressiontype doesn't appear in conf, 
CompressionType.NONE is assumed.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230449#comment-13230449
 ] 

stack commented on HBASE-4608:
--

I'll commit v30 then.

Thanks all for reviews, etc.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230456#comment-13230456
 ] 

stack commented on HBASE-4608:
--

Now this is in, does that mean we can cut a 0.94RC0?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230490#comment-13230490
 ] 

Lars Hofhansl commented on HBASE-4608:
--

Yeah! And, yes, time for an RC.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230493#comment-13230493
 ] 

Hudson commented on HBASE-4608:
---

Integrated in HBase-0.94 #32 (See 
[https://builds.apache.org/job/HBase-0.94/32/])
HBASE-4608 HLog Compression (Revision 1301167)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230500#comment-13230500
 ] 

Li Pi commented on HBASE-4608:
--

Woohoo! It's in!

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230538#comment-13230538
 ] 

Hudson commented on HBASE-4608:
---

Integrated in HBase-TRUNK #2683 (See 
[https://builds.apache.org/job/HBase-TRUNK/2683/])
HBASE-4608 HLog Compression (Revision 1301165)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-15 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230914#comment-13230914
 ] 

Hudson commented on HBASE-4608:
---

Integrated in HBase-TRUNK-security #139 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/139/])
HBASE-4608 HLog Compression (Revision 1301165)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229022#comment-13229022
 ] 

stack commented on HBASE-4608:
--

@Lars Generalizing the compression done here is out of scope for this issue.  
The patch was not written that way from the get go.  The reviews done up to 
like v22odd made no mention of supporting other compression types.  I'd suggest 
we do it in another issue if and when its wanted.

Let me put v27 up on rb.

bq. I forget, do we also SNAPPY/LZO/GZ compress the HLogs?

We don't do this because these compression algorithms work in blocks of 32k or 
so.  If not tied off probably on the end we could lose up to 32k of edits.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229024#comment-13229024
 ] 

stack commented on HBASE-4608:
--

Lars, I can't upload a patch to someone else's issue.  Made a new rb at 
https://reviews.apache.org/r/4328/

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229027#comment-13229027
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/
---

Review request for hbase.


Summary
---

See issue


This addresses bug hbase-4608.
https://issues.apache.org/jira/browse/hbase-4608


Diffs
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
  src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/4328/diff


Testing
---


Thanks,

Michael



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229032#comment-13229032
 ] 

Lars Hofhansl commented on HBASE-4608:
--

@Stack: fair enough. Let's get this one done. +1 on generalization only when 
needed and in another jira.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229051#comment-13229051
 ] 

Todd Lipcon commented on HBASE-4608:


btw, +1 on this new patch after you've double-checked with your logs and run it 
through the full suite. Lars, did you want to take a look tomorrow before it's 
committed?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229054#comment-13229054
 ] 

stack commented on HBASE-4608:
--

I like your changes Todd.  Nice fixup.  Lars, let me post v28 for you up on rb.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229055#comment-13229055
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/
---

(Updated 2012-03-14 07:34:58.002687)


Review request for hbase.


Changes
---

Uploading v28 for lars to take a looksee


Summary
---

See issue


This addresses bug hbase-4608.
https://issues.apache.org/jira/browse/hbase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
  src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/4328/diff


Testing
---


Thanks,

Michael



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229064#comment-13229064
 ] 

stack commented on HBASE-4608:
--

I reran compress, decompress, compress cycle over my 40 odd random WALs from 
prod and seems fine w/ v28.  Sizes look right.  No errors.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229069#comment-13229069
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518303/hbase-4608-v28.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1184//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1184//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1184//console

This message is automatically generated.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229080#comment-13229080
 ] 

Li Pi commented on HBASE-4608:
--

@Stack nvm, just read upwards. That's inline with the other results by Todd and 
I.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229137#comment-13229137
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12894

Introducing enum is a good idea.
I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something 
similar.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12895

How about passing compressionContext and type of field we're reading to 
Compressor.readCompressed() ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
https://reviews.apache.org/r/4328/#comment12891

Hiding LRUDictionary.class is desirable.
Shall we pass this.getMetadata() to CompressionContext ctor where selection 
of compression type is made ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
https://reviews.apache.org/r/4328/#comment12892

We introduced compression type in Metadata, how about allowing user to 
specify compression type using conf ?
Default is dictionary compression.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
https://reviews.apache.org/r/4328/#comment12893

Hiding LRUDictionary.class is desirable.
How about passing conf to CompressionContext ctor ?


- Ted


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, 

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229248#comment-13229248
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, 
line 53
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53
bq.  
bq.   Introducing enum is a good idea.
bq.   I would suggest changing this to COMPRESSED_WITH_DICTIONARY or 
something similar.

HLogKey does not need to know about 'type' of compression.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, 
line 306
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line306
bq.  
bq.   How about passing compressionContext and type of field we're reading 
to Compressor.readCompressed() ?

Generalization is out of scope.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java,
 line 189
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189
bq.  
bq.   Hiding LRUDictionary.class is desirable.
bq.   Shall we pass this.getMetadata() to CompressionContext ctor where 
selection of compression type is made ?

Out of scope.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java,
 line 110
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line110
bq.  
bq.   We introduced compression type in Metadata, how about allowing user 
to specify compression type using conf ?
bq.   Default is dictionary compression.

Customization is out of scope.  How about... should have attendant 
justification.  You can justify generalization of this compression in a new 
jira.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java,
 line 139
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line139
bq.  
bq.   Hiding LRUDictionary.class is desirable.
bq.   How about passing conf to CompressionContext ctor ?

The generalization that would require hiding the type of compression being done 
is out of scope.


This is not a software project that fellas are working on for casual amusement. 
 New facility should be justified by real-world needs.  This feature is 
experimental.  It could help w/ our WAL writes.  It may not.  We need to get a 
basic facility into a release so we can try it.  If it proves its worth, we can 
spend more time down this avenue.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
---


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq. 

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229263#comment-13229263
 ] 

Li Pi commented on HBASE-4608:
--

+1 from here.

Agree w/ Stack. Compression can be generalized later. We can just bump up the 
version in that case.

Right now, this works, passes tests, and provides a very substantial 
improvement in certain cases. (See Stack's workload).

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229277#comment-13229277
 ] 

Zhihong Yu commented on HBASE-4608:
---

bq. HLogKey does not need to know about 'type' of compression.
I agree. But see this code:
{code}
+  Compressor.writeCompressed(this.encodedRegionName, 0,
+  this.encodedRegionName.length, out,
+  compressionContext.regionDict);
{code}

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229281#comment-13229281
 ] 

Li Pi commented on HBASE-4608:
--

That code is just writing the output for the regionname, using the regiondict.

I guess if the dictionary behavior were to change, it could be problematic. But 
when we have more than 1 dictionary, we can deal with it then.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229292#comment-13229292
 ] 

Li Pi commented on HBASE-4608:
--

Not that we'd compress a random value well at all anyways.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229291#comment-13229291
 ] 

Li Pi commented on HBASE-4608:
--

Also, figured out why Ted's benchmarks differed from the rest of ours.

PE tool tests with random writes to million rows, each row has a single column 
whose value is 1000 randomly-generated byte.

This is pretty difficult to compress. The number of rows means that rownames 
won't fit in the dictionary, and we don't compress values yet.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229294#comment-13229294
 ] 

Zhihong Yu commented on HBASE-4608:
---

bq. we don't compress values yet.
Looks like we have something to do in V2 :-)

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229420#comment-13229420
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---

Ship it!


Some comments and nits inside. Some extraneous whitespace (can be fixed at 
commit).


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12915

Nit: Comment here that the status byte is the higher order byte of the dict 
entry.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12916

I assume we're entirely sure that a dictionary will never have  2^15 
entries.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12914

Nit: The naming convention is a bit strange.
This one is called uncompress... whereas the method returning a new byte[] 
is called readCompressed



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
https://reviews.apache.org/r/4328/#comment12917

Have a constructor that takes a compression context too?
It seems like once anything has been written to the HLog this should be 
immutable.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12919

COMPRESSED is a bit of a strange name.
I happens to be a version of the WAL that supports compression, but it is 
not necessarily compressed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12920

ugly whitespace :)



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12921

I think I had that question to Li Pi... How much memory do we expect this 
dictionary to take worst case?
I guess since there is one WAL per region server and it is rolled 
periodically it is not a problem at all.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12922

I'll trust you folks that a PriorityQueue would not work here.


- Lars


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: 

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229432#comment-13229432
 ] 

Zhihong Yu commented on HBASE-4608:
---

I just thought we should encapsulate LRUDictionary in CompressionContext:
{code}
+boolean compression = reader.isWALCompressionEnabled();
+if (compression) {
+  try {
+if (compressionContext == null) {
+  compressionContext = new CompressionContext(LRUDictionary.class);
{code}
In my opinion CompressionContext shouldn't just be a holder of multiple 
dictionaries.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229448#comment-13229448
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
32
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32
bq.  
bq.   I think I had that question to Li Pi... How much memory do we expect 
this dictionary to take worst case?
bq.   I guess since there is one WAL per region server and it is rolled 
periodically it is not a problem at all.

65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes (these are 
some big names) = 32768000 bytes. Or 32 megabytes.

If you want to get silly, even at 1kb entries (wtf are you naming things?), it 
maxes out at 320 megabytes. 


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229454#comment-13229454
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 108
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line108
bq.  
bq.   I assume we're entirely sure that a dictionary will never have  
2^15 entries.

It'll start evicting once it hits its max size, which is currently 2 ^ 15.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229453#comment-13229453
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
32
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32
bq.  
bq.   I think I had that question to Li Pi... How much memory do we expect 
this dictionary to take worst case?
bq.   I guess since there is one WAL per region server and it is rolled 
periodically it is not a problem at all.
bq.  
bq.  Li Pi wrote:
bq.  65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes 
(these are some big names) = 32768000 bytes. Or 32 megabytes.
bq.  
bq.  If you want to get silly, even at 1kb entries (wtf are you naming 
things?), it maxes out at 320 megabytes.

Actually halve those amounts, 2^15, not 2^16.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
---


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229663#comment-13229663
 ] 

stack commented on HBASE-4608:
--

Li asked me lzma some of my logs from the wild.  I did.  W/ lzma --best, it 
compresses down to 12% of size.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229678#comment-13229678
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, 
line 53
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53
bq.  
bq.   Introducing enum is a good idea.
bq.   I would suggest changing this to COMPRESSED_WITH_DICTIONARY or 
something similar.
bq.  
bq.  Michael Stack wrote:
bq.  HLogKey does not need to know about 'type' of compression.

Adding comments around the versions to give some context on why enums are named 
so.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java,
 line 189
bq.   https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189
bq.  
bq.   Hiding LRUDictionary.class is desirable.
bq.   Shall we pass this.getMetadata() to CompressionContext ctor where 
selection of compression type is made ?
bq.  
bq.  Michael Stack wrote:
bq.  Out of scope.

Yeah, adding a factory to choose between different compression context types 
when we have only one compression type available is out of scope for this issue.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
---


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, 
 hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229771#comment-13229771
 ] 

stack commented on HBASE-4608:
--

Here's some WALs to compared compressed w/ patch v29 vs lzma and then the 
dictionary compressed file itself lzma'd (Todd request).  LZMA'ing the 
dictionary compressed file makes it smaller than the lzma'd original.  lzma'ing 
the compressed file makes it 1/4 size of dictionary compressed file (roughly).  
I didn't get a chance to lzo it

{code}

-rw-r--r--   1 stack  staff  64589199 Mar 13 20:24 
sv4r21s12%3A60020.1331685637452
-rwxrwxrwx   1 stack  staff  28906432 Mar 14 15:34 
sv4r21s12%3A60020.1331685637452.compressed
-rw-r--r--   1 stack  staff   7417213 Mar 14 16:25 
sv4r21s12%3A60020.1331685637452.compressed.lzma
-rw-r--r--   1 stack  staff   8511618 Mar 14 16:24 
sv4r21s12%3A60020.1331685637452.lzma
-rw-r--r--   1 stack  staff  63755620 Mar 13 20:24 
sv4r21s12%3A60020.1331687005652
-rwxrwxrwx   1 stack  staff  28804928 Mar 14 15:34 
sv4r21s12%3A60020.1331687005652.compressed
-rw-r--r--   1 stack  staff   6866107 Mar 14 16:28 
sv4r21s12%3A60020.1331687005652.compressed.lzma
-rw-r--r--   1 stack  staff   8328771 Mar 14 16:27 
sv4r21s12%3A60020.1331687005652.lzma
-rw-r--r--   1 stack  staff  63755688 Mar 13 20:24 
sv4r21s12%3A60020.1331688224458
-rwxrwxrwx   1 stack  staff  27701052 Mar 14 15:34 
sv4r21s12%3A60020.1331688224458.compressed
-rw-r--r--   1 stack  staff   6614637 Mar 14 16:31 
sv4r21s12%3A60020.1331688224458.compressed.lzma
-rw-r--r--   1 stack  staff   8462991 Mar 14 16:31 
sv4r21s12%3A60020.1331688224458.lzma
-rw-r--r--   1 stack  staff  64024836 Mar 13 20:24 
sv4r21s12%3A60020.1331689518188
-rwxrwxrwx   1 stack  staff  28851435 Mar 14 15:34 
sv4r21s12%3A60020.1331689518188.compressed
-rw-r--r--   1 stack  staff   6677112 Mar 14 16:35 
sv4r21s12%3A60020.1331689518188.compressed.lzma
-rw-r--r--   1 stack  staff   8158847 Mar 14 16:34 
sv4r21s12%3A60020.1331689518188.lzma
-rw-r--r--   1 stack  staff  63757131 Mar 13 20:24 
sv4r21s12%3A60020.1331690608900
-rwxrwxrwx   1 stack  staff  28201506 Mar 14 15:34 
sv4r21s12%3A60020.1331690608900.compressed
-rw-r--r--   1 stack  staff   6941982 Mar 14 16:38 
sv4r21s12%3A60020.1331690608900.compressed.lzma
-rw-r--r--   1 stack  staff   8513895 Mar 14 16:37 
sv4r21s12%3A60020.1331690608900.lzma
-rw-r--r--   1 stack  staff  63754114 Mar 13 20:24 
sv4r21s12%3A60020.1331691711502
-rwxrwxrwx   1 stack  staff  28318314 Mar 14 15:34 
sv4r21s12%3A60020.1331691711502.compressed
-rw-r--r--   1 stack  staff   7392701 Mar 14 16:42 
sv4r21s12%3A60020.1331691711502.compressed.lzma
-rw-r--r--   1 stack  staff   9136798 Mar 14 16:41 
sv4r21s12%3A60020.1331691711502.lzma
-rw-r--r--   1 stack  staff  63756667 Mar 13 20:24 
sv4r21s12%3A60020.1331692886725
-rwxrwxrwx   1 stack  staff  28309792 Mar 14 15:34 
sv4r21s12%3A60020.1331692886725.compressed
-rw-r--r--   1 stack  staff   7139965 Mar 14 16:44 
sv4r21s12%3A60020.1331692886725.compressed.lzma
-rw-r--r--   1 stack  staff   8968155 Mar 14 16:43 
sv4r21s12%3A60020.1331692886725.lzma
-rw-r--r--   1 stack  staff  63755003 Mar 13 20:24 
sv4r21s12%3A60020.1331694049033
-rwxrwxrwx   1 stack  staff  28127053 Mar 14 15:35 
sv4r21s12%3A60020.1331694049033.compressed
-rw-r--r--   1 stack  staff   6498486 Mar 14 16:45 
sv4r21s12%3A60020.1331694049033.compressed.lzma
-rw-r--r--   1 stack  staff   8175618 Mar 14 16:45 
sv4r21s12%3A60020.1331694049033.lzma
-rw-r--r--   1 stack  staff  23441144 Mar 13 20:24 
sv4r21s12%3A60020.1331695045194
-rwxrwxrwx   1 stack  staff  10561645 Mar 14 15:35 
sv4r21s12%3A60020.1331695045194.compressed
-rw-r--r--   1 stack  staff   2922204 Mar 14 16:46 
sv4r21s12%3A60020.1331695045194.compressed.lzma
-rw-r--r--   1 stack  staff   3228837 Mar 14 16:46 
sv4r21s12%3A60020.1331695045194.lzma
{code}

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by 

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229773#comment-13229773
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
https://reviews.apache.org/r/4328/#comment12946

IllegalArgumentException is not needed here.
I removed it, compiled and ran TestCompressor - it passed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12947

A closing ) should be placed either on this line or on line 109.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12948

Should read 'byte of index to the ...'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12949

Should read 'an array of bytes'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/4328/#comment12950

Please add javadoc for offset and length.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
https://reviews.apache.org/r/4328/#comment12958

Should we label this class @InterfaceAudience.Private ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12951

I don't quite get what the second sentence is supposed to convey ?
It seems to be same as first sentence.

This version is the minimum version that supports compression.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
https://reviews.apache.org/r/4328/#comment12952

A (slightly) long line.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12954

Long line.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
https://reviews.apache.org/r/4328/#comment12955

Can we remove 'silly' here ?
Some user may actually reach this size.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
https://reviews.apache.org/r/4328/#comment12956

'initiate' is used to start an action or message.
'initialize' should be used here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
https://reviews.apache.org/r/4328/#comment12957

Setting reader to null would be desirable after the close() call.


- Ted


On 2012-03-14 22:26:34, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 22:26:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229776#comment-13229776
 ] 

Lars Hofhansl commented on HBASE-4608:
--

I'm still +1 :)

The lzma number are interesting. Maybe a nice (future) solution would be to 
dictionary compress the HLog while writing, and then when the log is rolled 
compress it with lzma (since we know the file won't change any more we can 
compress it wholesale).
This begs the next question: What portion of the WAL storage do the current 
WALs represent?


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229826#comment-13229826
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518388/4608v29.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1189//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1189//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1189//console

This message is automatically generated.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229871#comment-13229871
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 164
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line164
bq.  
bq.   Please add javadoc for offset and length.

Are you joking?  On a protected method with parameter names such as these that 
follow a byte array argument? 


bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, 
line 36
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92427#file92427line36
bq.  
bq.   IllegalArgumentException is not needed here.
bq.   I removed it, compiled and ran TestCompressor - it passed.

Removed.


bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 108
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line108
bq.  
bq.   A closing ) should be placed either on this line or on line 109.

done


bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 143
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line143
bq.  
bq.   Should read 'byte of index to the ...'

done


bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, 
line 55
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92431#file92431line55
bq.  
bq.   I don't quite get what the second sentence is supposed to convey ?
bq.   It seems to be same as first sentence.
bq.   
bq.   This version is the minimum version that supports compression.

Leaving as is.  The second sentence is to emphasize that only the dictionary 
compression was introduced in version -2.


bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
33
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92433#file92433line33
bq.  
bq.   Can we remove 'silly' here ?
bq.   Some user may actually reach this size.

Then they are being silly.


bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java,
 line 202
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92434#file92434line202
bq.  
bq.   Setting reader to null would be desirable after the close() call.

Done


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
---


On 2012-03-14 22:26:34, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 22:26:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229877#comment-13229877
 ] 

stack commented on HBASE-4608:
--

@Ted Would suggest that in future you not piecemeal in your reviews.  Bulk them 
up.  When review comes in in dribs and drabs, the whole process takes way 
longer.

@Lars What portion of the WAL storage do the current WALs represent?

Do you mean, how much of our footprint is comprised of WAL logs?  Not sure.  I 
thought intent of this issue was to speed syncs because there'd be less bytes 
to shuttle across the datanode replicas pipeline.

I'm not wondering if this patch is worth adding?  If compressible stuff is only 
shrinking by half, is that big enough win?  What do you lot thing?  LZMA is not 
viable because it takes for ever compressing though its turning SU WALs into 
11-14% original size.

Let me try adding lzo numbers but we wouldn't want to use lzo anyways because 
we could lose a bunch of edits off the end if the compression block was not 
closed off (Thats my understanding.  I could be wrong).

Li, what happens if we cut the end off a dictionary-compressed file.  Will we 
be able to read up to the last byte or word or so?

Good stuff.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229878#comment-13229878
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java, 
line 28
bq.   https://reviews.apache.org/r/4328/diff/3/?file=92429#file92429line28
bq.  
bq.   Should we label this class @InterfaceAudience.Private ?

Unless a class is public, it doesn't need an interface audience annotation


- Todd


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
---


On 2012-03-14 22:26:34, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  ---
bq.  
bq.  (Updated 2012-03-14 22:26:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.  https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
311ea1b 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 ff63a5f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 01ebb5c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
d8f317c 
bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
a11899c 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229884#comment-13229884
 ] 

Zhihong Yu commented on HBASE-4608:
---

w.r.t. adding javadoc for offset and length of writeCompressed(), I searched 
our code base for '@param offset ' and found 48 occurrences.

I like this snippet from HFileReaderV2.java:
{code}
 * @param key key byte array
 * @param offset key offset in the key byte array
 * @param length key length
{code}
Even an empty javadoc is better than missing parameter:
{code}
   * @param offset
{code}

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229890#comment-13229890
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518416/4608v30.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestMetaScanner

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1191//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1191//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1191//console

This message is automatically generated.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229893#comment-13229893
 ] 

Lars Hofhansl commented on HBASE-4608:
--

bq. I'm not wondering if this patch is worth adding? If compressible stuff is 
only shrinking by half, is that big enough win? What do you lot thing? LZMA is 
not viable because it takes for ever compressing though its turning SU WALs 
into 11-14% original size.

You mean you are *now* wondering? :) IMHO: The WAL is probably the greatest 
source of synchronous IO that we generate, cutting this in half seems quite 
valuable (maybe this will be less valuable in the future if/when HDFS can do 
parallel replication instead of chaining - but it is now).
I agree that none of the block based compression schemes would be good 
options... Was merely curious about HLog archiving, which is quite unrelated to 
this issue.

+1, let's commit this.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229913#comment-13229913
 ] 

Li Pi commented on HBASE-4608:
--

If a dictionary file gets cut up, you'll be able to read all the way to the end.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229914#comment-13229914
 ] 

Li Pi commented on HBASE-4608:
--

If a dictionary file gets cut up, you'll be able to read all the way to the end.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-14 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229921#comment-13229921
 ] 

Li Pi commented on HBASE-4608:
--

On other compression things. I looked into those.

plugging into LZMA was the first thing I thought about doing - performance 
stops this one though.

There are other optimization we can make, such as modifying the dictionary to 
take into account frequency, and assigning the highest probability entries to 
the lowest numbers, then using vints rather than 2 bytes for everything. Note 
that we shouldn't be able to beat LZMA, because we neither compress values, nor 
do we compress the SequenceFile overhead. On some workloads, those overheads 
might be substantial - although I haven't checked.

This is actually pretty close to the challenge displayed by caching, in that we 
want to keep the most likely to be repeated entries in our dictionary, and 
evict the rest. I used LRU because LRU was simple, and like caching, pretty 
much anything results in a substantial performance increase over nothing. 

I'm pretty happy with cutting the WAL size in half on optimal workloads, though 
as always, it's nice to work towards future performance goals. I have other 
ideas, but they involve changing the HLog substantially in order to be more 
compact. In that case, we might end up abandoning the Hadoop Sequencefile 
format altogether, and this thing becomes a bit more complex.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228842#comment-13228842
 ] 

Zhihong Yu commented on HBASE-4608:
---

In isWALCompressionEnabled():
{code}
+if (txt == null || Integer.parseInt(txt.toString())  VERSION) return 
false;
{code}
What would happen when we have a newer version for WAL_VERSION_KEY ?

Looks like the following check should suffice for isWALCompressionEnabled():
{code}
+txt = metadata.get(WAL_COMPRESSION_TYPE_KEY);
+return txt != null  txt.equals(DICTIONARY_COMPRESSION_TYPE);
{code}

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228931#comment-13228931
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518270/4608v23.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1180//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1180//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1180//console

This message is automatically generated.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228946#comment-13228946
 ] 

stack commented on HBASE-4608:
--

Here's my compressing, decompressing, compressing again, decompressing again, 
then recompressing a random log file from our front end:

{code}
-rw-r--r--1 stack  staff   64928728 Mar 13 20:43 
sv4r25s8%3A60020.1331661889339
-rwxrwxrwx1 stack  staff   28540761 Mar 13 20:48 
sv4r25s8%3A60020.1331661889339.compressed
-rwxrwxrwx1 stack  staff   28540761 Mar 13 20:58 
sv4r25s8%3A60020.1331661889339.compressed.again
-rwxrwxrwx1 stack  staff   28540761 Mar 13 21:02 
sv4r25s8%3A60020.1331661889339.compressed.again.again
-rwxrwxrwx1 stack  staff   64945799 Mar 13 20:57 
sv4r25s8%3A60020.1331661889339.decompressed
-rwxrwxrwx1 stack  staff   64945799 Mar 13 21:02 
sv4r25s8%3A60020.1331661889339.decompressed.again
{code}

Its 44% of original size.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228947#comment-13228947
 ] 

stack commented on HBASE-4608:
--

Here's my compressing, decompressing, compressing again, decompressing again, 
then recompressing a random log file from our front end:

{code}
-rw-r--r--1 stack  staff   64928728 Mar 13 20:43 
sv4r25s8%3A60020.1331661889339
-rwxrwxrwx1 stack  staff   28540761 Mar 13 20:48 
sv4r25s8%3A60020.1331661889339.compressed
-rwxrwxrwx1 stack  staff   28540761 Mar 13 20:58 
sv4r25s8%3A60020.1331661889339.compressed.again
-rwxrwxrwx1 stack  staff   28540761 Mar 13 21:02 
sv4r25s8%3A60020.1331661889339.compressed.again.again
-rwxrwxrwx1 stack  staff   64945799 Mar 13 20:57 
sv4r25s8%3A60020.1331661889339.decompressed
-rwxrwxrwx1 stack  staff   64945799 Mar 13 21:02 
sv4r25s8%3A60020.1331661889339.decompressed.again
{code}

Its 44% of original size.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228948#comment-13228948
 ] 

Zhihong Yu commented on HBASE-4608:
---

The sentence involving COMPRESSION_VERSION was in past tense but I don't see it 
in patch v23.

Let me elaborate more on my comment @ 14/Mar/12 00:26
As you described, we would use a new constant (COMPRESSION_VERSION) to 
represent the minimum version that supports dictionary compression.
In my opinion, this version corresponds to the major version in my comment @ 
13/Mar/12 01:37

Say we later introduce prefix compression, we would introduce another constant 
representing the minimum version supporting prefix compression.

I agree that both version and compression type should be checked. However, the 
order should be checking compression type followed by checking version.

Regards

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228950#comment-13228950
 ] 

Zhihong Yu commented on HBASE-4608:
---

I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different 
from that of sv4r25s8%3A60020.1331661889339

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228954#comment-13228954
 ] 

Zhihong Yu commented on HBASE-4608:
---

Please wrap long line:
{code}
+  public static final String ENABLE_WAL_COMPRESSION = 
hbase.regionserver.wal.enablecompression;
{code}
w.r.t. the following code:
{code}
+  static final int VERSION = COMPRESSION_VERSION;
+  static final Text WAL_VERSION = new Text( + VERSION);
{code}
It implies that WAL_VERSION is the same as COMPRESSION_VERSION.
As I explained earlier, we would likely have another compression scheme for WAL 
in the future, resulting in the introduction of PREFIX_COMPRESSION_VERSION e.g.
Then we face a choice: what value would WAL_VERSION carry ?

I propose naming COMPRESSION_VERSION above DICTIONARY_COMPRESSION_VERSION and 
decouple it from WAL_VERSION.
In the future, WAL_VERSION of 2 can carry either dictionary or prefix 
compression.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228957#comment-13228957
 ] 

stack commented on HBASE-4608:
--

bq. I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is 
different from that of sv4r25s8%3A60020.1331661889339

Because it has metadata the original doesn't have.  When I compress it, it 
compresses down to same size.  Notice that the decompressed and 
decompressed.again are same size because they both have the new meata data.

bq. The sentence involving COMPRESSION_VERSION was in past tense but I don't 
see it in patch v23.

Pardon me.  Should have uploaded v24.

Testing has turned up a minor issue... will upload v25 soon.

bq. In my opinion, this version corresponds to the major version in my comment 
@ 13/Mar/12 01:37

Nope.  This is the global version that introduces compression.  No need of 
major/minor granularity, and in particular major/minor on the compression 
feature itself.  Its overkill.

bq. I agree that both version and compression type should be checked. However, 
the order should be checking compression type followed by checking version.

Nope.  First figure if we have a file that does compression.  Then figure what 
type of compression the file does.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228965#comment-13228965
 ] 

stack commented on HBASE-4608:
--

bq. It implies that WAL_VERSION is the same as COMPRESSION_VERSION.

Yes.  Thats right.  The current global version is the version that introduces 
WAL compression.

bq. As I explained earlier, we would likely have another compression scheme for 
WAL in the future, resulting in the introduction of PREFIX_COMPRESSION_VERSION

You are conflating wal version and compression type.  They are not the same 
thing.

If we introduce a new compression type only, and if all else is equal -- same 
API, etc. -- then we don't need to up the global version.  We are just adding a 
new compression type.  Either we support it or we don't.  If we don't we'll 
throw unsupported compression type (the dictionary compression type is 
currently called DICTIONARY_COMPRESSION_TYPE).

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228970#comment-13228970
 ] 

Zhihong Yu commented on HBASE-4608:
---

bq. and if all else is equal – same API, etc. – then we don't need to up the 
global version.
True. But we don't know if the current dictionary compression API is general 
enough to cover the new compression type.

bq. wal version and compression type. They are not the same thing.
Agreed. But the last paragraph above hinges on the scenario of keeping the same 
WAL version when new compression type is added.

Suppose we find a way to improve dictionary compression after the integration 
of this JIRA. Would WAL version increase or stay at 1 ?


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228971#comment-13228971
 ] 

Zhihong Yu commented on HBASE-4608:
---

In HLogKey.java:
{code}
+   * Enables compression.
+   * 
+   * @param tableDict
+   *  dictionary used for compressing table
+   * @param regionDict
+   *  dictionary used for compressing region
+   */
+  public void setCompressionContext(CompressionContext compressionContext) {
{code}
Please adjust the javadoc above.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228972#comment-13228972
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518291/4608v25.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1182//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1182//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1182//console

This message is automatically generated.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228976#comment-13228976
 ] 

stack commented on HBASE-4608:
--

bq. But we don't know if the current dictionary compression API is general 
enough to cover the new compression type.

Agree that we don't know what the future will bring.  Not going to try.

bq. But the last paragraph above hinges on the scenario of keeping the same WAL 
version when new compression type is added.

Yes, thats one possible scenario.  There are others where we need to change the 
version.  Can deal when we get there.

bq. Suppose we find a way to improve dictionary compression after the 
integration of this JIRA. Would WAL version increase or stay at 1 ?

If API doesn't change, no need to up the global file version.  Could add new 
improved dictionary compression type.

If we need to change the api, then we'll need to change the global version.  At 
the same time we might add some other facility that has nought to do w/ 
compression -- say, we might decide to intersperse markers for when we flush or 
compact.  We'd likely bump the version one point only though.  This new version 
would say indicate wal was now able to do extended compression api AND includes 
flush and compaction markers.  We could bump the version once per feature added 
but that buys us nothing; its the version we ship that counts, the accumulation 
of features since last time we shipped.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228979#comment-13228979
 ] 

stack commented on HBASE-4608:
--

I took a random set of 40 logs off our front end and did a cycle of compress, 
decompress multiple times in a row and verified the compressed version always 
ends up the same size.  No errors.

I'm seeing a pretty consistent 44% of original size compression:

{code}
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 10561645/23441144|bc -l
.45056013477840501299
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 28127053/63755003|bc -l
.44117405186225150048
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 28309792/63756667|bc -l
.44402873192853697951
pynchon:sv4r21s12,60020,1331025586905 stack$ echo 28318314/63754114|bc -l
.44418018263103773977
...
{code}

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228981#comment-13228981
 ] 

stack commented on HBASE-4608:
--

We need facility in wal like we have in hfile for printing statistics on load 
carried. Our frontend is loads of counters.  I've not verified.  Should be 
random enough in table naming and region though so should be doing a bit of 
exercise of the compression code.

I'm game for committing this as a first cut if I can get a +1.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228983#comment-13228983
 ] 

Zhihong Yu commented on HBASE-4608:
---

I feel the dictionary compression implementation is pervasive throughout the 
patch.
e.g.:
{code}
+boolean compression = reader.isWALCompressionEnabled();
+if (compression) {
+  try {
+if (compressionContext == null) {
+  compressionContext = new CompressionContext(LRUDictionary.class);
{code}
while isWALCompressionEnabled() sounds general, LRUDictionary.class is passed 
to the context.
This would make developing a new compression scheme hard.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 
 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228985#comment-13228985
 ] 

stack commented on HBASE-4608:
--

bq. This would make developing a new compression scheme hard.

Out of scope for this issue.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228986#comment-13228986
 ] 

stack commented on HBASE-4608:
--

bq. This would make developing a new compression scheme hard.

Out of scope for this issue.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228991#comment-13228991
 ] 

Zhihong Yu commented on HBASE-4608:
---

bq. Out of scope for this issue.
This reminds me of HBASE-4218: from Aug 17th 2011 to Feb 17th 2012, the 
development took 6 months.

This JIRA doesn't have as many algorithms as those in HBASE-4218. But we should 
follow similar goal:
From Jacek @ 17/Aug/11 21:47:
bq. Once we have common interface you would be able to reuse some of my tests 
and benchmarks.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228996#comment-13228996
 ] 

stack commented on HBASE-4608:
--

Can I get a +1 from someone else.  Its not a big patch.  Should be a quick 
review.  Thanks.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227869#comment-13227869
 ] 

stack commented on HBASE-4608:
--

Is HLog versioned? If not, perhaps instead of a HConstants.WAL_COMPRESSION_VER, 
add a WAL_VERSION metadata field.  Then have another for compression type (NONE 
or this)?

bq. For TestLRUDictionary, please outline the combinations that should be added.

Does it not look bare to you?   I'd think that we'd try a paragraph of text 
going in and out... perhaps test multiple dictionaries in the one file?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227892#comment-13227892
 ] 

Ted Yu commented on HBASE-4608:
---

Introducing WAL_VERSION would imply that we may change HLog aspect other than 
compression in the future.
Is there plan for the above ?
Having another compression type is nice but requires making HLogKey persistence 
pluggable.

I think it would be better to introduce one meta entry instead of two.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227898#comment-13227898
 ] 

stack commented on HBASE-4608:
--

In TestLRUDictionary, we test a single entry in essence.  We should try it w/ 
all kinds of rubbish... really long entries, empty entries, null entries 
similar entries... a dictionary for 32k worth of stuff..as we'll do in the 
wild.  So I'd think?

A test for the new class KeyValueCompression would be good to have too.


enableCompression is an odd name for this method.  Should it be 
setCompressionContext since that is what it does (you pass null if no 
compression)... seems odd passing null to 'enableCompression'

Should the Compression class in wal package have more javadoc comments 
explaining the kinda of compression it does?  Otherwise, it looks like a 
generic compressor class when in facts its a one-trick pony?

Should this method, WALCompressionEnabled, be isWALCompressionEnabled?

I like your idea of versioning the WAL

Patch is coming along nicely.  Almost there.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227901#comment-13227901
 ] 

Ted Yu commented on HBASE-4608:
---

bq. try a paragraph of text going in and out
LRUDictionary deals with byte array:
{code}
  public short findEntry(byte[] data, int offset, int length) {
{code}
In this regard, piping text into the dictionary is functionally same as piping 
byte[] form of integer.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227908#comment-13227908
 ] 

stack commented on HBASE-4608:
--

Its the test of a single entry only which is not really exercising much.

bq. Introducing WAL_VERSION would imply that we may change HLog aspect other 
than compression in the future. Is there plan for the above ?

I've not heard of any.  Is that your argument for not adding a version?  
Because if there has been no discussion of change up to this, we wouldn't 
possibly need to change the format in the future?



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227929#comment-13227929
 ] 

stack commented on HBASE-4608:
--

Its a regular pattern only.  Perhaps this does some decent testing?  
TestWALReplayCompressed?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227934#comment-13227934
 ] 

stack commented on HBASE-4608:
--

The tests do not have variety.  I think we should add it here rather than wait 
for the variety to hit out in the field.

bq. If only compression would evolve, I think checking against compression type 
metadata would be adequate.

The above begins with a conditional, If 


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227946#comment-13227946
 ] 

Ted Yu commented on HBASE-4608:
---

I think WAL_VERSION metadata is orthogonal to compression type metadata and I 
would expect both to be present in new HLog files written with this feature.
Say we define WAL_VERSION as v2 which has WAL compression capability. We still 
need to check compression type metadata before applying dictionary compression.
In this regard adding WAL_VERSION seems to be redundant.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227954#comment-13227954
 ] 

Ted Yu commented on HBASE-4608:
---

bq. Should the Compression class in wal package ...
I only see KeyValueCompression.java under wal package. Please elaborate which 
class should carry more comments.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227961#comment-13227961
 ] 

Ted Yu commented on HBASE-4608:
---

Uploaded v23 onto review board.
After WAL version metadata design is finalized, will add that.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227969#comment-13227969
 ] 

stack commented on HBASE-4608:
--

bq. I think WAL_VERSION metadata is orthogonal to compression type metadata and 
I would expect both to be present in new HLog files written with this feature.

How does it get in if you don't add it?

If you don't want to add it, just don't.  I'm not going to +1 this patch though 
if it adds metadata about a new compression feature w/o introducing a general 
versioning on the WAL.

bq. Should the Compression class in wal package ...

The compression class in wal is Compressor.java.

I have trouble following your responses to my comments because they come in w/o 
context and are also they are done piecemeal which means I have to spend way 
more time than I should have to reviewing your stuff.  I'd suggest you save up 
your comments and submit them in a lump rather than hit submit per comment; 
you'll use up less internet.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227984#comment-13227984
 ] 

Ted Yu commented on HBASE-4608:
---

For code specific review, please use https://reviews.apache.org/r/4185/ where 
there would be context.

I can add WAL_VERSION as v2 in the metadata.
My question is: would HLog v2 be allowed not to compress Log entries ?

If desirable, we can discuss in more detail, face to face, on the 27th.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228033#comment-13228033
 ] 

stack commented on HBASE-4608:
--

bq. I can add WAL_VERSION as v2 in the metadata.

Why not as version 1?  The absence of WAL_VERSION can be version zero.

bq. My question is: would HLog v2 be allowed not to compress Log entries ?

Yes.  The compress flag would be 'off' (isn't that the default?)

bq. If desirable, we can discuss in more detail, face to face, on the 27th.

Why wait till then?   This is the last big one before we can release a 0.94.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228042#comment-13228042
 ] 

Zhihong Yu commented on HBASE-4608:
---

Since WAL compression may be off for the new HLog file version, we would always 
consult compression type metadata when reading HLog file.
WAL_VERSION is written but is not needed at time of reading HLog.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228051#comment-13228051
 ] 

Lars Hofhansl commented on HBASE-4608:
--

bq. My question is: would HLog v2 be allowed not to compress Log entries ?

I think the answer is yes. You're right that VERSION is orthogonal to 
COMPRESSION. I do agree with Stack that while we're adding metadata to HLog we 
should add a VERSION as well. We should add both VERSION and COMPRESSION 
metadata. (Maybe that's what you were saying anyway, if so feel free to ignore 
me).


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228056#comment-13228056
 ] 

Zhihong Yu commented on HBASE-4608:
---

I think we may enhance WAL compression using dictionary in the future.
So for DICTIONARY compression type, it is desirable to introduce versioning as 
well.

I don't have strong opinion about WAL_VERSION actually.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228097#comment-13228097
 ] 

Zhihong Yu commented on HBASE-4608:
---

HLog version decision aside, my feeling about the current implementation is -0.5

First, compression ratio is not good - at least for the data written by PE. 

Second, HLogKey persistence becomes dependent on the compression 
implementation. This would make plugging other compression techniques hard. 

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228108#comment-13228108
 ] 

Todd Lipcon commented on HBASE-4608:


bq. First, compression ratio is not good - at least for the data written by PE.
I saw ~40% compression on a YCSB load. So some workloads may have good results 
whereas others didn't. Did you also re-run the test after fixing the bug? Maybe 
that skewed the results?

bq. Second, HLogKey persistence becomes dependent on the compression 
implementation. This would make plugging other compression techniques hard.
I agree we should use a metadata field in the log to describe which compression 
mechanism is being used.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228122#comment-13228122
 ] 

Zhihong Yu commented on HBASE-4608:
---

We're looking at several metadata fields for version:
1. WAL_VERSION for HLog file
2. compression type for HLog file
3. compression major (minor) version
4. HLogKey version (covered in latest patch)

It would create some confusion w.r.t. the different combinations of the above 4

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228157#comment-13228157
 ] 

stack commented on HBASE-4608:
--

Stop making this more complicated than it need be Ted.

WAL_VERSION is global version on WAL log.

Adding a type metadata field for compression makes sense.  If none, presume 
uncompressed.

You don't need a compression type version.  If we change the format, we can do 
PREFIX_COMPRESSION_V2.

HLogKeys are serialized independent of their container.  Don't conflate their 
versioning w/ the suggested WAL log versioning.

Regards PE data, its data is not amenable to compression. Its keys are very 
basic.   Its likely not a good test evaluating the viability of this feature.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228158#comment-13228158
 ] 

stack commented on HBASE-4608:
--

Stop making this more complicated than it need be Ted.

WAL_VERSION is global version on WAL log.

Adding a type metadata field for compression makes sense.  If none, presume 
uncompressed.

You don't need a compression type version.  If we change the format, we can do 
PREFIX_COMPRESSION_V2.

HLogKeys are serialized independent of their container.  Don't conflate their 
versioning w/ the suggested WAL log versioning.

Regards PE data, its data is not amenable to compression. Its keys are very 
basic.   Its likely not a good test evaluating the viability of this feature.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228164#comment-13228164
 ] 

Zhihong Yu commented on HBASE-4608:
---

Having PREFIX_COMPRESSION_V2 in the future is equivalent to having compression 
type version.
It may make compression checking verbose: I think checking against one 
compression type is better than comparing with every PREFIX_COMPRESSION_Vx.

I agree with the observation about PE data.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228166#comment-13228166
 ] 

stack commented on HBASE-4608:
--

@Ted I think you should resign ownership of this issue.  You are just pushing 
its conclusion further out w/ your continual negotiation and what ifs.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228165#comment-13228165
 ] 

Zhihong Yu commented on HBASE-4608:
---

bq. Stop making this more complicated than it need be Ted.
It is rare that I saw review comments in such tone: condescending.

And the same comment was posted twice.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228173#comment-13228173
 ] 

Zhihong Yu commented on HBASE-4608:
---

From what can one conclude who owns the issue ? Assignee ?

I do have an opinion on compression type versioning. I would wait for a 
concrete design to form.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228175#comment-13228175
 ] 

stack commented on HBASE-4608:
--

Let me have a go at it since Li Pi can't finish it just yet.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228181#comment-13228181
 ] 

Lars Hofhansl commented on HBASE-4608:
--

Just my $0.02 here... I think having a compression type + compression version 
will be hard to grok for newcomers unfamiliar with this area, whereas having a 
single compression type fields is clear. A new version of a compression 
algorithm is a new type (IMHO). We do not have compression versions for the 
HFiles, just compression types.

I think with WAL_VERSION and compression type we have enough flexibility 
(HLogKey version is really unrelated as it is for other serialization as well).

What do you think Ted?

I'll do some testing as to what the compression ratio is for a few of our 
scenarios tomorrow.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228185#comment-13228185
 ] 

stack commented on HBASE-4608:
--

bq. It is rare that I saw review comments in such tone: condescending.

Don't be silly.  Frustrated, yes.  Condescending no.

bq. And the same comment was posted twice.

Sorry about that.  Made a mistake.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228193#comment-13228193
 ] 

Li Pi commented on HBASE-4608:
--

Yo, sorry I can't quite work on this. Finals are finished this week, and once 
that happens, I'll be able to scram.

There doesn't seem to that much left - though I said that about 3 months ago. 
My bad! Feel free to do as you please, theres not much left on this, and I'm 
happy that work is getting done. I won't be offended at all if somebody else 
wants to take their hand at finishing this.

My thoughts on it were this. WAL_VERSION is used to indicate compression type. 
This is pretty good, because enabling compression would immediately tell older 
versions that the version was wrong, while newer versions with compression 
disabled could function alongside older versions without support for 
compression. 

Also, I had my old benchmarks, and I was getting anywhere from a 20% increase 
to 40% increase on YCSB loads, depending on the testcase. This seemed pretty 
impressive to me. Not sure if a bug was introduced. I'll run a few more 
benchmarks later.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228194#comment-13228194
 ] 

Li Pi commented on HBASE-4608:
--

Yo, sorry I can't quite work on this. Finals are finished this week, and once 
that happens, I'll be able to scram.

There doesn't seem to that much left - though I said that about 3 months ago. 
My bad! Feel free to do as you please, theres not much left on this, and I'm 
happy that work is getting done. I won't be offended at all if somebody else 
wants to take their hand at finishing this.

My thoughts on it were this. WAL_VERSION is used to indicate compression type. 
This is pretty good, because enabling compression would immediately tell older 
versions that the version was wrong, while newer versions with compression 
disabled could function alongside older versions without support for 
compression. 

Also, I had my old benchmarks, and I was getting anywhere from a 20% increase 
to 40% increase on YCSB loads, depending on the testcase. This seemed pretty 
impressive to me. Not sure if a bug was introduced. I'll run a few more 
benchmarks later.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228196#comment-13228196
 ] 

Zhihong Yu commented on HBASE-4608:
---

From HFileBlock:
{code}
  int getMinorVersion() {
return this.minorVersion;
  }
{code}
From HFileReaderV2.java:
{code}
  private void validateMinorVersion(Path path, int minorVersion) {
if (minorVersion  MIN_MINOR_VERSION ||
minorVersion  MAX_MINOR_VERSION) {
{code}
I think compression type versioning would allow us to perform migration with 
ease in the future.

PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression 
type + compression version.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228202#comment-13228202
 ] 

stack commented on HBASE-4608:
--

bq. PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of 
compression type + compression version.

Ted, you misunderstood.  The above was suggested name for a new compression 
type, a version two of prefix compression.

Your bringing hfile compression versioning in here is an unnecessary 
complication, IMO.  Compression will not have the variety here it does over in 
hfile (IMO).

bq. I think compression type versioning would allow us to perform migration 
with ease in the future.

Not needed.  We will have compression types and WAL file global versioning.  
That should be sufficient describing future evolutions, IMO.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228216#comment-13228216
 ] 

Zhihong Yu commented on HBASE-4608:
---

Since Li Pi has done 90% of coding, I think this JIRA should bear his name at 
the time of integration.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




  1   2   3   >