[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253127#comment-13253127 ]
Ted Yu commented on HBASE-5778:
-------------------------------

The remaining issue is how the replication sink can correctly decompress the WAL.
From the test output, I saw:
{code}
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2243)
	at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2249)
	at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:129)
	at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1700)
{code}
For the replication sink, there is no CompressionContext in HLog$Entry that can be used to perform decompression.
I agree the change should be reverted.

> Turn on WAL compression by default
> ----------------------------------
>
>                 Key: HBASE-5778
>                 URL: https://issues.apache.org/jira/browse/HBASE-5778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Assignee: Lars Hofhansl
>            Priority: Blocker
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch
>
>
> I ran some tests to verify whether WAL compression should be turned on by default.
> For a use case where it's not very useful (values two orders of magnitude
> bigger than the keys), the insert time wasn't different and the CPU usage was 15%
> higher (150% CPU usage vs. 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement in the insert
> run time, and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure
> WAL compression accounts for all the additional CPU usage; it might just be
> that we're able to insert faster and we spend more time in the MemStore per
> second (because our MemStores are bad when they contain tens of thousands of
> values).
> Those are two extremes, but they show that for the price of some CPU we can
> save a lot. My machines have 2 quads with HT, so I still had a lot of idle
> CPUs.
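
The missing-context problem described in the comment above can be illustrated with a toy dictionary coder. This is only a sketch under stated assumptions: the class DictionarySketch, its Context type, and the "L:"/"I:" token format are all hypothetical and are not HBase's CompressionContext or its WAL wire format. It shows the general reason a dictionary-compressed entry cannot be decoded by a receiver that never saw the earlier entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy dictionary coder for illustration only; not HBase code. */
public class DictionarySketch {

    /** Dictionary state that writer and reader must build up in lockstep. */
    static final class Context {
        final Map<String, Integer> toIndex = new HashMap<>();
        final List<String> fromIndex = new ArrayList<>();
    }

    /** First occurrence is written literally ("L:value"); repeats become "I:n". */
    static String encode(String value, Context ctx) {
        Integer idx = ctx.toIndex.get(value);
        if (idx != null) {
            return "I:" + idx;
        }
        ctx.toIndex.put(value, ctx.fromIndex.size());
        ctx.fromIndex.add(value);
        return "L:" + value;
    }

    /** An index token can only be resolved if this context saw the literal earlier. */
    static String decode(String token, Context ctx) {
        String payload = token.substring(2);
        if (token.startsWith("L:")) {
            ctx.toIndex.put(payload, ctx.fromIndex.size());
            ctx.fromIndex.add(payload);
            return payload;
        }
        int idx = Integer.parseInt(payload);
        if (idx >= ctx.fromIndex.size()) {
            throw new IllegalStateException("no dictionary entry for index " + idx);
        }
        return ctx.fromIndex.get(idx);
    }

    public static void main(String[] args) {
        Context writer = new Context();
        String first = encode("row-0001", writer);   // written as a literal
        String second = encode("row-0001", writer);  // written as an index reference

        // A reader that replays the whole stream rebuilds the same dictionary.
        Context localReader = new Context();
        System.out.println(decode(first, localReader));
        System.out.println(decode(second, localReader));

        // A receiver handed only the later entry has no context to resolve it.
        try {
            decode(second, new Context());
        } catch (IllegalStateException e) {
            System.out.println("receiver failed: " + e.getMessage());
        }
    }
}
```

In this sketch the failure surfaces as an explicit exception; in the trace above, the sink instead hits an EOFException while reading the entry, which is consistent with it having no context with which to interpret the compressed bytes.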