[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493453#comment-13493453 ]
stack commented on HBASE-5778:
------------------------------

Adding compression context to the general HLog Interface seems incorrect to me. This kind of thing will not make sense for all implementations of HLog. With this patch as is, we are going against the effort that tries to turn HLog into an Interface.

Ditto on ReplicationSource having to know anything about HLog compression and carrying a compression context. (Having to do this in ReplicationSource seems 'off' --> +import org.apache.hadoop.hbase.regionserver.wal.CompressionContext;). What happens if HLog has a different kind of compression than our current type? Will everything break?

Having to do this over in ReplicationSource seems wrong:

{code}
+ // If we're compressing logs and the oldest recovered log's last position is greater
+ // than 0, we need to rebuild the dictionary up to that point without replicating
+ // the edits again. The rebuilding part is simply done by reading the log.
{code}

Why can't the internal implementation do the skipping if the dictionary is empty and we are at an offset > 0?

Rather than passing a compression context to SequenceFileLogReader, can we not have a CompressedSequenceLogReader that manages compression contexts internally, not letting them outside of CSLR?

> Turn on WAL compression by default
> ----------------------------------
>
>                 Key: HBASE-5778
>                 URL: https://issues.apache.org/jira/browse/HBASE-5778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch,
> HBASE-5778-0.94-v2.patch, HBASE-5778.patch
>
>
> I ran some tests to verify if WAL compression should be turned on by default.
> For a use case where it's not very useful (values two orders of magnitude
> bigger than the keys), the insert time wasn't different and the CPU usage was 15%
> higher (150% CPU usage VS 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement for the insert
> run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure
> WAL compression accounts for all the additional CPU usage, it might just be
> that we're able to insert faster and we spend more time in the MemStore per
> second (because our MemStores are bad when they contain tens of thousands of
> values).
> Those are two extremes, but it shows that for the price of some CPU we can
> save a lot. My machines have 2 quads with HT, so I still had a lot of idle
> CPUs.
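
For illustration, the suggestion in the comment above (a reader that keeps its compression state private and rebuilds the dictionary itself when asked to start at an offset > 0) could look roughly like the toy sketch below. Everything in it is an assumption made for the example: `LogReader`, `Record`, and `CompressedLogReader` are hypothetical names and not HBase classes, the log is an in-memory list indexed by record number rather than a byte offset in a file, and the "compression" is a simple literal-or-dictionary-index scheme standing in for the real WAL format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompressedReaderSketch {

    /** Hypothetical reader interface: nothing about compression leaks into it. */
    interface LogReader {
        void seek(int position);
        String next(); // returns null at the end of the log
    }

    /** One stored record: a literal (also appended to the dictionary) or a dictionary index. */
    static final class Record {
        private final String literal; // non-null for a literal record
        private final int dictIndex;  // meaningful only when literal is null

        private Record(String literal, int dictIndex) {
            this.literal = literal;
            this.dictIndex = dictIndex;
        }
        static Record literal(String value) { return new Record(value, -1); }
        static Record ref(int index) { return new Record(null, index); }
    }

    /** Hypothetical compressed reader that owns its dictionary and never exposes it. */
    static final class CompressedLogReader implements LogReader {
        private final List<Record> log;
        private final List<String> dictionary = new ArrayList<>();
        private int position = 0;

        CompressedLogReader(List<Record> log) { this.log = log; }

        @Override
        public void seek(int target) {
            if (target < position) {
                // Seeking backwards: drop the state and rebuild from the start.
                dictionary.clear();
                position = 0;
            }
            // Read and discard records up to the target so the dictionary is
            // complete; the caller never sees these records again.
            while (position < target && position < log.size()) {
                decode(log.get(position++));
            }
        }

        @Override
        public String next() {
            if (position >= log.size()) {
                return null;
            }
            return decode(log.get(position++));
        }

        private String decode(Record record) {
            if (record.literal != null) {
                dictionary.add(record.literal);
                return record.literal;
            }
            return dictionary.get(record.dictIndex);
        }
    }

    public static void main(String[] args) {
        List<Record> log = Arrays.asList(
            Record.literal("row1"), Record.literal("cf:a"),
            Record.ref(0), Record.ref(1));

        // The caller (think ReplicationSource) only sees the plain interface.
        LogReader reader = new CompressedLogReader(log);
        reader.seek(2); // resume at a saved position > 0
        System.out.println(reader.next()); // row1
        System.out.println(reader.next()); // cf:a
        System.out.println(reader.next()); // null
    }
}
```

In this sketch the caller only saves and restores a position; whether rebuilding by re-reading from the start is acceptable for large recovered logs is a separate question that the toy example does not address.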