[GitHub] [hbase] apurtell commented on a change in pull request #3244: HBASE-25869 WAL value compression

GitBox Mon, 10 May 2021 17:01:40 -0700


apurtell commented on a change in pull request #3244:
URL: https://github.com/apache/hbase/pull/3244#discussion_r629757222




##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
##########
@@ -241,10 +246,27 @@ public void write(Cell cell) throws IOException {
         compression.getDictionary(CompressionContext.DictionaryIndex.FAMILY));
       PrivateCellUtil.compressQualifier(out, cell,
         
compression.getDictionary(CompressionContext.DictionaryIndex.QUALIFIER));
-      // Write timestamp, type and value as uncompressed.
+      // Write timestamp, type and value.
       StreamUtils.writeLong(out, cell.getTimestamp());
-      out.write(cell.getTypeByte());
-      PrivateCellUtil.writeValue(out, cell, cell.getValueLength());
+      byte type = cell.getTypeByte();
+      if (compression.getValueCompressor() != null &&
+          cell.getValueLength() > VALUE_COMPRESS_THRESHOLD) {
+        // Try compressing the cell's value
+        byte[] compressedBytes = compressValue(cell);
+        // Only write the compressed value if we have achieved some space savings.
+        if (compressedBytes.length < cell.getValueLength()) {
+          // Set the high bit of type to indicate the value is compressed
+          out.write((byte)(type|0x80));

Review comment:
       > Jetty settled on a size threshold of 23 bytes.
   
   Thank you @ndimiduk. gzip and deflate are essentially the same thing (gzip
   is DEFLATE wrapped with a header and a checksum), so Jetty's number should
   carry over. Let's opt for the smaller threshold and see how it goes. In the
   worst case, where the compressor produces output larger than the original,
   we just discard it and write the original, so that's not a problem. A
   smaller threshold makes more values eligible for compression, so more time
   will be spent compressing, but presumably with a payoff in space savings.
   That seems fine.
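
   The "compress, then fall back to the original if nothing was saved" logic
   discussed above can be sketched roughly as follows. This is a minimal,
   standalone illustration, not the code from the pull request: the class
   name, the `encode` and `deflate` helpers, the threshold value of 23 (taken
   from the Jetty figure quoted above), and the simplified "type byte followed
   by payload" layout are all assumptions made for the example. It uses only
   `java.util.zip.Deflater` from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical sketch; names and layout are assumptions, not the PR's code.
final class ValueCompressionSketch {
  // Assumed threshold, echoing the 23-byte figure quoted from Jetty.
  static final int VALUE_COMPRESS_THRESHOLD = 23;
  // High bit of the type byte marks a compressed value, as in the diff.
  static final int COMPRESSED_FLAG = 0x80;

  // Compress the input with DEFLATE (zlib framing) and return the bytes.
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
      byte[] buf = new byte[256];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib resources
    }
  }

  // Returns the (possibly flagged) type byte followed by the value payload.
  static byte[] encode(byte type, byte[] value) {
    byte outType = type;
    byte[] payload = value;
    // Values at or below the threshold are never handed to the compressor.
    if (value.length > VALUE_COMPRESS_THRESHOLD) {
      byte[] compressed = deflate(value);
      // Keep the compressed form only if it is strictly smaller;
      // otherwise discard it and write the original bytes.
      if (compressed.length < value.length) {
        outType = (byte) (type | COMPRESSED_FLAG);
        payload = compressed;
      }
    }
    byte[] result = new byte[1 + payload.length];
    result[0] = outType;
    System.arraycopy(payload, 0, result, 1, payload.length);
    return result;
  }
}
```

   With this shape, lowering the threshold only changes how many values reach
   `deflate`. A value that does not shrink costs the compression attempt but
   is still written unchanged with its original type byte.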




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
