apurtell commented on pull request #3244: URL: https://github.com/apache/hbase/pull/3244#issuecomment-837063440
There is a potential performance improvement here. We could create a Deflater/Inflater pair per column family. The universe of column families across all schemas in a production setting will not be too large. In effect we would build a dictionary for each column family, accounting for differences in data distribution among the families, which is likely to improve compression ratios. In addition, we can opt for BEST_SPEED under these circumstances to reduce the overall performance impact. Let me explore this idea and come back here.
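A minimal sketch of the per-family idea, not the code in this pull request: it keeps one long-lived `java.util.zip.Deflater`/`Inflater` pair per column family and uses `SYNC_FLUSH` so each value can be decoded on its own while the compressor's history window carries over to later values of the same family. The class name `PerFamilyValueCodec`, its method names, and the use of a `String` family key are hypothetical choices for illustration. It assumes the reader decompresses each family's values in the same order the writer compressed them.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: one Deflater/Inflater pair per column family.
final class PerFamilyValueCodec implements AutoCloseable {
    private final Map<String, Deflater> deflaters = new HashMap<>();
    private final Map<String, Inflater> inflaters = new HashMap<>();

    // Compress one value with the family's long-lived Deflater.
    byte[] compress(String family, byte[] value) {
        Deflater deflater = deflaters.computeIfAbsent(
            family, f -> new Deflater(Deflater.BEST_SPEED));
        deflater.setInput(value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        do {
            // SYNC_FLUSH emits all pending output but keeps the history window,
            // so later values in this family can reference earlier ones.
            n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length); // a full buffer means more output may be pending
        return out.toByteArray();
    }

    // Decompress one value; calls must follow the writer's per-family order.
    byte[] decompress(String family, byte[] compressed) {
        Inflater inflater = inflaters.computeIfAbsent(family, f -> new Inflater());
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (true) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                // Output space left and no input left: this value is complete.
                if (n < buf.length && inflater.needsInput()) {
                    return out.toByteArray();
                }
                if (n == 0) {
                    throw new IllegalStateException("inflater stalled");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt or out-of-order value", e);
        }
    }

    // Release the native zlib state held by every pair.
    @Override
    public void close() {
        deflaters.values().forEach(Deflater::end);
        inflaters.values().forEach(Inflater::end);
    }
}
```

In this sketch a repeated or similar value in the same family compresses to a few bytes plus the flush marker, because the second call can back-reference the first. The cost is that each family's stream becomes stateful, so values cannot be skipped or decoded out of order.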