[ https://issues.apache.org/jira/browse/HBASE-26258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411622#comment-17411622 ]
Andrew Kyle Purtell commented on HBASE-26258:
---------------------------------------------

Also, wherever this change is made available, we can switch the default codec for WAL value compression (HBASE-25869) from GZ to SNAPPY. GZ is a very pessimistic and essentially unusable default, chosen because it was formerly the only universally available option. SNAPPY is the strongly preferred choice for that feature.

> Universal SNAPPY and ZSTD compression support via aircompressor
> ---------------------------------------------------------------
>
>                 Key: HBASE-26258
>                 URL: https://issues.apache.org/jira/browse/HBASE-26258
>             Project: HBase
>          Issue Type: Improvement
>          Components: HFile
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Some Hadoop compression codecs became more available in recent Hadoop 3.x
> releases, addressed by HBASE-25940. This is nice but still requires native
> platform support, which, to state the obvious, is not available on all
> platforms and architectures, even if native libraries for some are bundled
> into jars.
> Airlift's aircompressor
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2
> licensed library, for Java 8 and up, available in Maven Central, which
> provides pure Java implementations of the desirable compression algorithms
> gzip, lz4, lzo, snappy, and zstd, and Hadoop compression codecs for same,
> claiming "_they are typically 300% faster than the JNI wrappers_."
> (https://github.com/airlift/aircompressor). This library is under active
> development and has up to date releases because it is used by Trino.
> We have another project that depends on universal availability of SNAPPY. I
> would like to make this change as a general improvement which also satisfies
> that requirement. (The as yet unnamed project will be contributed later.)
> Universal ZSTD support would be a very nice-to-have as well.
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes
> can be specified by configuration. Currently they are hardcoded as strings.
> * Pull in aircompressor as a 'compile' time dependency so it will be bundled
> into our build and made available on the server classpath.
> * Modify Compression.java to fall back to an aircompressor pure Java
> implementation if the schema specifies a compression algorithm and a Hadoop
> native codec is configured as the desired implementation, but the requisite
> native support is not available.
> The combination of these changes will provide universal (pure Java) support
> for these desired and desirable compression codecs while retaining the default
> behavior, which is to load and use the Hadoop native implementations of same,
> if native support is available. They will also let you override this default
> if you wish to chase the claimed benefits of the pure Java alternatives.
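
The lookup-and-fallback behavior proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Compression.java change: the `CodecResolver` class, the `resolve` method, and the `hbase.io.compress.<algorithm>.codec` key pattern are hypothetical names, and the `usable` predicate stands in for whatever check a real implementation would use to decide that a codec's native support is present (loading the class alone would not be enough for a Hadoop native codec). The codec class names are given as I understand them from the Hadoop and aircompressor projects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of configurable codec selection with a pure Java fallback. */
final class CodecResolver {
  // Hadoop codec class names used when nothing is configured (assumed defaults).
  static final Map<String, String> HADOOP_DEFAULTS = new HashMap<>();
  // aircompressor pure Java codec class names used as the fallback.
  static final Map<String, String> PURE_JAVA_FALLBACKS = new HashMap<>();

  static {
    HADOOP_DEFAULTS.put("snappy", "org.apache.hadoop.io.compress.SnappyCodec");
    HADOOP_DEFAULTS.put("zstd", "org.apache.hadoop.io.compress.ZStandardCodec");
    PURE_JAVA_FALLBACKS.put("snappy", "io.airlift.compress.snappy.SnappyCodec");
    PURE_JAVA_FALLBACKS.put("zstd", "io.airlift.compress.zstd.ZstdCodec");
  }

  private CodecResolver() {}

  /** Hypothetical configuration key pattern, for illustration only. */
  static String confKey(String algorithm) {
    return "hbase.io.compress." + algorithm + ".codec";
  }

  /**
   * Returns the codec class name to use for the given algorithm.
   *
   * @param conf      site configuration, modeled here as a plain map
   * @param algorithm algorithm named in the schema, e.g. "snappy" or "zstd"
   * @param usable    stand-in for a real availability check (assumption)
   */
  static String resolve(Map<String, String> conf, String algorithm,
                        Predicate<String> usable) {
    String hadoopDefault = HADOOP_DEFAULTS.get(algorithm);
    String fallback = PURE_JAVA_FALLBACKS.get(algorithm);
    if (hadoopDefault == null || fallback == null) {
      throw new IllegalArgumentException("Unsupported algorithm: " + algorithm);
    }
    // A configured class overrides the Hadoop default.
    String preferred = conf.getOrDefault(confKey(algorithm), hadoopDefault);
    // Keep the preferred codec if it is usable; otherwise use pure Java.
    return usable.test(preferred) ? preferred : fallback;
  }
}
```

With an empty configuration and a predicate that accepts everything, `resolve(conf, "snappy", name -> true)` returns the Hadoop codec, which matches the "retain default behavior" goal. If the predicate rejects the Hadoop codec, the aircompressor class is returned instead. Setting the key to the aircompressor class name models the explicit override.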