Andrew Kyle Purtell created HBASE-26258:
-------------------------------------------
Summary: Universal SNAPPY and ZSTD compression support via
aircompressor
Key: HBASE-26258
URL: https://issues.apache.org/jira/browse/HBASE-26258
Project: HBase
Issue Type: Improvement
Components: HFile
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
Fix For: 2.5.0, 3.0.0-alpha-2
Some Hadoop compression codecs became more available in recent Hadoop 3.x
releases, addressed by HBASE-25940. This is nice but still requires native
platform support, which, to state the obvious, is not available on all platforms
and architectures, even if native libraries for some are bundled into jars.
Airlift's aircompressor
(https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2
licensed library, for Java 8 and up, available in Maven central, which provides
both pure Java implementations of gzip, lz4, lzo, snappy, and zstd and Hadoop
compression codecs for same, claiming "_they are typically 300% faster than the
JNI wrappers_." (https://github.com/airlift/aircompressor). This library is
under active development and sees up-to-date releases because it is used by Trino.
We have another project that depends on universal availability of SNAPPY. I
would like to make this change as a general improvement which also satisfies
that requirement. (The as yet unnamed project will be contributed later.) Having
universal ZSTD support available as well would be a very nice-to-have.
Proposed changes:
* Modify Compression.java such that compression codec implementation classes
can be specified by configuration. Currently they are hardcoded as strings.
* Pull in aircompressor as a 'compile' time dependency so it will be bundled
into our build and made available on the server classpath.
* Modify Compression.java to fall back to an aircompressor pure Java
implementation when the schema specifies a compression algorithm and a Hadoop
native codec is configured as the desired implementation, but the requisite
native support is not available.
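The lookup-with-fallback behavior proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: the CodecResolver class, the configuration key, and the use of a plain Map in place of a Hadoop Configuration are all hypothetical, and "can the class be loaded" is used as a stand-in for the real "is native support available" check, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of HBase: picks a codec implementation class
// name from configuration and falls back to a pure Java one if needed.
final class CodecResolver {

    // Stand-in usability check: true if the class can be found on the
    // classpath. A real check would also need to ask whether the native
    // library behind a Hadoop codec is actually loaded.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, CodecResolver.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the configured implementation (or the default when the key is
    // unset) if it is usable, otherwise the fallback if that is usable,
    // otherwise empty.
    static Optional<String> resolve(Map<String, String> conf, String key,
                                    String defaultImpl, String fallbackImpl,
                                    Predicate<String> usable) {
        String preferred = conf.getOrDefault(key, defaultImpl);
        if (preferred != null && usable.test(preferred)) {
            return Optional.of(preferred);
        }
        if (fallbackImpl != null && usable.test(fallbackImpl)) {
            return Optional.of(fallbackImpl);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Hypothetical key name, chosen for illustration only.
        String key = "example.compression.snappy.codec";
        Optional<String> impl = resolve(conf, key,
            "org.apache.hadoop.io.compress.SnappyCodec", // Hadoop codec, needs native support
            "io.airlift.compress.snappy.SnappyCodec",    // aircompressor pure Java codec
            CodecResolver::isLoadable);
        System.out.println(impl.orElse("no usable SNAPPY codec found"));
    }
}
```

Passing the usability check in as a Predicate keeps the selection logic separate from the question of how native availability is detected, so a stricter check can be substituted without changing the fallback order.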
--
This message was sent by Atlassian Jira
(v8.3.4#803005)