Bryan Beaudreault created HBASE-28343:
-----------------------------------------
Summary: Write codec class into hfile header/trailer
Key: HBASE-28343
URL: https://issues.apache.org/jira/browse/HBASE-28343
Project: HBase
Issue Type: Improvement
Reporter: Bryan Beaudreault
We recently started playing around with the new bundled compression libraries
as of 2.5.0. Specifically, we are experimenting with the different zstd codecs.
The book says that aircompressor's zstd is not data compatible with Hadoop's,
but doesn't say the same about zstd-jni.
In our experiments we ended up in a state where some hfiles were encoded with
zstd-jni (zstd.ZstdCodec) while others were encoded with hadoop
(ZStandardCodec). At this point the cluster became extremely unstable, with
some files unable to be read because they were encoded with a codec that didn't
match the current runtime configuration. Changing the runtime configuration
caused the other files to not be readable.
I think this problem could be solved by writing the classname of the codec used
into the hfile. This could be used as a hint so that a regionserver can read
hfiles compressed with any compression codec that it supports.
[~apurtell] do you have any thoughts here since you brought us all of these
great compression options?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)