[
https://issues.apache.org/jira/browse/HBASE-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814521#comment-17814521
]
Andrew Kyle Purtell commented on HBASE-28343:
---------------------------------------------
We write the compression algorithm ordinal into the trailer. That used to be
sufficient, but then I added these new codecs, where some of the implementation
options have limitations and one flavor might not be compatible with another
(especially Zstandard!). The assumption was that an operator never changes
codec configuration once data is live in the cluster, because a codec option
is always compatible with itself, of course.
bq. I think this problem could be solved by writing the classname of the codec
used into the hfile. This could be used as a hint so that a regionserver can
read hfiles compressed with any compression codec that it supports.
+1, makes sense to me.
It adds some safety and improves handling when codec implementations of a given
algorithm may have been mixed, although mixing them should not be recommended
practice.
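To make the hint idea concrete, here is a rough sketch of the read-side
decision. It is not HBase code: CodecHintResolver, register() and resolve() are
hypothetical names, the class names are handled as opaque strings, and failing
fast on an unknown hint is my assumption rather than anything decided on this
issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not an HBase class: picks the codec class for reading one hfile. */
final class CodecHintResolver {
    // Algorithm name -> codec class currently configured for it on this server.
    private final Map<String, String> configured = new HashMap<>();
    // Algorithm name -> every codec class this server is able to load.
    private final Map<String, Set<String>> supported = new HashMap<>();

    void register(String algorithm, String codecClass, boolean isConfiguredDefault) {
        supported.computeIfAbsent(algorithm, k -> new HashSet<>()).add(codecClass);
        if (isConfiguredDefault) {
            configured.put(algorithm, codecClass);
        }
    }

    /**
     * @param algorithm algorithm recorded in the file (what the trailer ordinal identifies)
     * @param classHint codec class name recorded by the writer, or null for older files
     * @return the codec class name to use for decompression
     */
    String resolve(String algorithm, String classHint) {
        if (classHint == null) {
            // Older file without a hint: behave as today and use the configured codec.
            String fallback = configured.get(algorithm);
            if (fallback == null) {
                throw new IllegalStateException("No codec configured for " + algorithm);
            }
            return fallback;
        }
        Set<String> known = supported.get(algorithm);
        if (known == null || !known.contains(classHint)) {
            // Assumption: a clear error is better than decoding with an incompatible codec.
            throw new IllegalStateException(
                "File was written with " + classHint + ", which this server cannot load");
        }
        return classHint;
    }

    public static void main(String[] args) {
        CodecHintResolver resolver = new CodecHintResolver();
        resolver.register("ZSTD", "org.apache.hadoop.io.compress.ZStandardCodec", true);
        resolver.register("ZSTD", "org.apache.hadoop.hbase.io.compress.zstd.ZstdCodec", false);

        // A file written by the non-default codec is still readable via its hint.
        System.out.println(
            resolver.resolve("ZSTD", "org.apache.hadoop.hbase.io.compress.zstd.ZstdCodec"));
        // A file with no hint falls back to the configured codec.
        System.out.println(resolver.resolve("ZSTD", null));
    }
}
```

With something along these lines, a regionserver configured for one zstd
implementation could still open files written by another one it has on the
classpath, and files without the hint would be read exactly as they are today.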
There is also HBASE-27706. The idea there is to implement a Hadoop codec
compatible HBase side codec using zstd-jni, which I think is possible.
> Write codec class into hfile header/trailer
> -------------------------------------------
>
> Key: HBASE-28343
> URL: https://issues.apache.org/jira/browse/HBASE-28343
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Priority: Major
>
> We recently started playing around with the new bundled compression libraries
> as of 2.5.0. Specifically, we are experimenting with the different zstd
> codecs. The book says that aircompressor's zstd is not data compatible with
> Hadoop's, but doesn't say the same about zstd-jni.
> In our experiments we ended up in a state where some hfiles were encoded with
> zstd-jni (zstd.ZstdCodec) while others were encoded with Hadoop
> (ZStandardCodec). At this point the cluster became extremely unstable, with
> some files unable to be read because they were encoded with a codec that
> didn't match the current runtime configuration. Changing the runtime
> configuration caused the other files to not be readable.
> I think this problem could be solved by writing the classname of the codec
> used into the hfile. This could be used as a hint so that a regionserver can
> read hfiles compressed with any compression codec that it supports.
> [~apurtell] do you have any thoughts here since you brought us all of these
> great compression options?