Joe McDonnell has submitted this change and it was merged. (
http://gerrit.cloudera.org:8080/22718 )
Change subject: IMPALA-13923: Support more compression levels for ZSTD and ZLIB
......................................................................
IMPALA-13923: Support more compression levels for ZSTD and ZLIB
This patch adds support for more compression levels for ZLIB, ZSTD
and BZIP2.
The following additional compression levels are now supported.
For ZSTD,
ZSTD_minCLevel(-ZSTD_TARGETLENGTH_MAX) to ZSTD_maxCLevel(22)
For ZLIB i.e. ZLIB, GZIP and DEFLATE,
Z_DEFAULT_COMPRESSION(-1) to Z_BEST_COMPRESSION(9)
For BZIP2,
BlockSize100k * (1) to BlockSize100k * (9)
Note:
Currently, BZIP2 is only used by TmpFileMgr. It is not supported
by Parquet (i.e. for writing tables).
These are now supported with the "compression_codec" query option.
This has been implemented by refactoring compression levels as an
optional parameter in CodecInfo.
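As an illustration of the optional-level idea, here is a minimal sketch of
splitting a "NAME" or "NAME:LEVEL" option value into a codec name and an
optional level. CodecSpec and ParseCodecSpec are hypothetical names used
only for this example; they are not Impala's CodecInfo or its actual
query option parser.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a codec description (not Impala's CodecInfo).
struct CodecSpec {
  std::string name;
  std::optional<int> level;  // empty means "use the codec's default level"
};

// Parses "NAME" or "NAME:LEVEL". Returns std::nullopt on malformed input.
// Negative levels such as "ZSTD:-5" are accepted by std::from_chars.
std::optional<CodecSpec> ParseCodecSpec(std::string_view value) {
  const std::size_t colon = value.find(':');
  if (colon == std::string_view::npos) {
    if (value.empty()) return std::nullopt;
    return CodecSpec{std::string(value), std::nullopt};
  }
  const std::string_view name = value.substr(0, colon);
  const std::string_view level_str = value.substr(colon + 1);
  if (name.empty() || level_str.empty()) return std::nullopt;

  int level = 0;
  const char* end = level_str.data() + level_str.size();
  auto [ptr, ec] = std::from_chars(level_str.data(), end, level);
  // Reject non-numeric input and trailing characters after the number.
  if (ec != std::errc() || ptr != end) return std::nullopt;
  return CodecSpec{std::string(name), level};
}
```

Keeping the level as std::optional<int> rather than a sentinel integer
lets "no level given" stay distinct from every valid level, including
negative ones.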
For ZSTD, negative compression levels are now supported (IMPALA-10630).
Usage of compression level has been refactored to use std::optional in:
- exec/parquet/hdfs-parquet-table-writer
- runtime/tmp-file-mgr
- service/query-options
- util/codec
- util/compress
To validate compression levels externally, the following method has
been added:
- Status Codec::ValidateCompressionLevel
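A range check of this kind could look roughly like the sketch below. It
is not the signature or body of Codec::ValidateCompressionLevel: the
names CodecKind, LevelRange, RangeFor and ValidateLevel are hypothetical,
and an optional error string stands in for Impala's Status. The bounds
are hardcoded so the example is self-contained; they are assumed to match
the zlib constants and the values returned by recent zstd releases, and
real code would query the libraries instead.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical codec families for this sketch.
enum class CodecKind { ZLIB, ZSTD, BZIP2 };

struct LevelRange {
  int min;
  int max;
};

// Assumed bounds, hardcoded for illustration:
//   zlib:  Z_DEFAULT_COMPRESSION (-1) .. Z_BEST_COMPRESSION (9)
//   zstd:  ZSTD_minCLevel() (-131072 assumed) .. ZSTD_maxCLevel() (22 assumed)
//   bzip2: block size multiplier 1 .. 9 (units of 100k)
LevelRange RangeFor(CodecKind kind) {
  switch (kind) {
    case CodecKind::ZLIB: return {-1, 9};
    case CodecKind::ZSTD: return {-131072, 22};
    case CodecKind::BZIP2: return {1, 9};
  }
  return {0, 0};  // unreachable for valid enum values
}

// Returns std::nullopt when the level is acceptable, otherwise an error
// message. An absent level means "codec default" and is always accepted.
std::optional<std::string> ValidateLevel(CodecKind kind,
                                         std::optional<int> level) {
  if (!level.has_value()) return std::nullopt;
  const LevelRange range = RangeFor(kind);
  if (*level < range.min || *level > range.max) {
    return "Invalid compression level " + std::to_string(*level) +
           ": valid values are in [" + std::to_string(range.min) + ", " +
           std::to_string(range.max) + "]";
  }
  return std::nullopt;
}
```

For example, ValidateLevel(CodecKind::ZLIB, 10) yields an error message,
while ValidateLevel(CodecKind::ZSTD, -5) and a call with no level both
pass.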
Added new tests for:
* Additional compression levels for ZLIB, ZSTD and BZIP2
* Query option - "compression_codec" for the newly added formats
and compression levels
The following tests were executed to verify codecs and compression levels.
- DecompressorTest.ZSTD*
- DecompressorTest.Gzip
- DecompressorTest.Bzip
- QueryOptions.CompressionCodec
- TestComputeStats::test_compute_stats_compression_codec
For the stored Parquet files, manually verified the compression codec
used for ZSTD and ZLIB.
Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Reviewed-on: http://gerrit.cloudera.org:8080/22718
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Joe McDonnell <[email protected]>
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
12 files changed, 240 insertions(+), 91 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Michael Smith: Looks good to me, but someone else must approve
Joe McDonnell: Looks good to me, approved
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Gerrit-Change-Number: 22718
Gerrit-PatchSet: 12
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>