Abhishek Rawat has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13507 )
Change subject: IMPALA-8450: Add support for zstd in parquet
......................................................................

IMPALA-8450: Add support for zstd in parquet

The Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make the zstd headers and libs
accessible.

Classes ZstandardCompressor and ZstandardDecompressor were added to
provide interfaces for calling the ZSTD_compress/ZSTD_decompress
functions. Zstd supports compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, which trade
compression ratio for speed; these are not supported. The default
clevel is ZSTD_CLEVEL_DEFAULT.

HdfsParquetTableWriter was updated to support the ZSTD codec. The new
codec can be set using the existing query option as follows:
  set COMPRESSION_CODEC=ZSTD:<clevel>;
  set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT

Testing:
- Added a unit test in the DecompressorTest class with the
  ZSTD_CLEVEL_DEFAULT clevel and a random clevel. The test decompresses
  compressed input data and validates the result. It also tests for the
  expected behavior when passing an over/undersized buffer for
  decompression.
- Added unit tests for valid/invalid values of COMPRESSION_CODEC.
- Added an e2e test in test_insert_parquet.py which tests writing and
  reading (null/non-null) data into/from a table with columns of
  different data types, using multiple codecs. Other existing e2e tests
  were updated to also use the parquet/zstd table format.
- Manual interoperability tests were run between Impala and Hive.
Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/catalog/catalog-util.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/experiments/compression-test.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
M be/src/util/decompress.h
M be/src/util/runtime-profile.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindZstd.cmake
M common/thrift/CatalogObjects.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/insert_parquet_multi_codecs.test
M tests/common/test_dimensions.py
M tests/query_test/test_insert.py
M tests/query_test/test_insert_parquet.py
28 files changed, 474 insertions(+), 95 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13507/4
--
To view, visit http://gerrit.cloudera.org:8080/13507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Gerrit-Change-Number: 13507
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
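The COMPRESSION_CODEC syntax and clevel rules described in the commit message can be illustrated with a small C++ sketch. It is hypothetical. ParseZstdCodecOption and ExampleChecks are illustrative names, not functions from this patch. The sketch also hard-codes zstd's values (ZSTD_CLEVEL_DEFAULT = 3, ZSTD_maxCLevel() = 22) instead of including <zstd.h>, so that it compiles without libzstd.

```cpp
// Hypothetical sketch, not Impala's implementation: validates
// COMPRESSION_CODEC values of the form "ZSTD" or "ZSTD:<clevel>".
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Assumed to mirror libzstd; real code would use ZSTD_CLEVEL_DEFAULT and
// ZSTD_maxCLevel() from <zstd.h>.
constexpr int kZstdDefaultLevel = 3;
constexpr int kZstdMaxLevel = 22;

// Returns the clevel to use, or std::nullopt if the value is not a valid
// ZSTD option. Other codecs are out of scope for this sketch.
std::optional<int> ParseZstdCodecOption(const std::string& value) {
  const std::string::size_type colon = value.find(':');
  // Codec name is everything before the first ':' (or the whole string).
  std::string name = value.substr(0, colon);
  for (char& c : name) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  if (name != "ZSTD") return std::nullopt;
  // "ZSTD" with no level falls back to the default clevel.
  if (colon == std::string::npos) return kZstdDefaultLevel;

  const std::string level_str = value.substr(colon + 1);
  // Levels 1..22 need at most two digits; this also rejects extra ':' parts.
  if (level_str.empty() || level_str.size() > 2) return std::nullopt;
  // Only digits are accepted, so negative levels such as "-1" are rejected.
  for (char c : level_str) {
    if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
  }
  const int level = std::stoi(level_str);
  if (level < 1 || level > kZstdMaxLevel) return std::nullopt;
  return level;
}

// Example checks of the accepted and rejected forms.
void ExampleChecks() {
  assert(ParseZstdCodecOption("ZSTD") == kZstdDefaultLevel);
  assert(ParseZstdCodecOption("zstd:19") == 19);
  assert(!ParseZstdCodecOption("ZSTD:-1").has_value());  // negative clevel
  assert(!ParseZstdCodecOption("ZSTD:0").has_value());   // below range
  assert(!ParseZstdCodecOption("ZSTD:23").has_value());  // above max
}
```

Rejecting anything other than plain digits is one simple way to keep negative clevels out. The real option parser may report errors differently.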