Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10462 )
Change subject: IMPALA-6941: load more text scanner compression plugins ...................................................................... IMPALA-6941: load more text scanner compression plugins Add extensions for LZ4 and ZSTD (which are supported by Hadoop). Even without a plugin this results in better behaviour because we don't try to treat the files with unknown extensions as uncompressed text. Also allow loading tables containing files with unsupported compression types. There was weird behaviour before we knew of the file extension but didn't support querying the table - the catalog would load the table but the impalad would fail processing the catalog update. The simplest way to fix it is to just allow loading the tables. Similarly, make the "LOAD DATA" operation more permissive - we can copy files into a directory even if we can't decompress them. Switch to always checking plugin version - running mismatched plugin is inherently unsafe. Testing: Positive case where LZO is loaded is exercised. Added coverage for negative case where LZO is disabled. Fixed test gaps: * Querying LZO table with LZO plugin not available. * Interacting with tables with known but unsupported text compressions. * Querying files with unknown compression suffixes (which are treated as uncompressed text). Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Reviewed-on: http://gerrit.cloudera.org:8080/10165 Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/10462 --- M be/src/exec/CMakeLists.txt D be/src/exec/hdfs-lzo-text-scanner.cc D be/src/exec/hdfs-lzo-text-scanner.h A be/src/exec/hdfs-plugin-text-scanner.cc A be/src/exec/hdfs-plugin-text-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M common/fbs/CatalogObjects.fbs M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java A testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test A testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test A tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_partition_metadata.py 17 files changed, 459 insertions(+), 280 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: merged Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10462 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>