[Impala-ASF-CR] IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
Fucun Chu has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18028 )

Change subject: IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..

IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

This patch fixes data loading problems when integrating with Apache Hive 3 and switches to the Tez engine. It adds the HIVE-21569 and HIVE-20038 patches and recompiles the hive-exec module.

Testing:
- Manually perform data loading steps.

Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/generate-schema-statements.py
M testdata/bin/load_nested.py
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch1-HIVE-21569.diff
A testdata/cluster/hive/patch2-HIVE-20038.diff
M tests/util/test_file_parser.py
11 files changed, 381 insertions(+), 2 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/18028/8
--
To view, visit http://gerrit.cloudera.org:8080/18028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Vihang Karajgaonkar
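The ref in the git pull command above follows Gerrit's usual layout: refs/changes/, then the last two digits of the change number (zero-padded), then the change number, then the patch set. A minimal Python sketch of that rule follows. The function names gerrit_change_ref and git_pull_command are made up for illustration and are not part of Gerrit or Impala.

```python
from typing import List


def gerrit_change_ref(change_number: int, patch_set: int) -> str:
    """Build the ref name for one patch set of a Gerrit change.

    Hypothetical helper: it only encodes the naming pattern visible in
    the notification emails (refs/changes/<NN>/<change>/<patch set>).
    """
    if change_number <= 0 or patch_set <= 0:
        raise ValueError("change number and patch set must be positive")
    # The shard directory is the last two digits of the change number.
    shard = change_number % 100
    return f"refs/changes/{shard:02d}/{change_number}/{patch_set}"


def git_pull_command(remote: str, change_number: int, patch_set: int) -> List[str]:
    """Return the argv list for pulling one patch set (nothing is executed)."""
    return ["git", "pull", remote, gerrit_change_ref(change_number, patch_set)]


if __name__ == "__main__":
    cmd = git_pull_command("ssh://gerrit.cloudera.org:29418/Impala-ASF", 18028, 8)
    print(" ".join(cmd))
```

Returning an argv list rather than a shell string keeps the sketch safe to pass to subprocess.run without quoting concerns.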
[Impala-ASF-CR] IMPALA-11422: Bump Apache Hive to 3.1.3
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18716 )

Change subject: IMPALA-11422: Bump Apache Hive to 3.1.3
..

IMPALA-11422: Bump Apache Hive to 3.1.3

This patch bumps the version of Apache Hive to 3.1.3.

Change-Id: I395a99cccf0a8902b3fd47235c32b69c5494f291
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/18716/1
--
To view, visit http://gerrit.cloudera.org:8080/18716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I395a99cccf0a8902b3fd47235c32b69c5494f291
Gerrit-Change-Number: 18716
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18028 )

Change subject: IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18028/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18028/5//COMMIT_MSG@15
PS5, Line 15: - Manually perform data loading steps.
> We can skip tests that depend on this.
Done

--
To view, visit http://gerrit.cloudera.org:8080/18028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Mon, 02 May 2022 12:49:03 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
Fucun Chu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18028 )

Change subject: IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..

IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

This patch fixes data loading problems when integrating with Apache Hive 3 and switches to the Tez engine. It adds the HIVE-21569 and HIVE-20038 patches and recompiles the hive-exec module.

Testing:
- Manually perform data loading steps.

Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/generate-schema-statements.py
M testdata/bin/load_nested.py
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch1-HIVE-21569.diff
A testdata/cluster/hive/patch2-HIVE-20038.diff
M tests/util/test_file_parser.py
11 files changed, 381 insertions(+), 2 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/18028/7
--
To view, visit http://gerrit.cloudera.org:8080/18028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between cdp-hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/jenkins/build-all-flag-combinations.sh M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java R fe/src/main/java/org/apache/impala/util/HiveMetadataFormatUtils.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 27 files changed, 2,514 insertions(+), 1,030 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/16 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset 
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 16 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
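The build-time selection described in the commit message above can be pictured with a small Python sketch. This is not the fe/pom.xml logic itself, which this email does not show. The two directory names are taken from the file list, while the values "cdp" and "apache-hive-3" for IMPALA_HIVE_DIST_TYPE, the "cdp" default, and the helper name select_shim_dir are assumptions made for illustration only.

```python
import os
from typing import Mapping

# Directory names come from the patch's file list. The keys are ASSUMED
# values of IMPALA_HIVE_DIST_TYPE; the real values may differ.
SHIM_SOURCE_DIRS = {
    "cdp": "fe/src/compat-hive-3/java",
    "apache-hive-3": "fe/src/compat-apache-hive-3/java",
}


def select_shim_dir(env: Mapping[str, str] = os.environ) -> str:
    """Pick exactly one shim source directory from the environment.

    Hypothetical helper: falls back to "cdp" when the variable is unset
    (an assumption) and rejects unknown values instead of guessing.
    """
    dist_type = env.get("IMPALA_HIVE_DIST_TYPE", "cdp")
    if dist_type not in SHIM_SOURCE_DIRS:
        raise ValueError(f"unsupported IMPALA_HIVE_DIST_TYPE: {dist_type!r}")
    return SHIM_SOURCE_DIRS[dist_type]


if __name__ == "__main__":
    print(select_shim_dir({"IMPALA_HIVE_DIST_TYPE": "apache-hive-3"}))
```

Because only one directory is ever added as source, both shims can define a class with the same fully qualified name (org.apache.impala.compat.MetastoreShim) without clashing.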
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between cdp-hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/jenkins/build-all-flag-combinations.sh M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 27 files changed, 3,198 insertions(+), 1,026 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/15 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: 
newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 15 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between cdp-hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/jenkins/build-all-flag-combinations.sh M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 27 files changed, 3,198 insertions(+), 1,026 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/14 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: 
newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 14 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 24 files changed, 3,011 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/13 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 13 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala 
Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 25 files changed, 2,998 insertions(+), 261 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/12 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 12 
Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..

Patch Set 5:

The hadoop-cos project has now added a license and uses the MIT license.
https://github.com/tencentyun/hadoop-cos/issues/35

--
To view, visit http://gerrit.cloudera.org:8080/17503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Mon, 29 Nov 2021 11:00:19 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..

IMPALA-10771: Add Tencent COS support

This patch adds support for COS (Cloud Object Storage). It uses hadoop-cos, and the implementation is similar to that of other remote FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on COS (IMPALA-10773).

Tests:
- Uploaded HDFS test data to a COS bucket. Modified all locations in the HMS DB to point to the COS bucket. Removed some HDFS caching params. Ran CORE tests.

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_metastore_service.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
60 files changed, 275 insertions(+), 55 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17503/5
--
To view, visit http://gerrit.cloudera.org:8080/17503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
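As a rough illustration of the num_cos_io_threads flag from the commit message above, the Python sketch below shows how a path's scheme might decide whether the COS thread count applies. It is not the disk-io-mgr.cc code from the patch. The "cosn" scheme is assumed from hadoop-cos conventions, and is_cos_path and cos_io_threads are hypothetical helpers; only the flag name and its default of 16 come from the commit message.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse

# ASSUMPTION: hadoop-cos paths use the "cosn" URI scheme.
COS_SCHEME = "cosn"
# Default taken from the commit message for num_cos_io_threads.
DEFAULT_NUM_COS_IO_THREADS = 16


def is_cos_path(path: str) -> bool:
    """Return True when the path's URI scheme marks it as a COS location."""
    return urlparse(path).scheme == COS_SCHEME


def cos_io_threads(path: str, flags: Mapping[str, int]) -> Optional[int]:
    """Return the COS I/O thread count for COS paths, otherwise None.

    Hypothetical helper: other filesystems have their own settings,
    which this sketch deliberately does not model.
    """
    if not is_cos_path(path):
        return None
    return int(flags.get("num_cos_io_threads", DEFAULT_NUM_COS_IO_THREADS))


if __name__ == "__main__":
    print(cos_io_threads("cosn://example-bucket/warehouse/t1", {}))
    print(cos_io_threads("cosn://example-bucket/warehouse/t1", {"num_cos_io_threads": 32}))
    print(cos_io_threads("hdfs://localhost:20500/warehouse/t1", {}))
```

The bucket name and the HDFS address in the usage lines are placeholders, not values from the patch.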
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 25 files changed, 2,993 insertions(+), 256 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/11 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 11 
Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
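The build-time shim selection described in the commit message above can be sketched as follows. This is only an illustration in Python, not the fe/pom.xml logic itself: the two directory names come from the patch's file list, while the environment-variable values ("hive-3", "apache-hive-3") and the function name select_shim_dir are assumptions made for the example.

```python
import os

# Hypothetical mapping from IMPALA_HIVE_DIST_TYPE values to shim source
# directories. The directories appear in the patch's file list; the
# environment-variable values used as keys are assumptions.
SHIM_DIRS = {
    "hive-3": "fe/src/compat-hive-3/java",
    "apache-hive-3": "fe/src/compat-apache-hive-3/java",
}


def select_shim_dir(env=None):
    """Return the shim source directory for the configured Hive distribution."""
    env = os.environ if env is None else env
    dist_type = env.get("IMPALA_HIVE_DIST_TYPE")
    if dist_type not in SHIM_DIRS:
        # Fail loudly instead of silently compiling against the wrong shim.
        raise ValueError("unsupported IMPALA_HIVE_DIST_TYPE: %r" % (dist_type,))
    return SHIM_DIRS[dist_type]
```

Exactly one directory is returned, which mirrors the "one of the two shims is added as source" behaviour; an unknown value is rejected rather than given a default.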
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17774 ) Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source directory by the fe/pom.xml build plugin. Code that directly uses Hive 4 APIs must be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and ASF-HMS-3. 2. Ran the full suite of tests against HMS-3. 3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transactions. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Apache Hive deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks. 
Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 22 files changed, 2,765 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/10 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 10 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17774 ) Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/17774/9/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java: http://gerrit.cloudera.org:8080/#/c/17774/9/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java@20 PS9, Line 20: > can you add a class level comment to describe what goes into this file? Loo Done http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/bin/patch_hive.sh File testdata/bin/patch_hive.sh: http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/bin/patch_hive.sh@20 PS9, Line 20: # This script is used to repair service startup and task running problems that occu > not sure I understand. Can you be more specific on the purpose of this file Done http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/cluster/hive/README File testdata/cluster/hive/README: http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/cluster/hive/README@5 PS9, Line 5: Contains only patches for `hive_metastore.thrift` is used to solve the problem that the : generated cpp file cannot be compiled. > do you mean that Hive 3.1's thrift file cannot generate compilable cpp code https://issues.apache.org/jira/browse/HIVE-21586 has solved this problem in the 3.2 and 4.0 branches. This patch is applied here as a transition solution. 
-- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 10 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 25 Nov 2021 14:06:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18028 Change subject: WIP IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading .. WIP IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading This patch fixes data loading problems when integrating with Apache Hive 3 and switches to the Tez engine. It adds the HIVE-21569 and HIVE-20038 patches and recompiles the hive-exec module. Todos: - The number of tpch_nested_parquet.customer files is inconsistent with the number generated by CDP. - Need more testing. Testing: - Manually performed the data loading steps. Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1 --- M buildall.sh M fe/src/test/resources/hive-site.xml.py M testdata/bin/generate-schema-statements.py M testdata/bin/load_nested.py M testdata/cluster/hive/README A testdata/cluster/hive/patch1-HIVE-21569.diff A testdata/cluster/hive/patch2-HIVE-20038.diff M tests/util/test_file_parser.py 8 files changed, 346 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/18028/5 -- To view, visit http://gerrit.cloudera.org:8080/18028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1 Gerrit-Change-Number: 18028 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu
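The patch files listed above follow a patchN-HIVE-XXXXX.diff naming scheme. A minimal sketch of how such files could be put in a deterministic order before being applied is shown below. It is an assumption that the numeric prefix is meant to define the apply order; the actual behaviour of testdata/bin/patch_hive.sh is not shown in this thread, and order_patches is a hypothetical helper.

```python
import re

# Matches names such as "patch1-HIVE-21569.diff" (pattern inferred from the
# file names in the change; not taken from patch_hive.sh).
PATCH_NAME = re.compile(r"^patch(\d+)-(HIVE-\d+)\.diff$")


def order_patches(filenames):
    """Return (index, jira_id, filename) tuples sorted by numeric prefix.

    Files that do not match the naming scheme (e.g. README) are skipped.
    """
    parsed = []
    for name in filenames:
        match = PATCH_NAME.match(name)
        if match:
            parsed.append((int(match.group(1)), match.group(2), name))
    # Sort numerically so that a future patch10 would come after patch2.
    return sorted(parsed)
```

Sorting on the parsed integer rather than on the raw file name avoids the lexicographic trap where "patch10" would sort before "patch2".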
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17774 ) Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source directory by the fe/pom.xml build plugin. Code that directly uses Hive 4 APIs must be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and ASF-HMS-3. 2. Ran the full suite of tests against HMS-3. 3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transactions. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Apache Hive deployed in the mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038. This hack will be added to the build script in additional sub-tasks. 
Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M bin/rat_exclude_files.txt M buildall.sh M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml A testdata/bin/patch_hive.sh A testdata/cluster/hive/README A testdata/cluster/hive/patch0-HIVE-21586.diff 22 files changed, 2,755 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/9 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 9 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA 10954: Make create, drop methods of kudu catalog service public
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17981 ) Change subject: IMPALA 10954: Make create, drop methods of kudu catalog service public .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17981/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17981/3//COMMIT_MSG@7 PS3, Line 7: IMPALA 10954 IMPALA-10954 -- To view, visit http://gerrit.cloudera.org:8080/17981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib60142424d8e758031de596831d98aed69d488ef Gerrit-Change-Number: 17981 Gerrit-PatchSet: 3 Gerrit-Owner: Deepti Sehrawat Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 17 Nov 2021 14:50:56 + Gerrit-HasComments: Yes
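The review comment above asks for the hyphenated issue key (IMPALA-10954 instead of IMPALA 10954). A small sketch of normalizing such a subject line is given below; the helper name fix_subject and the regular expression are illustrative assumptions, not part of any Impala tooling.

```python
import re

# An issue key written with a space (or underscore) instead of a hyphen at
# the start of the subject line.
BAD_KEY = re.compile(r"^IMPALA[ _](\d+)")


def fix_subject(subject):
    """Rewrite a leading 'IMPALA 1234' as 'IMPALA-1234'; leave others as-is."""
    return BAD_KEY.sub(r"IMPALA-\1", subject)
```

Subjects that already use the hyphenated form, or that start with another prefix such as WIP, pass through unchanged.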
[Impala-ASF-CR] IMPALA-7942: Add query hints for cardinalities and selectivities
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18023 ) Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/18023/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18023/1//COMMIT_MSG@21 PS1, Line 21: hint value only valid when table does not have stats or stats is corrupt. nit: line should have 72 or fewer characters -- To view, visit http://gerrit.cloudera.org:8080/18023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b Gerrit-Change-Number: 18023 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 15 Nov 2021 14:34:21 + Gerrit-HasComments: Yes
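The nit above (commit message lines should have 72 or fewer characters) is easy to check mechanically. The following is a minimal sketch of such a check; overlong_lines is a hypothetical helper written for this illustration and is not the check that Gerrit or the Impala tooling actually runs.

```python
MAX_LEN = 72  # limit quoted in the review comment


def overlong_lines(message, limit=MAX_LEN):
    """Return (line_number, length) for every line longer than `limit`.

    Line numbers are 1-based, matching how Gerrit reports them.
    """
    return [
        (number, len(line))
        for number, line in enumerate(message.splitlines(), start=1)
        if len(line) > limit
    ]
```

A line of exactly 72 characters passes; only lines of 73 characters or more are reported.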
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17774 ) Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source directory by the fe/pom.xml build plugin. Code that directly uses Hive 4 APIs must be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and ASF-HMS-3. 2. Ran the full suite of tests against HMS-3. 3. Running the full tests against ASF-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transactions, since HMS-3 uses Hive 4 APIs. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Hive deployed in the mini-cluster. This build has the fixes for HIVE-20038 and HIVE-22717. This hack will be added to the build script in additional sub-tasks. 
Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml M testdata/bin/generate-schema-statements.py M testdata/bin/load_nested.py M tests/util/test_file_parser.py 20 files changed, 2,522 insertions(+), 119 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/8 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 8 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables
Fucun Chu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Support reading tiny RDBMS tables .. IMPALA-5741: Support reading tiny RDBMS tables This patch uses the "external data source" mechanism in Impala and adds a data source for querying over JDBC. It has some limitations: - It is not distributed. - Only binary predicates with the operators =, !=, <=, >=, <, > are pushed down to the RDBMS. To query the RDBMS tables, follow these steps (note that the existing data source table will be rebuilt): 1. Make sure that the database driver package has been added to the classpath and the minicluster has been started. 2. Copy the data source library into HDFS. ${IMPALA_HOME}/testdata/bin/copy-data-sources.sh 3. Create an `alltypes` table in the postgres database. ${IMPALA_HOME}/testdata/bin/load-data-sources.sh 4. Create the data source table (alltypes_jdbc_datasource). ${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\ ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql Testing: - Ran core tests successfully. 
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 --- M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java A java/ext-data-source/jdbc/pom.xml A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java A 
java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java A java/ext-data-source/jdbc/src/test/resources/log4j.properties A java/ext-data-source/jdbc/src/test/resources/test_script.sql M java/ext-data-source/pom.xml M testdata/bin/copy-data-sources.sh M testdata/bin/create-data-source-table.sql M testdata/bin/create-load-data.sh A testdata/bin/load-data-sources.sh M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test 28 files changed, 2,003 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/7 -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 7 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
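The pushdown limitation described in the commit message above (only binary predicates with =, !=, <=, >=, <, > are sent to the RDBMS) can be illustrated with a short sketch. This is written in Python for brevity and is not the patch's Java implementation; the tuple representation of predicates, the function names, the AND-joining, and the use of "?" placeholders are all assumptions made for the example.

```python
# Operators the commit message lists as eligible for pushdown.
PUSHABLE_OPS = {"=", "!=", "<=", ">=", "<", ">"}


def split_predicates(predicates):
    """Split (column, op, value) tuples into (pushed, residual) lists.

    Pushed predicates can be evaluated by the RDBMS; residual ones would
    have to be evaluated by the engine after the rows are fetched.
    """
    pushed, residual = [], []
    for pred in predicates:
        (pushed if pred[1] in PUSHABLE_OPS else residual).append(pred)
    return pushed, residual


def where_clause(pushed):
    """Render pushed predicates as a parameterized WHERE clause.

    Column names are interpolated directly for illustration only; real code
    would need to validate or quote identifiers.
    """
    if not pushed:
        return "", []
    sql = " AND ".join("%s %s ?" % (col, op) for col, op, _ in pushed)
    return "WHERE " + sql, [value for _, _, value in pushed]
```

For example, given the predicates id < 10, name LIKE 'a%' and bool_col = true, only the first and third would be pushed; the LIKE predicate stays in the residual list.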
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17774 ) Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source directory by the fe/pom.xml build plugin. Code that directly uses Hive 4 APIs must be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and Apache-HMS-3. 2. Ran the full suite of tests against HMS-3. 3. Running the full tests against Apache-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transactions, since HMS-3 uses Hive 4 APIs. This will be an ongoing effort, and test failures on Apache-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Hive deployed in the mini-cluster. This build has the fixes for HIVE-20038 and HIVE-22717. This hack will be added to the build script in additional sub-tasks. 
Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml M testdata/bin/generate-schema-statements.py M tests/util/test_file_parser.py 19 files changed, 2,203 insertions(+), 118 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/7 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 7 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17774 ) Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source directory by the fe/pom.xml build plugin. Code that directly uses Hive 4 APIs must be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and Apache-HMS-3. 2. Ran the full suite of tests against HMS-3. 3. Running the full tests against Apache-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transactions, since HMS-3 uses Hive 4 APIs. This will be an ongoing effort, and test failures on Apache-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Hive deployed in the mini-cluster. This build has the fixes for HIVE-20038 and HIVE-22717. This hack will be added to the build script in additional sub-tasks. 
Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml M testdata/bin/generate-schema-statements.py M tests/util/test_file_parser.py 19 files changed, 1,500 insertions(+), 118 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/6 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17774 Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 .. IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source directory by the fe/pom.xml build plugin. Code that directly uses Hive 4 APIs must be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and Apache-HMS-3. 2. Ran the full suite of tests against HMS-3. 3. Running the full tests against Apache-HMS-3 will need more work to support Tez in the mini-cluster (for data loading) and HMS transactions, since HMS-3 uses Hive 4 APIs. This will be an ongoing effort, and test failures on Apache-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Hive deployed in the mini-cluster. This build has the fixes for HIVE-20038 and HIVE-22717. This hack will be added to the build in subsequent tasks. 
Change-Id: I9f08db5f6da735ac431819063060941f0941f606 --- M fe/pom.xml A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M java/pom.xml M testdata/bin/generate-schema-statements.py M tests/util/test_file_parser.py 19 files changed, 1,487 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/4 -- To view, visit http://gerrit.cloudera.org:8080/17774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Gerrit-Change-Number: 17774 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17842 ) Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables .. WIP IMPALA-5741: Support reading tiny RDBMS tables This patch uses the "external data source" mechanism in Impala and adds a data source for querying over JDBC. It has some limitations: - It is not distributed. - Only binary predicates with the operators =, !=, <=, >=, <, > are pushed down to the RDBMS. To query the RDBMS tables, follow these steps (note that the existing data source table will be rebuilt): 1. Make sure that the database driver package has been added to the classpath and the minicluster has been started. 2. Copy the data source library into HDFS. ${IMPALA_HOME}/testdata/bin/copy-data-sources.sh 3. Create an `alltypes` table in the postgres database. ${IMPALA_HOME}/testdata/bin/load-data-sources.sh 4. Create the data source table (alltypes_jdbc_datasource). ${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\ ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql Testing: - Ran core tests successfully. 
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 --- M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java A java/ext-data-source/jdbc/pom.xml A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java A 
java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java A java/ext-data-source/jdbc/src/test/resources/log4j.properties A java/ext-data-source/jdbc/src/test/resources/test_script.sql M java/ext-data-source/pom.xml M testdata/bin/copy-data-sources.sh M testdata/bin/create-data-source-table.sql M testdata/bin/create-load-data.sh A testdata/bin/load-data-sources.sh M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test 28 files changed, 1,977 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/6 -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
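The push-down limitation described in the commit message above (only binary predicates with =, !=, <=, >=, <, >) can be illustrated with a minimal Python sketch. This is not the patch's Java code (that lives in QueryConditionUtil.java); the function names `build_where_clause` and `format_literal`, the tuple layout, and the literal quoting rules are all assumptions made for illustration.

```python
# Hypothetical sketch: turn simple binary predicates into a SQL WHERE clause.
# Names and quoting rules are illustrative assumptions, not Impala's API.

SUPPORTED_OPS = {"=", "!=", "<=", ">=", "<", ">"}


def format_literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def build_where_clause(predicates):
    """Split predicates into a pushed-down WHERE clause and a remainder.

    Each predicate is a (column, operator, value) tuple. Predicates with an
    unsupported operator are returned separately so the caller can evaluate
    them after fetching rows.
    """
    pushed, remaining = [], []
    for column, op, value in predicates:
        if op in SUPPORTED_OPS:
            pushed.append(f"{column} {op} {format_literal(value)}")
        else:
            remaining.append((column, op, value))
    return " AND ".join(pushed), remaining


where, rest = build_where_clause(
    [("id", ">=", 10), ("string_col", "=", "it's"), ("name", "LIKE", "a%")]
)
print(where)
print(rest)
```

In this sketch the LIKE predicate is not pushed down and is handed back to the caller, which mirrors the stated limitation that only the six comparison operators reach the RDBMS.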
[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17842 Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables .. WIP IMPALA-5741: Support reading tiny RDBMS tables This patch uses the "external data source" mechanism in Impala and adds a data source for querying RDBMS tables over JDBC. It has some limitations: - It is not distributed. - Only binary predicates with the operators =, !=, <=, >=, <, > can be pushed down to the RDBMS. In order to query the RDBMS tables, follow these steps (note that any existing data source table will be rebuilt): 1. Make sure that the database driver package has been added to the classpath and the minicluster has been started. 2. Copy the data source library into HDFS. ${IMPALA_HOME}/testdata/bin/copy-data-sources.sh 3. Create an `alltypes` table in the PostgreSQL database. ${IMPALA_HOME}/testdata/bin/load-data-sources.sh 4. Create the data source table (alltypes_jdbc_datasource). ${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql Testing: - Ran core tests successfully.
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 --- M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java A java/ext-data-source/jdbc/pom.xml A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java A 
java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java A java/ext-data-source/jdbc/src/test/resources/log4j.properties A java/ext-data-source/jdbc/src/test/resources/test_script.sql M java/ext-data-source/pom.xml M testdata/bin/copy-data-sources.sh M testdata/bin/create-data-source-table.sql M testdata/bin/create-load-data.sh A testdata/bin/load-data-sources.sh M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test 28 files changed, 1,975 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/5 -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] Impala-10994: Normalize pip package name
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17987 ) Change subject: Impala-10994: Normalize pip package name .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17987/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17987/4//COMMIT_MSG@7 PS4, Line 7: Impala-10994 The ticket ID must be uppercase: IMPALA-10994. http://gerrit.cloudera.org:8080/#/c/17987/4//COMMIT_MSG@8 PS4, Line 8: Please add a message that is exactly long enough to explain what the problem was, and how it was fixed. Each line should have 72 or fewer characters if possible. See: https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala -- To view, visit http://gerrit.cloudera.org:8080/17987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864 Gerrit-Change-Number: 17987 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 01 Nov 2021 11:13:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10997: Refactor Java Hive UDF code.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17986 ) Change subject: IMPALA-10997: Refactor Java Hive UDF code. .. Patch Set 1: (2 comments) This looks good, I only had some minor comments. http://gerrit.cloudera.org:8080/#/c/17986/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17986/1//COMMIT_MSG@19 PS1, Line 19: HiveUdfExecutor: Abstract base class that contains code that is common to : the legacy UDF.class and the GenericUDF.class when it is eventually created. : HiveUdfExecutorLegacy: Implementation of the code that is UDF.class specific. nit: each line should have 72 or fewer characters if possible. http://gerrit.cloudera.org:8080/#/c/17986/1/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java File fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java: http://gerrit.cloudera.org:8080/#/c/17986/1/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@105 PS1, Line 105: classLoaderClosed_ = true; Why not use classLoader_ = null? -- To view, visit http://gerrit.cloudera.org:8080/17986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1b981aed3021aef08c87e7cdbf7c6af95906754 Gerrit-Change-Number: 17986 Gerrit-PatchSet: 1 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 01 Nov 2021 08:21:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] datasketches: improved merge and memory allocation - avoid overhead of constructing union and getting result from it every time - call destructors of sketch and union objects
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: datasketches: improved merge and memory allocation - avoid overhead of constructing union and getting result from it every time - call destructors of sketch and union objects .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17869/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17869/1//COMMIT_MSG@7 PS1, Line 7: datasketches: improved merge and memory allocation First, you need to create a Jira ticket for this patch at https://issues.apache.org/jira/browse/IMPALA Second, please write a good, clear commit message, with a short, descriptive title and a message that is exactly long enough to explain what the problem was, and how it was fixed. Each line should have 72 or fewer characters if possible. The first line should have an empty line after it, and the first line should begin with the ticket(s) addressed, followed by a colon and a space, e.g. "IMPALA-1234: ". -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 1 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 07 Oct 2021 10:19:25 + Gerrit-HasComments: Yes
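The commit-message rules quoted in the review comments above (uppercase ticket prefix followed by a colon and a space, an empty line after the first line, and lines of 72 or fewer characters) could be checked mechanically. The following is a minimal sketch, not a tool that exists in the Impala repository: the function name `check_commit_message` is hypothetical, and for simplicity it assumes a single ticket in the title even though the comment allows several.

```python
import re

# Assumption: exactly one ticket in the title, e.g. "IMPALA-1234: ".
TICKET_PREFIX = re.compile(r"^IMPALA-\d+: ")
MAX_LINE_LENGTH = 72


def check_commit_message(message):
    """Return a list of human-readable problems found in a commit message."""
    problems = []
    lines = message.split("\n")
    if not TICKET_PREFIX.match(lines[0]):
        problems.append("first line should start with 'IMPALA-<number>: '")
    if len(lines) > 1 and lines[1] != "":
        problems.append("first line should be followed by an empty line")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} has {len(line)} characters")
    return problems


bad = "Impala-10994: Normalize pip package name\n\nShort body."
good = "IMPALA-10994: Normalize pip package name\n\nShort body."
print(check_commit_message(bad))
print(check_commit_message(good))
```

The lowercase "Impala-10994" title from the earlier review is flagged because the pattern is case-sensitive, while the corrected title passes.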
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9 PS1, Line 9: To enable fine-grained table refreshing, there are three main changes in this commit. nit: each line should have 72 or fewer characters if possible. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 27 Sep 2021 09:13:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 ) Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722 PS1, Line 1722: nctionCon > precision is not the best name. I would suggest following the datasketches Done http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826 PS1, Line 1826: _cast why max here, not the specified precision? The resulting accuracy of a sketch returned at the end of the unioning process will be a function of the smallest of lg_max_k and lg_config_k that the union operator has seen. see: https://github.com/apache/datasketches-cpp/blob/master/hll/include/hll.hpp#L404-L407 In order not to affect the union operation of the high-precision ds_hll_sketch result sketch, lg_max_k takes the maximum value. If necessary, precision parameters will be added to ds_hll_union in the new jira -- To view, visit http://gerrit.cloudera.org:8080/17744 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014 Gerrit-Change-Number: 17744 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 26 Sep 2021 10:42:09 + Gerrit-HasComments: Yes
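The reasoning in the reply above, that the accuracy of a union result follows the smallest of lg_max_k and the lg_config_k values the union has seen, can be modelled in a few lines. This sketch only encodes that quoted sentence; it does not call the DataSketches library, and the function name `effective_union_lg_k` is hypothetical.

```python
def effective_union_lg_k(lg_max_k, input_lg_config_ks):
    """Model the quoted rule: accuracy follows the smallest lg_k seen."""
    return min([lg_max_k, *input_lg_config_ks])


# Two high-precision sketches (lg_k = 16) merged under different caps.
inputs = [16, 16]
print(effective_union_lg_k(21, inputs))  # cap at the maximum: stays 16
print(effective_union_lg_k(12, inputs))  # cap at the default: drops to 12
```

Under this model, capping the union at the maximum leaves high-precision input sketches untouched, whereas capping it at the default of 12 would reduce their precision, which is the motivation given for taking the maximum.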
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17744 ) Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision .. IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision This patch addresses the current limitation in the DS_HLL_SKETCH function by extending it to optionally take a second argument called precision. DS_HLL_SKETCH(expression [, precision]) The precision value must be between 4 and 21, specified as an integer literal. The default is 12. Here are test results of a typical workload in tpch25.lineitem (#1):
+-------------+----------------+-----------+-----------+-----------+
| Metric      | Count Distinct | DS_HLL-12 | DS_HLL-16 | DS_HLL-21 |
+-------------+----------------+-----------+-----------+-----------+
| Memory(MB)  | 725.43         | 124.87    | 123.19    | 121.85    |
| Duration(s) | 5.64           | 1.03      | 1.13      | 1.64      |
| ErrorRate   | 0%             | 1.26%     | 0.22%     | 0.05%     |
+-------------+----------------+-----------+-----------+-----------+
Testing: 1. Ran unit tests against table lineitem in TPC-DS in both serial and parallel plan settings; 2. Ran "core" tests. Change-Id: I91a360bb046d4abb101641772b6159308bf6c014 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-common.h M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test 6 files changed, 155 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/2 -- To view, visit http://gerrit.cloudera.org:8080/17744 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014 Gerrit-Change-Number: 17744 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
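To give a feel for why a higher precision lowers the error rate in the results above, here is a small sketch that validates the precision range stated in the commit message (4 to 21, default 12) and computes the textbook HyperLogLog relative standard error, 1.04 / sqrt(2^precision). That formula comes from the original HyperLogLog analysis and is an assumption here: the DataSketches implementation uses its own estimators, so the measured values in the commit message need not match it. The function names are hypothetical.

```python
import math

MIN_PRECISION = 4
MAX_PRECISION = 21
DEFAULT_PRECISION = 12


def validate_precision(precision=DEFAULT_PRECISION):
    """Check the range given in the commit message and return the value."""
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an integer")
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(
            f"precision must be between {MIN_PRECISION} and {MAX_PRECISION}"
        )
    return precision


def textbook_relative_error(precision):
    """Textbook HyperLogLog estimate, not a DataSketches-specific figure."""
    registers = 2 ** validate_precision(precision)
    return 1.04 / math.sqrt(registers)


for p in (12, 16, 21):
    print(p, f"{textbook_relative_error(p):.4%}")
```

Each extra bit of precision doubles the number of registers, so the estimated error shrinks by a factor of sqrt(2) per step. This matches the downward trend of the measured error rates, though not their exact values.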
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Hello Quanlong Huang, Laszlo Gaal, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17793 to look at the new patch set (#8). Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster This patch modifies the minicluster script to optionally use Apache Hive 3.1.2 instead of CDP Hive 3.1.3. To make sure that existing setups don't break, this is enabled via an environment variable override to bin/impala-config.sh. When the environment variable USE_APACHE_HIVE is set to true, the bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain directory. These binaries are used to start the Hive services (HiveServer2 and metastore). The default is CDP Hive 3.1.3. Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses a different database name so that it is easy to switch from an environment that uses the CDP Hive 3.1.3 metastore to one that uses the Apache Hive 3.1.2 metastore. To start a minicluster that uses Apache Hive 3.1.2, follow the steps below: 1. Make sure that the minicluster, if running, is stopped before you run the following commands. 2. Open a new terminal and run the following commands. > export USE_APACHE_HIVE=true > source bin/impala-config.sh > bin/bootstrap_toolchain.py The above command downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain/apache_components directory. > rm $HIVE_HOME/lib/guava-*jar > cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/ The above commands address HIVE-22915. > bin/create-test-configuration.sh -create_metastore Pass "-create_metastore" only the first time, so that a new metastore db is created and the Apache Hive 3.1.2 schema is initialized.
> testdata/bin/run-all.sh Follow-up: - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871 Tests: - Made sure that the cluster comes up with Apache Hive 3.1.2 when the steps above are performed. - Made sure that existing scripts work as they do currently when argument is not provided. Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 89 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/8 -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 8 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang
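The setup steps in the commit message above can be summarised as an ordered command list in which only the metastore flag varies between the first run and later runs. The sketch below just assembles the shell commands quoted in the message; it does not execute them, and the helper name `minicluster_commands` is hypothetical.

```python
def minicluster_commands(first_time):
    """Return the shell commands from the commit message, in order.

    The -create_metastore flag is included only on the first run, when a
    new metastore db has to be created and its schema initialized.
    """
    create_config = "bin/create-test-configuration.sh"
    if first_time:
        create_config += " -create_metastore"
    return [
        "export USE_APACHE_HIVE=true",
        "source bin/impala-config.sh",
        "bin/bootstrap_toolchain.py",
        "rm $HIVE_HOME/lib/guava-*jar",
        "cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/",
        create_config,
        "testdata/bin/run-all.sh",
    ]


for command in minicluster_commands(first_time=True):
    print(command)
```

The commands are meant to run in one shell session from the Impala source root with the minicluster stopped, since the later steps rely on variables set by the first two.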
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17793 ) Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/17793/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17793/7//COMMIT_MSG@43 PS7, Line 43: so that a new metastore db is created and the Apache Hive 3.1.2 schema > nit: wrap at 72 Done -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 8 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Sep 2021 02:29:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17818 ) Change subject: IMPALA-10901 cleaner and faster operations with datasketches .. Patch Set 2: The following EE Tests files need to be modified: testdata/workloads/functional-query/queries/QueryTest/ datasketches-cpc.test datasketches-hll.test datasketches-kll.test datasketches-theta.test "UDF ERROR: Unable to deserialize sketch" needs to add e.what() information. Run the above test case file using the following command: cd tests impala-py.test query_test/test_datasketches.py Or use pre-review-test (https://jenkins.impala.io/job/pre-review-test/build?delay=0sec) to run the test -- To view, visit http://gerrit.cloudera.org:8080/17818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb Gerrit-Change-Number: 17818 Gerrit-PatchSet: 2 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 01 Sep 2021 03:05:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17793 ) Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster This patch modifies the minicluster script to optionally use Apache Hive 3.1.2 instead of CDP Hive 3.1.3. To make sure that existing setups don't break, this is enabled via an environment variable override to bin/impala-config.sh. When the environment variable USE_APACHE_HIVE is set to true, the bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain directory. These binaries are used to start the Hive services (HiveServer2 and metastore). The default is CDP Hive 3.1.3. Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses a different database name so that it is easy to switch from an environment that uses the CDP Hive 3.1.3 metastore to one that uses the Apache Hive 3.1.2 metastore. To start a minicluster that uses Apache Hive 3.1.2, follow the steps below: 1. Make sure that the minicluster, if running, is stopped before you run the following commands. 2. Open a new terminal and run the following commands. > export USE_APACHE_HIVE=true > source bin/impala-config.sh > bin/bootstrap_toolchain.py The above command downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain/apache_components directory. > rm $HIVE_HOME/lib/guava-*jar > cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/ The above commands address HIVE-22915. > bin/create-test-configuration.sh -create_metastore Pass "-create_metastore" only the first time, so that a new metastore db is created and the Apache Hive 3.1.2 schema is initialized. > testdata/bin/run-all.sh Follow-up: - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871 Tests: - Made sure that the cluster comes up with Apache Hive 3.1.2 when the steps above are performed.
- Made sure that existing scripts work as they do currently when argument is not provided. Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 89 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/7 -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 7 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17793 ) Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster This patch modifies the minicluster script to optionally use Apache Hive 3.1.2 instead of CDP Hive 3.1.3. To make sure that existing setups don't break, this is enabled via an environment variable override to bin/impala-config.sh. When the environment variable USE_APACHE_HIVE is set to true, the bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain directory. These binaries are used to start the Hive services (HiveServer2 and metastore). The default is CDP Hive 3.1.3. Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses a different database name so that it is easy to switch from an environment that uses the CDP Hive 3.1.3 metastore to one that uses the Apache Hive 3.1.2 metastore. To start a minicluster that uses Apache Hive 3.1.2, follow the steps below: 1. Make sure that the minicluster, if running, is stopped before you run the following commands. 2. Open a new terminal and run the following commands. > export USE_APACHE_HIVE=true > source bin/impala-config.sh > bin/bootstrap_toolchain.py The above command downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain/apache_components directory. > rm $HIVE_HOME/lib/guava-*jar > cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/ The above commands address HIVE-22915. > bin/create-test-configuration.sh -create_metastore Pass "-create_metastore" only the first time, so that a new metastore db is created and the Apache Hive 3.1.2 schema is initialized. > testdata/bin/run-all.sh Follow-up: - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871 Tests: - Made sure that the cluster comes up with Apache Hive 3.1.2 when the steps above are performed.
- Made sure that existing scripts work as they do currently when argument is not provided. Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 89 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/6 -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17793 ) Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster This patch modifies the minicluster script to optionally use Apache Hive 3.1.2 instead of CDP Hive 3.1.3. To make sure that existing setups don't break, this is enabled via an environment variable override to bin/impala-config.sh. When the environment variable USE_APACHE_HIVE is set to true, the bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain directory. These binaries are used to start the Hive services (HiveServer2 and metastore). The default is CDP Hive 3.1.3. Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses a different database name so that it is easy to switch from an environment that uses the CDP Hive 3.1.3 metastore to one that uses the Apache Hive 3.1.2 metastore. To start a minicluster that uses Apache Hive 3.1.2, follow the steps below: 1. Make sure that the minicluster, if running, is stopped before you run the following commands. 2. Open a new terminal and run the following commands. > export USE_APACHE_HIVE=true > source bin/impala-config.sh > bin/bootstrap_toolchain.py The above command downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain/apache_components directory. > rm $HIVE_HOME/lib/guava-*jar > cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/ The above commands address HIVE-22915. > bin/create-test-configuration.sh -create_metastore Pass "-create_metastore" only the first time, so that a new metastore db is created and the Apache Hive 3.1.2 schema is initialized. > testdata/bin/run-all.sh Follow-up: - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871 Tests: - Made sure that the cluster comes up with Apache Hive 3.1.2 when the steps above are performed.
- Made sure that existing scripts work as they do currently when argument is not provided. Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 86 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/5 -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17793 ) Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster This patch modifies the minicluster script to optionally use Apache Hive 3.1.2 instead of CDP Hive 3.1.3. To make sure that existing setups don't break, this is enabled via an environment variable override to bin/impala-config.sh. When the environment variable USE_APACHE_HIVE is set to true, the bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain directory. These binaries are used to start the Hive services (HiveServer2 and metastore). The default is CDP Hive 3.1.3. Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses a different database name so that it is easy to switch from an environment that uses the CDP Hive 3.1.3 metastore to one that uses the Apache Hive 3.1.2 metastore. To start a minicluster that uses Apache Hive 3.1.2, follow the steps below: 1. Make sure that the minicluster, if running, is stopped before you run the following commands. 2. Open a new terminal and run the following commands. > export USE_APACHE_HIVE=true > source bin/impala-config.sh > bin/bootstrap_toolchain.py The above command downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain/apache_components directory. > rm $HIVE_HOME/lib/guava-*jar > cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/ The above commands address HIVE-22915. > bin/create-test-configuration.sh -create_metastore Pass "-create_metastore" only the first time, so that a new metastore db is created and the Apache Hive 3.1.2 schema is initialized. > testdata/bin/run-all.sh Follow-up: - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871 Tests: - Made sure that the cluster comes up with Apache Hive 3.1.2 when the steps above are performed.
- Made sure that existing scripts work as they do currently when argument is not provided. Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 84 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/4 -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17793 ) Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/17793/3/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/17793/3/bin/bootstrap_toolchain.py@324 PS3, Line 324: class ApacheComponent(EnvVersionedPackage): > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/17793/3/bin/bootstrap_toolchain.py@326 PS3, Line 326: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/17793/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/17793/3/bin/impala-config.sh@261 PS3, Line 261: # When USE_APACHE_HIVE is set we use the apache hive version to build as well as deploy in > line too long (92 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 21 Aug 2021 04:22:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17793 Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster .. IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
This patch modifies the minicluster script to optionally use Apache Hive 3.1.2 instead of CDP Hive 3.1.3. To make sure that existing setups don't break, this is enabled via an environment variable override to bin/impala-config.sh. When the environment variable USE_APACHE_HIVE is set to true, the bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain directory. These binaries are used to start the Hive services (Hiveserver2 and metastore). The default is CDP Hive 3.1.3.
Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses a different database name so that it is easy to switch from an environment that uses the CDP Hive 3.1.3 metastore to one that uses the Apache Hive 3.1.2 metastore.
In order to start a minicluster which uses Apache Hive 3.1.2, users should follow the steps below:
1. Make sure that the minicluster, if running, is stopped before you run the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
The above command downloads the Apache Hive 3.1.2 tarballs and extracts them in the toolchain/apache_components directory.
> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
The above commands fix HIVE-22915.
> bin/create-test-configuration.sh -create_metastore
The above step should provide "-create_metastore" only the first time so that a new metastore db is created and the Apache Hive 3.1.2 schema is initialized.
> testdata/bin/run-all.sh
Follow-up:
- Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871
Tests:
- Made sure that the cluster comes up with Apache Hive 3.1.2 when the steps above are performed.
- Made sure that existing scripts work as they do currently when argument is not provided. Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 81 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/3 -- To view, visit http://gerrit.cloudera.org:8080/17793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6 Gerrit-Change-Number: 17793 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Quanlong Huang
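The startup steps in the IMPALA-10870 commit message can be summarised as an ordered command list. The sketch below is only an illustration: the helper name `apache_hive_minicluster_commands` and its `first_run` parameter are hypothetical and not part of the patch, and it assumes that on later runs the configuration script is still invoked, just without `-create_metastore`.

```python
def apache_hive_minicluster_commands(first_run):
    """Return the shell commands from the commit message, in order.

    Hypothetical helper for illustration; it only builds strings and
    does not execute anything.
    """
    commands = [
        "export USE_APACHE_HIVE=true",
        "source bin/impala-config.sh",
        # Downloads the Apache Hive 3.1.2 tarballs into the toolchain.
        "bin/bootstrap_toolchain.py",
        # Guava swap described in the message as the fix for HIVE-22915.
        "rm $HIVE_HOME/lib/guava-*jar",
        "cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/",
    ]
    create_config = "bin/create-test-configuration.sh"
    if first_run:
        # Only the first run should create and initialize the metastore db.
        create_config += " -create_metastore"
    commands.append(create_config)
    commands.append("testdata/bin/run-all.sh")
    return commands


if __name__ == "__main__":
    print("\n".join(apache_hive_minicluster_commands(first_run=True)))
```

The commands must run in a single shell session, since the later ones rely on the variables exported by `bin/impala-config.sh`.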
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17744 Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision .. IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
This patch addresses the current limitation in the DS_HLL_SKETCH function by extending the function to optionally take a second argument called precision.
DS_HLL_SKETCH(expression [, precision])
The precision value must be between 4 and 21, specified as an integer literal. The default is 12.
Here are test results of a typical workload in tpch25.lineitem (#1):
+-------------+----------------+-----------+-----------+-----------+
| Metric      | Count Distinct | DS_HLL-12 | DS_HLL-16 | DS_HLL-21 |
+-------------+----------------+-----------+-----------+-----------+
| Memory(MB)  | 725.43         | 124.87    | 123.19    | 121.85    |
| Duration(s) | 5.64           | 1.03      | 1.13      | 1.64      |
| ErrorRate   | 0%             | 1.26%     | 0.22%     | 0.05%     |
+-------------+----------------+-----------+-----------+-----------+
Testing:
1. Ran unit tests against table lineitem in TPC-DS in both serial and parallel plan settings;
2. Ran "core" tests.
Change-Id: I91a360bb046d4abb101641772b6159308bf6c014 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-common.h M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test 6 files changed, 155 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/1 -- To view, visit http://gerrit.cloudera.org:8080/17744 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014 Gerrit-Change-Number: 17744 Gerrit-PatchSet: 1 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab
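The precision rule in the IMPALA-10835 message (an integer between 4 and 21, default 12) and the way the error rate shrinks as precision grows can be sketched as follows. This is not the code from the patch. It assumes that precision is the base-2 logarithm of the number of HLL registers, and it uses the textbook HyperLogLog standard error of roughly 1.04 / sqrt(registers); the DataSketches estimator may use a different constant, so the figures are only indicative. All names are hypothetical.

```python
import math

# Bounds and default taken from the commit message.
MIN_PRECISION = 4
MAX_PRECISION = 21
DEFAULT_PRECISION = 12


def hll_register_count(precision=DEFAULT_PRECISION):
    """Validate the precision and return 2**precision.

    Assumes precision is log2 of the register count (an assumption,
    not something the commit message states).
    """
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an integer")
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(
            "precision must be between %d and %d"
            % (MIN_PRECISION, MAX_PRECISION))
    return 1 << precision


def approx_relative_error(precision=DEFAULT_PRECISION):
    """Textbook HyperLogLog standard error, about 1.04 / sqrt(registers)."""
    return 1.04 / math.sqrt(hll_register_count(precision))


if __name__ == "__main__":
    for p in (12, 16, 21):
        print(p, hll_register_count(p), approx_relative_error(p))
```

Under these assumptions the predicted error falls by a factor of four for every four extra bits of precision, which matches the direction of the ErrorRate row in the table above.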
[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17726 ) Change subject: IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI .. Patch Set 5: (2 comments) Thanks for the review! Addressed the comments. http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java File fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java: http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java@103 PS4, Line 103: execQuery > nit: rename it to 'execQueryAsync' since we won't wait for results here. Done http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java File fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java: http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java@338 PS4, Line 338: // Wait for logs to flush > Shouldn't we do this after sleep? Done -- To view, visit http://gerrit.cloudera.org:8080/17726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 Gerrit-Change-Number: 17726 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 28 Jul 2021 05:25:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17726 ) Change subject: IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI .. IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI This patch appends the username of the client who made the request to close a session or cancel a query from the coordinator's debug WebUI. Tests: - Added a new fe test for LDAP auth to verify that the new status gets printed in runtime profile and coordinator log when a query is cancelled in this way. Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 --- M be/src/kudu/util/web_callback_registry.h M be/src/service/impala-http-handler.cc M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java M tests/webserver/test_web_pages.py 6 files changed, 96 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/5 -- To view, visit http://gerrit.cloudera.org:8080/17726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 Gerrit-Change-Number: 17726 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17726 ) Change subject: IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI .. IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI This patch appends the username of the client who made the request to close a session or cancel a query from the coordinator's debug WebUI. Tests: - Added a new fe test for LDAP auth to verify that the new status gets printed in runtime profile and coordinator log when a query is cancelled in this way. Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 --- M be/src/kudu/util/web_callback_registry.h M be/src/service/impala-http-handler.cc M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java M tests/webserver/test_web_pages.py 6 files changed, 96 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/4 -- To view, visit http://gerrit.cloudera.org:8080/17726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 Gerrit-Change-Number: 17726 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17726 ) Change subject: IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI .. Patch Set 3: Added a new test for LDAP auth in LdapWebserverTest.java -- To view, visit http://gerrit.cloudera.org:8080/17726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 Gerrit-Change-Number: 17726 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 28 Jul 2021 01:19:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17726 ) Change subject: IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI .. IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI This patch appends the username of the client who made the request to close a session or cancel a query from the coordinator's debug WebUI. Tests: - Added a new fe test for LDAP auth to verify that the new status gets printed in runtime profile and coordinator log when a query is cancelled in this way. Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 --- M be/src/kudu/util/web_callback_registry.h M be/src/service/impala-http-handler.cc M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java M tests/webserver/test_web_pages.py 6 files changed, 96 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/3 -- To view, visit http://gerrit.cloudera.org:8080/17726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 Gerrit-Change-Number: 17726 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17726 Change subject: IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI .. IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI This patch appends the username of the client who made the request to close a session or cancel a query from the coordinator's debug WebUI. Tests: - Run related tests manually for use authentication Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 --- M be/src/kudu/util/web_callback_registry.h M be/src/service/impala-http-handler.cc M be/src/util/webserver.cc M tests/webserver/test_web_pages.py 4 files changed, 14 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/2 -- To view, visit http://gerrit.cloudera.org:8080/17726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55 Gerrit-Change-Number: 17726 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17503 ) Change subject: IMPALA-10771: Add Tencent COS support .. Patch Set 4: The issue has been created for tracking, see: https://github.com/tencentyun/hadoop-cos/issues/35. -- To view, visit http://gerrit.cloudera.org:8080/17503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Gerrit-Change-Number: 17503 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jul 2021 14:19:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17503 ) Change subject: IMPALA-10771: Add Tencent COS support .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/impala_test_suite.py@1022 PS3, Line 1022: > flake8: E501 line too long (96 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/skip.py File tests/common/skip.py: http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/skip.py@131 PS3, Line 131: > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/17503/3/tests/metadata/test_stale_metadata.py File tests/metadata/test_stale_metadata.py: http://gerrit.cloudera.org:8080/#/c/17503/3/tests/metadata/test_stale_metadata.py@22 PS3, Line 22: from tests.common.skip import SkipIfS3, SkipIfGCS > flake8: F401 'tests.common.skip.SkipIfCOS' imported but unused Done -- To view, visit http://gerrit.cloudera.org:8080/17503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Gerrit-Change-Number: 17503 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 27 Jun 2021 06:01:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17503 ) Change subject: IMPALA-10771: Add Tencent COS support .. IMPALA-10771: Add Tencent COS support This patch adds support for COS(Cloud Object Storage). Using the hadoop-cos, the implementation is similar to other remote FileSystems. New flags for COS: - num_cos_io_threads: Number of COS I/O threads. Defaults to be 16. Follow-up: - Support for caching COS file handles will be addressed in IMPALA-10772. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on COS (IMPALA-10773). Tests: - Upload hdfs test data to a COS bucket. Modify all locations in HMS DB to point to the COS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 --- M be/src/exec/hdfs-table-sink.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M java/executor-deps/pom.xml M java/pom.xml M testdata/bin/create-load-data.sh M testdata/bin/run-all.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M tests/authorization/test_ranger.py M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_admission_controller.py M tests/custom_cluster/test_coordinators.py M tests/custom_cluster/test_event_processing.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_hive_parquet_codec_interop.py M tests/custom_cluster/test_hive_text_codec_interop.py M tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_lineage.py M tests/custom_cluster/test_local_catalog.py M tests/custom_cluster/test_local_tz_conversion.py M tests/custom_cluster/test_metadata_replicas.py M 
tests/custom_cluster/test_metastore_service.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/custom_cluster/test_query_retries.py M tests/custom_cluster/test_restart_services.py M tests/custom_cluster/test_topic_update_frequency.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_catalogd_debug_actions.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_reset_metadata.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_acid.py M tests/query_test/test_date_queries.py M tests/query_test/test_hbase_queries.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_resource_limits.py M tests/query_test/test_scanners.py M tests/stress/test_acid_stress.py M tests/stress/test_ddl_stress.py M tests/util/filesystem_utils.py 62 files changed, 279 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17503/4 -- To view, visit http://gerrit.cloudera.org:8080/17503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Gerrit-Change-Number: 17503 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17503 Change subject: IMPALA-10771: Add Tencent COS support .. IMPALA-10771: Add Tencent COS support This patch adds support for COS(Cloud Object Storage). Using the hadoop-cos, the implementation is similar to other remote FileSystems. New flags for COS: - num_cos_io_threads: Number of COS I/O threads. Defaults to be 16. Follow-up: - Support for caching COS file handles will be addressed in IMPALA-10772. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on COS (IMPALA-10773). Tests: - Upload hdfs test data to a COS bucket. Modify all locations in HMS DB to point to the COS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 --- M be/src/exec/hdfs-table-sink.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M java/executor-deps/pom.xml M java/pom.xml M testdata/bin/create-load-data.sh M testdata/bin/run-all.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M tests/authorization/test_ranger.py M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_admission_controller.py M tests/custom_cluster/test_coordinators.py M tests/custom_cluster/test_event_processing.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_hive_parquet_codec_interop.py M tests/custom_cluster/test_hive_text_codec_interop.py M tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_lineage.py M tests/custom_cluster/test_local_catalog.py M tests/custom_cluster/test_local_tz_conversion.py M tests/custom_cluster/test_metadata_replicas.py M 
tests/custom_cluster/test_metastore_service.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/custom_cluster/test_query_retries.py M tests/custom_cluster/test_restart_services.py M tests/custom_cluster/test_topic_update_frequency.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_catalogd_debug_actions.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_reset_metadata.py M tests/metadata/test_stale_metadata.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_acid.py M tests/query_test/test_date_queries.py M tests/query_test/test_hbase_queries.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_resource_limits.py M tests/query_test/test_scanners.py M tests/stress/test_acid_stress.py M tests/stress/test_ddl_stress.py M tests/util/filesystem_utils.py 63 files changed, 277 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17503/3 -- To view, visit http://gerrit.cloudera.org:8080/17503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Gerrit-Change-Number: 17503 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10717: Import Tuple functionality from DataSketches
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17515 Change subject: IMPALA-10717: Import Tuple functionality from DataSketches .. IMPALA-10717: Import Tuple functionality from DataSketches This patch imports the functionality needed for Tuple approximate algorithm from Apache DataSketches. I decided to copy the necessary files into be/src/thirdparty/datasketches. Browse the source files here: https://github.com/apache/datasketches-cpp/tree/3.0.0 Change-Id: If14fc224ee5e767054020c0efcc25e57289f8ac3 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/datasketches-test.cc M be/src/thirdparty/datasketches/README.md A be/src/thirdparty/datasketches/array_of_doubles_a_not_b.hpp A be/src/thirdparty/datasketches/array_of_doubles_a_not_b_impl.hpp A be/src/thirdparty/datasketches/array_of_doubles_intersection.hpp A be/src/thirdparty/datasketches/array_of_doubles_intersection_impl.hpp A be/src/thirdparty/datasketches/array_of_doubles_sketch.hpp A be/src/thirdparty/datasketches/array_of_doubles_sketch_impl.hpp A be/src/thirdparty/datasketches/array_of_doubles_union.hpp A be/src/thirdparty/datasketches/array_of_doubles_union_impl.hpp A be/src/thirdparty/datasketches/tuple_a_not_b.hpp A be/src/thirdparty/datasketches/tuple_a_not_b_impl.hpp A be/src/thirdparty/datasketches/tuple_intersection.hpp A be/src/thirdparty/datasketches/tuple_intersection_impl.hpp A be/src/thirdparty/datasketches/tuple_jaccard_similarity.hpp A be/src/thirdparty/datasketches/tuple_sketch.hpp A be/src/thirdparty/datasketches/tuple_sketch_impl.hpp A be/src/thirdparty/datasketches/tuple_union.hpp A be/src/thirdparty/datasketches/tuple_union_impl.hpp 20 files changed, 2,298 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17515/2 -- To view, visit http://gerrit.cloudera.org:8080/17515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange 
Gerrit-Change-Id: If14fc224ee5e767054020c0efcc25e57289f8ac3 Gerrit-Change-Number: 17515 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10689: Implement ds_cpc_union_f() function.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17440 ) Change subject: IMPALA-10689: Implement ds_cpc_union_f() function. .. Patch Set 4: Thanks for the reviews. I re-ran the pre-review-test job and all test cases have passed, see: https://jenkins.impala.io/job/pre-review-test/969/. Try to re-run the gerrit-verify-dryrun job. -- To view, visit http://gerrit.cloudera.org:8080/17440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib5c616316bf2bf2ff437678e9a44a15339920150 Gerrit-Change-Number: 17440 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 04 Jun 2021 11:14:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10689: Implement ds_cpc_union_f() function.
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17440 Change subject: IMPALA-10689: Implement ds_cpc_union_f() function. .. IMPALA-10689: Implement ds_cpc_union_f() function.
This function receives two strings that are serialized Apache DataSketches CPC sketches. It unions the two sketches and returns the resulting sketch.
Example:
select ds_cpc_estimate(ds_cpc_union_f(sketch1, sketch2)) from sketch_tbl;
+---------------------------------------------------+
| ds_cpc_estimate(ds_cpc_union_f(sketch1, sketch2)) |
+---------------------------------------------------+
| 15                                                |
+---------------------------------------------------+
Change-Id: Ib5c616316bf2bf2ff437678e9a44a15339920150 --- M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test 6 files changed, 140 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/17440/3 -- To view, visit http://gerrit.cloudera.org:8080/17440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib5c616316bf2bf2ff437678e9a44a15339920150 Gerrit-Change-Number: 17440 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10688: Implement ds_cpc_stringify() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. Patch Set 4: All tests run with the pre-review-test job passed; the failed test cases could not be reproduced. See: https://jenkins.impala.io/job/pre-review-test/948/. Could the gerrit-verify-dryrun job be re-run? Thanks. -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 00:57:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10688: Implement ds_cpc_stringify() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. IMPALA-10688: Implement ds_cpc_stringify() function
This function receives a string that is a serialized Apache DataSketches CPC sketch and returns its stringified format. The stringified format looks like the following:
select ds_cpc_stringify(ds_cpc_sketch(float_col)) from functional_parquet.alltypestiny;
+--------------------------------------------+
| ds_cpc_stringify(ds_cpc_sketch(float_col)) |
+--------------------------------------------+
| ### CPC sketch summary:                    |
|    lg_k           : 11                     |
|    seed hash      : 93cc                   |
|    C              : 2                      |
|    flavor         : 1                      |
|    merged         : true                   |
|    intresting col : 0                      |
|    table entries  : 2                      |
|    window         : not allocated          |
| ### End sketch summary                     |
|                                            |
+--------------------------------------------+
Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test 4 files changed, 59 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/17373/4 -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10688: Implement ds_cpc_stringify function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17373 Change subject: IMPALA-10688: Implement ds_cpc_stringify function .. IMPALA-10688: Implement ds_cpc_stringify function
This function receives a string that is a serialized Apache DataSketches CPC sketch and returns its stringified format. The stringified format looks like the following:
select ds_cpc_stringify(ds_cpc_sketch(float_col)) from functional_parquet.alltypestiny;
+--------------------------------------------+
| ds_cpc_stringify(ds_cpc_sketch(float_col)) |
+--------------------------------------------+
| ### CPC sketch summary:                    |
|    lg_k           : 11                     |
|    seed hash      : 93cc                   |
|    C              : 2                      |
|    flavor         : 1                      |
|    merged         : true                   |
|    intresting col : 0                      |
|    table entries  : 2                      |
|    window         : not allocated          |
| ### End sketch summary                     |
|                                            |
+--------------------------------------------+
Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test 4 files changed, 59 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/17373/2 -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17372 )

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17372/1/testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test:

http://gerrit.cloudera.org:8080/#/c/17372/1/testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test@192
PS1, Line 192: # result as if the whole data was sketched together into a single sketch.
> I checked the test above that are run on functional_parquet.alltypessmall,
Done

--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 12 May 2021 14:25:33 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17372 to look at the new patch set (#4).

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

IMPALA-10687: Implement ds_cpc_union() function

This function receives a set of serialized Apache DataSketches CPC sketches produced by ds_cpc_sketch() and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table and write these sketches to a separate table. Depending on which partitions the user is interested in, the relevant sketches can be unioned together to get an estimate. E.g.:

  SELECT ds_cpc_estimate(ds_cpc_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
- Apart from the automated tests added in this patch, I also tested ds_cpc_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches grouped by l_shipdate and called ds_cpc_union() on those sketches.

Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
7 files changed, 177 insertions(+), 6 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17372/4
--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17372 )

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

IMPALA-10687: Implement ds_cpc_union() function

This function receives a set of serialized Apache DataSketches CPC sketches produced by ds_cpc_sketch() and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table and write these sketches to a separate table. Depending on which partitions the user is interested in, the relevant sketches can be unioned together to get an estimate. E.g.:

  SELECT ds_cpc_estimate(ds_cpc_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
- Apart from the automated tests added in this patch, I also tested ds_cpc_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches grouped by l_shipdate and called ds_cpc_union() on those sketches.

Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
7 files changed, 169 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17372/3
--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17372 )

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

IMPALA-10687: Implement ds_cpc_union() function

This function receives a set of serialized Apache DataSketches CPC sketches produced by ds_cpc_sketch() and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table and write these sketches to a separate table. Depending on which partitions the user is interested in, the relevant sketches can be unioned together to get an estimate. E.g.:

  SELECT ds_cpc_estimate(ds_cpc_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
- Apart from the automated tests added in this patch, I also tested ds_cpc_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches grouped by l_shipdate and called ds_cpc_union() on those sketches.

Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
7 files changed, 170 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17372/1
--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
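
The per-partition workflow described in the ds_cpc_union() commit message (one sketch per partition, union only the partitions of interest, then estimate) can be illustrated with a toy model. The sketch below is NOT the CPC algorithm and does not use the DataSketches library or Impala. It assumes a much simpler "k minimum values" (KMV) sketch purely to show why mergeable sketches make this workflow possible. The names `sketch`, `union`, `estimate`, `K` and `sketch_tbl` are hypothetical stand-ins for ds_cpc_sketch(), ds_cpc_union(), ds_cpc_estimate() and the sketch table.

```python
import hashlib

K = 256  # assumed number of smallest hash values each toy sketch retains


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sketch(values, k=K):
    """Toy stand-in for ds_cpc_sketch(): the k smallest distinct hashes, sorted."""
    return sorted({_hash01(v) for v in values})[:k]


def union(sketches, k=K):
    """Toy stand-in for ds_cpc_union(): keep the k smallest hashes seen in any input."""
    merged = set()
    for s in sketches:
        merged.update(s)
    return sorted(merged)[:k]


def estimate(s, k=K):
    """Toy stand-in for ds_cpc_estimate().

    Exact while fewer than k hashes are retained, otherwise the
    KMV estimate (k - 1) / (k-th smallest hash).
    """
    if len(s) < k:
        return float(len(s))
    return (k - 1) / s[k - 1]


# One sketch per partition, stored in a separate "table" (a dict here).
partitions = {
    1: range(0, 100),
    5: range(50, 150),
    7: range(1000, 1100),
}
sketch_tbl = {p: sketch(vals) for p, vals in partitions.items()}

# Analogue of: WHERE partition_col=1 OR partition_col=5
merged = union([sketch_tbl[1], sketch_tbl[5]])
print(estimate(merged))
```

The property that matters is that merging the per-partition sketches gives the same sketch as sketching all of the selected data at once, which is also what the test comment quoted in the review ("as if the whole data was sketched together into a single sketch") checks for the real function.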
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..

Patch Set 4:

All tests run with the pre-review-test job passed, see: https://jenkins.impala.io/job/pre-review-test/909/. The failed test in the gerrit-verify-dryrun job (query_test/test_fetch.py, from: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13658/) did not reappear. How should this situation be handled? Should the gerrit-verify-dryrun job be re-run?

--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 15 Apr 2021 01:57:04 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Fucun Chu has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
..

IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

These functions can be used to get cardinality estimates of data using the CPC algorithm from Apache DataSketches. ds_cpc_sketch() receives a dataset, e.g. a column from a table, and returns a serialized CPC sketch in string format. This can be written to a table or be fed directly to ds_cpc_estimate(), which returns the cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use case for the CPC sketch is counting distinct values as a stream, and then merging multiple sketches together for a total distinct count.

For more details about Apache DataSketches' CPC see:
http://datasketches.apache.org/docs/CPC/CPC.html
For a figures-of-merit comparison of the HLL and CPC sketches see:
https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html

Testing:
- Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results.
- Ran manual tests on tpch_parquet.lineitem to compare performance with ndv(). Depending on data characteristics ndv() appears 2x-3x faster. CPC gives a closer estimate than the current ndv(). CPC is more accurate than HLL in some cases.

Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
12 files changed, 398 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/8
--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. IMPALA-10631: Upgrade DataSketches to version 3.0.0 Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version 3.0.0 tests: -Ran the tests from tests/query_test/test_datasketches.py Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 --- M be/src/exprs/datasketches-test.cc M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp M be/src/thirdparty/datasketches/AuxHashMap.hpp M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp M be/src/thirdparty/datasketches/CouponHashSet.hpp M be/src/thirdparty/datasketches/CouponList-internal.hpp M be/src/thirdparty/datasketches/CouponList.hpp M be/src/thirdparty/datasketches/CubicInterpolation.hpp M be/src/thirdparty/datasketches/HarmonicNumbers.hpp M be/src/thirdparty/datasketches/Hll4Array-internal.hpp M be/src/thirdparty/datasketches/Hll4Array.hpp M be/src/thirdparty/datasketches/Hll6Array-internal.hpp M be/src/thirdparty/datasketches/Hll6Array.hpp M be/src/thirdparty/datasketches/Hll8Array-internal.hpp M be/src/thirdparty/datasketches/Hll8Array.hpp M be/src/thirdparty/datasketches/HllArray-internal.hpp M be/src/thirdparty/datasketches/HllArray.hpp M be/src/thirdparty/datasketches/HllSketch-internal.hpp M be/src/thirdparty/datasketches/HllSketchImpl.hpp M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp M be/src/thirdparty/datasketches/HllUnion-internal.hpp M be/src/thirdparty/datasketches/HllUtil.hpp M be/src/thirdparty/datasketches/MurmurHash3.h M be/src/thirdparty/datasketches/README.md M be/src/thirdparty/datasketches/RelativeErrorTables.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp M be/src/thirdparty/datasketches/cpc_common.hpp M 
be/src/thirdparty/datasketches/cpc_compressor.hpp M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp M be/src/thirdparty/datasketches/cpc_sketch.hpp M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp M be/src/thirdparty/datasketches/cpc_union.hpp M be/src/thirdparty/datasketches/cpc_union_impl.hpp M be/src/thirdparty/datasketches/cpc_util.hpp M be/src/thirdparty/datasketches/hll.hpp M be/src/thirdparty/datasketches/icon_estimator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp M be/src/thirdparty/datasketches/kll_sketch.hpp M be/src/thirdparty/datasketches/kll_sketch_impl.hpp M be/src/thirdparty/datasketches/memory_operations.hpp M be/src/thirdparty/datasketches/theta_a_not_b.hpp M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp A be/src/thirdparty/datasketches/theta_comparators.hpp A be/src/thirdparty/datasketches/theta_constants.hpp A be/src/thirdparty/datasketches/theta_helpers.hpp M be/src/thirdparty/datasketches/theta_intersection.hpp A be/src/thirdparty/datasketches/theta_intersection_base.hpp A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp M be/src/thirdparty/datasketches/theta_intersection_impl.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp M be/src/thirdparty/datasketches/theta_sketch.hpp M be/src/thirdparty/datasketches/theta_sketch_impl.hpp M be/src/thirdparty/datasketches/theta_union.hpp A be/src/thirdparty/datasketches/theta_union_base.hpp A be/src/thirdparty/datasketches/theta_union_base_impl.hpp M be/src/thirdparty/datasketches/theta_union_impl.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp M be/src/thirdparty/datasketches/u32_table.hpp M 
be/src/thirdparty/datasketches/u32_table_impl.hpp 66 files changed, 2,646 insertions(+), 1,873 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17294/3 -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
..

IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

These functions can be used to get cardinality estimates of data using the CPC algorithm from Apache DataSketches. ds_cpc_sketch() receives a dataset, e.g. a column from a table, and returns a serialized CPC sketch in string format. This can be written to a table or be fed directly to ds_cpc_estimate(), which returns the cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use case for the CPC sketch is counting distinct values as a stream, and then merging multiple sketches together for a total distinct count.

For more details about Apache DataSketches' CPC see:
http://datasketches.apache.org/docs/CPC/CPC.html
For a figures-of-merit comparison of the HLL and CPC sketches see:
https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html

Testing:
- Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results.
- Ran manual tests on tpch_parquet.lineitem to compare performance with ndv(). Depending on data characteristics ndv() appears 2x-3x faster. CPC gives a closer estimate than the current ndv(). CPC is more accurate than HLL in some cases.

Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
12 files changed, 398 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/6
--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17294 Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. IMPALA-10631: Upgrade DataSketches to version 3.0.0 Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version 3.0.0 tests: -Ran the tests from tests/query_test/test_datasketches.py Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 --- M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp M be/src/thirdparty/datasketches/AuxHashMap.hpp M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp M be/src/thirdparty/datasketches/CouponHashSet.hpp M be/src/thirdparty/datasketches/CouponList-internal.hpp M be/src/thirdparty/datasketches/CouponList.hpp M be/src/thirdparty/datasketches/CubicInterpolation.hpp M be/src/thirdparty/datasketches/HarmonicNumbers.hpp M be/src/thirdparty/datasketches/Hll4Array-internal.hpp M be/src/thirdparty/datasketches/Hll4Array.hpp M be/src/thirdparty/datasketches/Hll6Array-internal.hpp M be/src/thirdparty/datasketches/Hll6Array.hpp M be/src/thirdparty/datasketches/Hll8Array-internal.hpp M be/src/thirdparty/datasketches/Hll8Array.hpp M be/src/thirdparty/datasketches/HllArray-internal.hpp M be/src/thirdparty/datasketches/HllArray.hpp M be/src/thirdparty/datasketches/HllSketch-internal.hpp M be/src/thirdparty/datasketches/HllSketchImpl.hpp M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp M be/src/thirdparty/datasketches/HllUnion-internal.hpp M be/src/thirdparty/datasketches/HllUtil.hpp M be/src/thirdparty/datasketches/MurmurHash3.h M be/src/thirdparty/datasketches/README.md M be/src/thirdparty/datasketches/RelativeErrorTables.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp M be/src/thirdparty/datasketches/cpc_common.hpp M be/src/thirdparty/datasketches/cpc_compressor.hpp M 
be/src/thirdparty/datasketches/cpc_compressor_impl.hpp M be/src/thirdparty/datasketches/cpc_sketch.hpp M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp M be/src/thirdparty/datasketches/cpc_union.hpp M be/src/thirdparty/datasketches/cpc_union_impl.hpp M be/src/thirdparty/datasketches/cpc_util.hpp M be/src/thirdparty/datasketches/hll.hpp M be/src/thirdparty/datasketches/icon_estimator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp M be/src/thirdparty/datasketches/kll_sketch.hpp M be/src/thirdparty/datasketches/kll_sketch_impl.hpp M be/src/thirdparty/datasketches/memory_operations.hpp M be/src/thirdparty/datasketches/theta_a_not_b.hpp M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp A be/src/thirdparty/datasketches/theta_comparators.hpp A be/src/thirdparty/datasketches/theta_constants.hpp A be/src/thirdparty/datasketches/theta_helpers.hpp M be/src/thirdparty/datasketches/theta_intersection.hpp A be/src/thirdparty/datasketches/theta_intersection_base.hpp A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp M be/src/thirdparty/datasketches/theta_intersection_impl.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp M be/src/thirdparty/datasketches/theta_sketch.hpp M be/src/thirdparty/datasketches/theta_sketch_impl.hpp M be/src/thirdparty/datasketches/theta_union.hpp A be/src/thirdparty/datasketches/theta_union_base.hpp A be/src/thirdparty/datasketches/theta_union_base_impl.hpp M be/src/thirdparty/datasketches/theta_union_impl.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp M be/src/thirdparty/datasketches/u32_table.hpp M be/src/thirdparty/datasketches/u32_table_impl.hpp 65 
files changed, 2,640 insertions(+), 1,867 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17294/2 -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
..

Patch Set 5:

DataSketches 3.0 has fixed this problem, so this change needs to wait for IMPALA-10631 to complete. It will be updated soon.

--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 09 Apr 2021 02:39:08 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10632: Update the Theta sketch serialization interface
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17261 to look at the new patch set (#2).

Change subject: IMPALA-10632: Update the Theta sketch serialization interface
..

IMPALA-10632: Update the Theta sketch serialization interface

DataSketches 3.0.0 removes serialization of the Update Theta sketch; the Compact Theta sketch is used to serialize instead, for backward compatibility.

Tests:
- Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
3 files changed, 48 insertions(+), 42 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/17261/2
--
To view, visit http://gerrit.cloudera.org:8080/17261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
Gerrit-Change-Number: 17261
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10632 Update the Theta sketch serialization interface
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17261 )

Change subject: IMPALA-10632 Update the Theta sketch serialization interface
..

IMPALA-10632 Update the Theta sketch serialization interface

DataSketches 3.0.0 removes serialization of the Update Theta sketch; the Compact Theta sketch is used to serialize instead, for backward compatibility.

Tests:
- Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
3 files changed, 48 insertions(+), 42 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/17261/1
--
To view, visit http://gerrit.cloudera.org:8080/17261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
Gerrit-Change-Number: 17261
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17186 )

Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..

Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc@194
PS4, Line 194: datasketches::compact_theta_sketch sketch = intersection_sketch.get_result();
> Please add more comment about the use cases when this could return false. a
Done

http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc@195
PS4, Line 195: riali
> typo: theta
Done

http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc@223
PS4, Line 223: if (serialized_sketch.is_null || serialized_sketch.len == 0) return BigIntVal::null();
> This comment is not needed
Done

http://gerrit.cloudera.org:8080/#/c/17186/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17186/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@560
PS4, Line 560: 0
> I miss 2 tests here:
Done

--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 25 Mar 2021 15:19:17 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17186 )

Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..

IMPALA-10581: Implement ds_theta_intersect_f() function

This function receives two strings that are serialized Apache DataSketches Theta sketches. It computes the intersection of the two sketches, which may come from the same or from different columns, and returns the resulting intersection sketch.

Example:
select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) from sketch_tbl;
+-----------------------------------------------------------+
| ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) |
+-----------------------------------------------------------+
| 5                                                         |
+-----------------------------------------------------------+

Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
6 files changed, 157 insertions(+), 13 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/17186/5
--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17186 )

Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..

IMPALA-10581: Implement ds_theta_intersect_f() function

This function receives two strings that are serialized Apache DataSketches Theta sketches. It computes the intersection of the two sketches, which may come from the same or from different columns, and returns the resulting intersection sketch.

Example:
select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) from sketch_tbl;
+-----------------------------------------------------------+
| ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) |
+-----------------------------------------------------------+
| 5                                                         |
+-----------------------------------------------------------+

Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 123 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/17186/4
--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
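
To make the idea behind intersecting two Theta sketches concrete, here is a minimal sketch under stated assumptions. It is a toy model written for illustration, not the DataSketches implementation and not the code in this change: a sketch is modelled as a threshold `theta` plus the set of hash values below it, and the intersection keeps the hashes present in both inputs under the smaller threshold. `ThetaSketch`, `build`, `intersect`, `estimate` and the default `k` are hypothetical names chosen for this example.

```python
import hashlib
from typing import FrozenSet, Iterable, NamedTuple


class ThetaSketch(NamedTuple):
    theta: float               # sampling threshold in (0, 1]
    hashes: FrozenSet[float]   # retained hash values, all below theta


def _hash01(value) -> float:
    """Map a value to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build(values: Iterable, k: int = 1024) -> ThetaSketch:
    """Retain at most k hashes; theta is the (k+1)-th smallest hash if any were dropped."""
    hashes = sorted({_hash01(v) for v in values})
    if len(hashes) <= k:
        return ThetaSketch(1.0, frozenset(hashes))
    return ThetaSketch(hashes[k], frozenset(hashes[:k]))


def intersect(a: ThetaSketch, b: ThetaSketch) -> ThetaSketch:
    """Toy analogue of ds_theta_intersect_f(): common hashes under the smaller theta."""
    theta = min(a.theta, b.theta)
    common = frozenset(h for h in a.hashes & b.hashes if h < theta)
    return ThetaSketch(theta, common)


def estimate(s: ThetaSketch) -> float:
    """Toy analogue of ds_theta_estimate(): retained count scaled by 1 / theta."""
    return len(s.hashes) / s.theta


# Two small inputs that share the five values 5..9.
sketch1 = build(range(0, 10))
sketch2 = build(range(5, 15))
print(estimate(intersect(sketch1, sketch2)))
```

With inputs this small nothing is sampled away (theta stays at 1.0), so the result is an exact count of the common values. For larger inputs the same code returns an approximation whose error depends on `k`.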
[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17179 )

Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..

Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17179/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17179/4/be/src/exprs/datasketches-functions-ir.cc@163
PS4, Line 163: update_sketch_to_theta_unio
> Sorry it was my bad that I had a typo in my suggestion, but this function n
Done

--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 24 Mar 2021 09:48:43 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17179 ) Change subject: IMPALA-10580: Implement ds_theta_union_f() function .. IMPALA-10580: Implement ds_theta_union_f() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Union two sketches and returns the resulting sketch of union. Example: select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) | +---+ | 15| +---+ Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 114 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/6 -- To view, visit http://gerrit.cloudera.org:8080/17179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa Gerrit-Change-Number: 17179 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17179 ) Change subject: IMPALA-10580: Implement ds_theta_union_f() function .. Patch Set 4: (2 comments) The ds_theta_union() function has been implemented in IMPALA-10467 http://gerrit.cloudera.org:8080/#/c/17179/3/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17179/3/be/src/exprs/datasketches-functions-ir.cc@167 PS3, Line 167: if (!DeserializeDsSketch(serialized_sketch, &sketch_ptr)) { : LogSketchDeserializationError(ctx); : return false; : } : union_sketch.update(*sketch_ptr); : } : return true; : } > This part seems pretty similar to L175-182. Have you considered introducing Done http://gerrit.cloudera.org:8080/#/c/17179/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17179/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@436 PS3, Line 436: # Checks that ds_theta_union_f() returns an empty sketch for NULL inputs. > Shouldn't this return null for null inputs? Have you checked the behaviour ref: https://github.com/apache/datasketches-hive/blob/1.1.X-incubating/src/test/java/org/apache/datasketches/hive/theta/UnionSketchUDFTest.java#L36 -- To view, visit http://gerrit.cloudera.org:8080/17179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa Gerrit-Change-Number: 17179 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Mar 2021 14:00:15 + Gerrit-HasComments: Yes
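The two review points above (factoring the duplicated per-argument code into a helper, and returning an empty sketch for NULL inputs) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Impala code: a sketch is modelled as a hypothetical (theta, hashes) pair, None stands in for SQL NULL, and the names _update_union and union_f are invented for the example. It also skips the trimming to a nominal size that a real union performs.

```python
# Toy model only: a "sketch" is a (theta, hashes) pair, where every retained
# hash is a float below theta. None stands in for a SQL NULL argument.
EMPTY_SKETCH = (1.0, frozenset())


def _update_union(acc, sketch):
    """Fold one optional sketch into the accumulator (shared by both arguments)."""
    if sketch is None:
        return acc  # assumption: NULL contributes nothing
    theta = min(acc[0], sketch[0])
    # Keep only hashes below the smaller threshold.
    hashes = frozenset(h for h in acc[1] | sketch[1] if h < theta)
    return theta, hashes


def union_f(first, second):
    """Union of two optional sketches; two NULLs give an empty sketch."""
    acc = EMPTY_SKETCH
    acc = _update_union(acc, first)
    acc = _update_union(acc, second)
    return acc
```

With this shape the per-argument logic lives in one place, and union_f(None, None) yields the empty sketch rather than None, which matches the wording of the test comment quoted above.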
[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17179 ) Change subject: IMPALA-10580: Implement ds_theta_union_f() function .. IMPALA-10580: Implement ds_theta_union_f() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Union two sketches and returns the resulting sketch of union. Example: select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) | +---+ | 15| +---+ Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 114 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/4 -- To view, visit http://gerrit.cloudera.org:8080/17179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa Gerrit-Change-Number: 17179 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17186 Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function .. IMPALA-10581: Implement ds_theta_intersect_f() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Computes the intersection of two sketches of same or different column and returns the resulting sketch of intersection. Example: select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) | +---+ | 5 | +---+ Change-Id: I335eada00730036d5433775cfe673e0e4babaa01 --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 119 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/17186/2 -- To view, visit http://gerrit.cloudera.org:8080/17186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01 Gerrit-Change-Number: 17186 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17179 ) Change subject: IMPALA-10580: Implement ds_theta_union_f() function .. IMPALA-10580: Implement ds_theta_union_f() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Union two sketches and returns the resulting sketch of union. Example: select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) | +---+ | 15| +---+ Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 111 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/3 -- To view, visit http://gerrit.cloudera.org:8080/17179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa Gerrit-Change-Number: 17179 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. Patch Set 5: (4 comments) Thanks for the review! Addressed the comments. http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc File be/src/exprs/datasketches-common.cc: http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc@54 PS4, Line 54: bool DeserializeDsSketch( > Could you please comment that this is a specialization of the template Dese Done http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc@148 PS4, Line 148: > nit: I don't think this comment and the one below adds much. The comment ab Done http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h File be/src/exprs/datasketches-functions.h: http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h@74 PS3, Line 74: sketches. If they ar > nit: "...sketches. If they are not..." Done http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@398 PS4, Line 398: ch. 
: create table ske > Does this mean that with this test A and B has no common items so the resul Done -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 17 Mar 2021 14:20:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. IMPALA-10558: Implement ds_theta_exclude() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Computes the a-not-b set operation given two sketches of same or different column. Example: select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) | +---+ | 5 | +---+ Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 --- M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 5 files changed, 169 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/5 -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. IMPALA-10558: Implement ds_theta_exclude() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Computes the a-not-b set operation given two sketches of same or different column. Example: select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) | +---+ | 5 | +---+ Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 --- M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 5 files changed, 166 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/4 -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17179 Change subject: IMPALA-10580: Implement ds_theta_union_f() function .. IMPALA-10580: Implement ds_theta_union_f() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Union two sketches and returns the resulting sketch of union. Example: select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) | +---+ | 15| +---+ Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 103 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/2 -- To view, visit http://gerrit.cloudera.org:8080/17179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa Gerrit-Change-Number: 17179 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@128 PS2, Line 128: datasketches::theta_a_not_b a_not_b; > nit: this comment is not needed as doesn't give extra info Done http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@131 PS2, Line 131: if (!first_serialized_sketch.is_null && first_serialized_sketch.len > 0) { : if (!DeserializeDsSketch(first_serialized_sketch, &first_sketch_ptr)) { : LogSketchDeserializationError(ctx); : return StringVal::null(); : } : } : datasketches::theta_sketch::unique_ptr second_sketch_ptr; : if (!second_serialized_sketch.is_null && second_serialized_sketch.len > 0) { : if (!DeserializeDsSketch(second_serialized_sketch, &second_sketch_ptr)) { : > This part seems pretty identical to the section L141-150. Can you move it t Done http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@155 PS2, Line 155: d::stringstream serialized_input > I'm not sure I understand the condition in this format :) Could you please function ref: https://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool, usage has been modified with reference to the example. http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h File be/src/exprs/datasketches-functions.h: http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h@73 PS2, Line 73: 'first_serialized_s > Could you mention both sketch params?
Done http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@330 PS2, Line 330: When A is empty and B is > When A is empty and B is null. Done http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@331 PS2, Line 331: select ds_theta_estimate(ds_theta_exclude(ds_theta_sketch(f2), null)) > Could you please add another test where A is null and B is empty? (the oppo Done http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@332 PS2, Line 332: from functional_parquet.emptytable; > Another test would be where A and B are both empty. Done http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@379 PS2, Line 379: i.ti i_ti, i.i i_i, i.bi i_bi, i.f i_f, i.d i_d, i.s i_s, i.c i_c, i.v i_v,i.nc i_nc, > I miss a test where the result of an a-not-b is a non-empty sketch (where t Done -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 14 Mar 2021 14:30:46 + Gerrit-HasComments: Yes
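The edge cases requested in this review (A empty and B null, A null and B empty, both empty, and a non-empty result) can be walked through on a toy model of the a-not-b operation. This is a hedged sketch, not the Impala implementation: sketches are hypothetical (theta, hashes) pairs, None stands in for SQL NULL, and treating NULL as an empty set is an assumption made for the example.

```python
# Toy model only: a "sketch" is a (theta, hashes) pair with all hashes < theta.
EMPTY = (1.0, frozenset())


def a_not_b(a, b):
    """Entries of A that are not in B, restricted to the smaller theta."""
    a = EMPTY if a is None else a  # assumption: NULL behaves like an empty set
    b = EMPTY if b is None else b
    theta = min(a[0], b[0])
    kept = frozenset(h for h in a[1] - b[1] if h < theta)
    return theta, kept


def estimate(sketch):
    """Distinct-count estimate: retained entries scaled up by 1/theta."""
    theta, hashes = sketch
    return len(hashes) / theta
```

For example, with a = (1.0, {0.1, 0.2, 0.3}) and b = (0.25, {0.2}), the result keeps only 0.1 under theta 0.25, so the estimate is 1 / 0.25 = 4. Any combination of empty and NULL inputs yields the empty sketch in this model.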
[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. IMPALA-10558: Implement ds_theta_exclude() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Computes the a-not-b set operation given two sketches of same or different column. Example: select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) | +---+ | 5 | +---+ Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 --- M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 5 files changed, 167 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/3 -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17153 Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. IMPALA-10558: Implement ds_theta_exclude() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Computes the a-not-b set operation given two sketches of same or different column. Example: select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) | +---+ | 5 | +---+ Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 125 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/2 -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 ) Change subject: IMPALA-10520: Implement ds_theta_intersect() function .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG@14 PS3, Line 14: > nit: not needed Done http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@271 PS3, Line 271: stimation, which is consistent : # with direct estimation of these sketches. > Could you add tests that cover the second part of this sentence so that we Done -- To view, visit http://gerrit.cloudera.org:8080/17088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 Gerrit-Change-Number: 17088 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 04 Mar 2021 03:24:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17088 ) Change subject: IMPALA-10520: Implement ds_theta_intersect() function .. IMPALA-10520: Implement ds_theta_intersect() function This function receives a set of serialized Apache DataSketches Theta sketches produced by ds_theta_sketch() and intersects them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and intersect the sketches of the partitions the user is interested in to get an estimate. E.g.: SELECT ds_theta_estimate(ds_theta_intersect(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_theta_intersect() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches with grouping by l_shipdate and called ds_theta_intersect() on those sketches. Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 182 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/4 -- To view, visit http://gerrit.cloudera.org:8080/17088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 Gerrit-Change-Number: 17088 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
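To make the aggregate behaviour described in this commit message concrete, here is a minimal toy model of intersecting several sketches into one. It is a sketch under assumptions, not Impala's code: sketches are hypothetical (theta, hashes) pairs with hand-picked hash values, and returning None when there are no inputs is a choice made for the example only. One design point it shows is that an intersection has no neutral starting value, so the fold is seeded with the first input.

```python
def intersect_all(sketches):
    """Fold intersection over (theta, hashes) pairs; None if there are no inputs."""
    result = None  # an empty set would wipe out everything, so seed with the first sketch
    for theta, hashes in sketches:
        if result is None:
            result = (theta, frozenset(hashes))
            continue
        new_theta = min(result[0], theta)
        # Keep hashes present on both sides and below the smaller threshold.
        common = frozenset(h for h in result[1] & hashes if h < new_theta)
        result = (new_theta, common)
    return result


def estimate(sketch):
    """Distinct-count estimate: retained entries scaled up by 1/theta."""
    theta, hashes = sketch
    return len(hashes) / theta
```

With part1 = (0.5, {0.1, 0.2, 0.4}) and part5 = (0.3, {0.1, 0.2, 0.25}), intersect_all([part1, part5]) keeps {0.1, 0.2} under theta 0.3, so the estimated size of the overlap is 2 / 0.3, roughly 6.7.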
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17088 ) Change subject: IMPALA-10520: Implement ds_theta_intersect() function .. IMPALA-10520: Implement ds_theta_intersect() function This function receives a set of serialized Apache DataSketches Theta sketches produced by ds_theta_sketch() and intersects them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and intersect the sketches of the partitions the user is interested in to get an estimate. E.g.: SELECT ds_theta_estimate(ds_theta_intersect(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_theta_intersect() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches with grouping by l_shipdate and called ds_theta_intersect() on those sketches. Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 163 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/3 -- To view, visit http://gerrit.cloudera.org:8080/17088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 Gerrit-Change-Number: 17088 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 ) Change subject: IMPALA-10520: Implement ds_theta_intersect() function .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@271 PS2, Line 271: static void DsThetaIntersectUpdate( > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@273 PS2, Line 273: static StringVal DsThetaIntersectSerialize(FunctionContext*, const StringVal& src); > line too long (92 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/17088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 Gerrit-Change-Number: 17088 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Feb 2021 11:28:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17088 ) Change subject: IMPALA-10520: Implement ds_theta_intersect() function .. IMPALA-10520: Implement ds_theta_intersect() function This function receives a set of serialized Apache DataSketches Theta sketches produced by ds_theta_sketch() and intersects them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and intersect the sketches of the partitions the user is interested in to get an estimate. E.g.: SELECT ds_theta_estimate(ds_theta_intersect(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_theta_intersect() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches with grouping by l_shipdate and called ds_theta_intersect() on those sketches. Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 4 files changed, 161 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/2 -- To view, visit http://gerrit.cloudera.org:8080/17088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97 Gerrit-Change-Number: 17088 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17048 ) Change subject: IMPALA-10467: Implement ds_theta_union() function .. IMPALA-10467: Implement ds_theta_union() function This function receives a set of serialized Apache DataSketches Theta sketches produced by ds_theta_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and then, depending on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_theta_estimate(ds_theta_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_theta_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches with grouping by l_shipdate and called ds_theta_union() on those sketches. Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 7 files changed, 152 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/2 -- To view, visit http://gerrit.cloudera.org:8080/17048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2 Gerrit-Change-Number: 17048 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
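The per-partition workflow described in this commit message (one sketch per partition, stored separately, then unioned on demand) can be mimicked with a small stand-alone model. Everything here is an assumption for illustration: the hash function, the (theta, hashes) representation, the nominal size k, and the names make_sketch, union and sketch_tbl are invented and are not Impala or DataSketches APIs.

```python
import hashlib


def hash01(value):
    """Illustrative hash: map a value to a float in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2.0**53


def make_sketch(values, k=1024):
    """Return (theta, hashes): the k smallest distinct hashes, all below theta."""
    hashes = sorted({hash01(v) for v in values})
    if len(hashes) <= k:
        return 1.0, frozenset(hashes)
    return hashes[k], frozenset(hashes[:k])


def union(sketches, k=1024):
    """Merge sketches: smallest theta wins, then trim back to k entries."""
    theta = min((t for t, _ in sketches), default=1.0)
    merged = sorted({h for _, hs in sketches for h in hs if h < theta})
    if len(merged) > k:
        theta, merged = merged[k], merged[:k]
    return theta, frozenset(merged)


def estimate(sketch):
    theta, hashes = sketch
    return len(hashes) / theta


# One sketch per partition, kept in a dict that plays the role of sketch_tbl.
partitions = {1: range(0, 60), 3: range(1000, 1100), 5: range(40, 100)}
sketch_tbl = {p: make_sketch(vals) for p, vals in partitions.items()}

# Counterpart of "WHERE partition_col=1 OR partition_col=5".
selected = [sketch_tbl[p] for p in (1, 5)]
distinct_1_or_5 = estimate(union(selected))
```

Partitions 1 and 5 overlap on the values 40 to 59, and the union counts them once: with fewer than k distinct values the toy sketch is still exact, so distinct_1_or_5 is 100.0.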
[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28 PS2, Line 28:data, the difference is around 1%-10%. ds_hll_estimate() is faster > Did you forgot to add this additional section to the commit msg? Done http://gerrit.cloudera.org:8080/#/c/17008/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/3/be/src/exprs/aggregate-functions-ir.cc@1905 PS3, Line 1905: if (dst->len == sizeof(datasketches::theta_union)) { > There is one more thing I don't understand here: 1. theta_union.get_result() returns a compact sketch (compact_theta_sketch), does not support updating, and is inconsistent with the initial underlying type of dst (update_theta_sketch). This is different from the HLL sketch. 2. Based on the previous question, use theta_union as the underlying type of dst. Relevant comments have been added to the code http://gerrit.cloudera.org:8080/#/c/17008/3/be/src/exprs/aggregate-functions-ir.cc@1908 PS3, Line 1908: } else if (dst->len == sizeof(datasketches::update_theta_sketch)) { > A DCHECK would be nice in the else branch to verify that dst->len is sizeof Done -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 16 Feb 2021 08:07:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions .. IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions These functions can be used to get cardinality estimates of data using the Theta algorithm from Apache DataSketches. ds_theta_sketch() receives a dataset, e.g. a column from a table, and returns a serialized Theta sketch in string format. This can be written to a table or be fed directly to ds_theta_estimate() that returns the cardinality estimate for that sketch. Similar to the HLL sketch, the primary use-case for the Theta sketch is for counting distinct values as a stream, and then merging multiple sketches together for a total distinct count. For more details about Apache DataSketches' Theta see: https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on tpch25_parquet.lineitem to compare performance with ds_hll_*. ds_theta_* is faster than ds_hll_* on the original data; the difference is around 1%-10%. ds_hll_estimate() is faster than ds_theta_estimate() on existing sketches. HLL and Theta give close estimates except for strings; see IMPALA-10464.
Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions-test.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 11 files changed, 447 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/4 -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
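As a rough illustration of the idea behind the commit message above (a sketch summarizes a column, and an estimate function turns the sketch into a distinct count), here is a toy Python version of a Theta-style sketch that keeps the k smallest 64-bit hashes. It is a minimal sketch under stated assumptions, not the Apache DataSketches implementation and not Impala's code. ToyThetaSketch, hash64 and the choice of SHA-256 as the hash are hypothetical.

```python
import hashlib

MAX_HASH = 2 ** 64  # hashes are treated as integers in [0, 2**64)


def hash64(value):
    """Map a value to a 64-bit integer (SHA-256 is an arbitrary choice here)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyThetaSketch:
    """Keeps at most k hashes, all strictly below the threshold theta."""

    def __init__(self, k=1024):
        self.k = k
        self.theta = MAX_HASH  # exclusive upper bound on retained hashes
        self.hashes = set()

    def update(self, value):
        h = hash64(value)
        if h < self.theta:
            self.hashes.add(h)
            self._trim()

    def _trim(self):
        # Once more than k hashes are held, keep the k smallest and lower
        # theta to the (k+1)-th smallest hash seen so far.
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[: self.k])

    def estimate(self):
        # The retained hashes are a uniform sample of the distinct values
        # with sampling fraction theta / 2**64.
        return len(self.hashes) / (self.theta / MAX_HASH)


# Small data stays in "exact mode": theta is never lowered.
small = ToyThetaSketch(k=1024)
for v in list(range(100)) * 3:
    small.update(v)
print(small.estimate())

# Larger data is estimated from the k retained hashes.
large = ToyThetaSketch(k=1024)
for v in range(50000):
    large.update(v)
print(round(large.estimate()))
```

For inputs with fewer than k distinct values the toy returns the exact count, which mirrors why tests on small datasets can expect exact results.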
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 ) Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions .. Patch Set 4: For a performance comparison between the ds_hll_* and ds_cpc_* functions, see: https://issues.apache.org/jira/browse/IMPALA-10500 -- To view, visit http://gerrit.cloudera.org:8080/16656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a Gerrit-Change-Number: 16656 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Feb 2021 06:58:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 ) Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions .. Patch Set 4: The test is still running; I will update the document once it completes. -- To view, visit http://gerrit.cloudera.org:8080/16656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a Gerrit-Change-Number: 16656 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 14:07:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@7 PS2, Line 7: ds_theta_estimate > nit: typo Done http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@13 PS2, Line 13: ds_theta_estimate > nit: same typo Done http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28 PS2, Line 28: see IMPALA-10464. > I'd also include some highlights from that perf measurement doc into the co Done http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1646 PS2, Line 1646: SerializeDsThetaSketch( > In contrast with HLL as I see Theta doesn't compact the sketch just seriali Done http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1899 PS2, Line 1899: or dst->len == sizeof(datasketches::theta_union)); > I'm a bit lost here. Could you help me understand why is it needed to conve Previously, this was handled on the assumption that the size of dst stays unchanged, and it is better to return union_sketch. http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc@110 PS2, Line 110: return 0; > HLL returns a null here. Have you checked the behaviour in Hive to be in sy Comparing the Hive test cases of HLL and Theta, the expected results differ:
Theta: https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/theta/EstimateSketchUDFTest.java#L34 HLL: https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/hll/SketchToEstimateUDFTest.java#L31 http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@138 PS2, Line 138: # Check that ds_theta_estimate returns error for strings that are not serialized sketches. > Please add a test when ds_theta_estimate() is used on an HLL sketch. I gues Done -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 13:38:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions .. IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions These functions can be used to get cardinality estimates of data using the Theta algorithm from Apache DataSketches. ds_theta_sketch() receives a dataset, e.g. a column from a table, and returns a serialized Theta sketch in string format. This can be written to a table or be fed directly to ds_theta_estimate() that returns the cardinality estimate for that sketch. Similar to the HLL sketch, the primary use-case for the Theta sketch is for counting distinct values as a stream, and then merging multiple sketches together for a total distinct count. For more details about Apache DataSketches' Theta see: https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on tpch25_parquet.lineitem to compare performance with ds_hll_*. HLL and Theta give close estimates except for strings; see IMPALA-10464.
Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions-test.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 11 files changed, 445 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/3 -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17048 ) Change subject: IMPALA-10467: Implement ds_theta_union() function .. IMPALA-10467: Implement ds_theta_union() function This function receives a set of serialized Apache DataSketches Theta sketches produced by ds_theta_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table and write these sketches to a separate table; then, depending on which partitions the user is interested in, the relevant sketches can be unioned together to get an estimate. E.g.: SELECT ds_theta_estimate(ds_theta_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_theta_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches with grouping by l_shipdate and called ds_theta_union() on those sketches. Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 7 files changed, 162 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/1 -- To view, visit http://gerrit.cloudera.org:8080/17048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2 Gerrit-Change-Number: 17048 Gerrit-PatchSet: 1 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
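The per-partition workflow described in the message above (one sketch per partition, then a union of the selected partitions followed by an estimate) can be mimicked with a toy Python model. This is only an illustration under stated assumptions: sketches are modeled as a (theta, retained hashes) pair holding the k smallest 64-bit hashes, and build_sketch, union_sketches, estimate and the sketch_tbl dictionary are hypothetical names, not Impala functions or the Apache DataSketches API.

```python
import hashlib

MAX_HASH = 2 ** 64  # hashes are treated as integers in [0, 2**64)


def hash64(value):
    """Map a value to a 64-bit integer (SHA-256 is an arbitrary choice here)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(values, k):
    """Return (theta, hashes): the k smallest distinct hashes, all < theta."""
    ordered = sorted({hash64(v) for v in values})
    if len(ordered) <= k:
        return MAX_HASH, frozenset(ordered)
    return ordered[k], frozenset(ordered[:k])


def union_sketches(sketches, k):
    """Merge sketches: use the smallest theta, then trim back to k hashes."""
    theta = min((t for t, _ in sketches), default=MAX_HASH)
    merged = sorted({h for _, hashes in sketches for h in hashes if h < theta})
    if len(merged) > k:
        theta = merged[k]
        merged = merged[:k]
    return theta, frozenset(merged)


def estimate(sketch):
    """Scale the retained count by the sampling fraction theta / 2**64."""
    theta, hashes = sketch
    return len(hashes) / (theta / MAX_HASH)


K = 1024
# Hypothetical per-partition data; partitions 1 and 5 overlap on 200..299.
partitions = {1: range(0, 300), 5: range(200, 500), 7: range(1000, 1100)}
sketch_tbl = {p: build_sketch(vals, K) for p, vals in partitions.items()}

# Counterpart of filtering on partition_col = 1 OR partition_col = 5.
selected = [sketch_tbl[p] for p in (1, 5)]
print(estimate(union_sketches(selected, K)))
```

Values that appear in more than one partition hash to the same integer, so the union counts them once; that is what lets per-partition sketches be combined after the fact without rescanning the data.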
[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions .. IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions These functions can be used to get cardinality estimates of data using the Theta algorithm from Apache DataSketches. ds_theta_sketch() receives a dataset, e.g. a column from a table, and returns a serialized Theta sketch in string format. This can be written to a table or be fed directly to ds_theat_estimate() that returns the cardinality estimate for that sketch. Similar to the HLL sketch, the primary use-case for the Theta sketch is for counting distinct values as a stream, and then merging multiple sketches together for a total distinct count. For more details about Apache DataSketches' Theta see: https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on tpch25_parquet.lineitem to compare performance with ds_hll_*. HLL and Theta give close estimates except for strings; see IMPALA-10464.
Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions-test.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 11 files changed, 399 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/2 -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions .. IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions These functions can be used to get cardinality estimates of data using the Theta algorithm from Apache DataSketches. ds_theta_sketch() receives a dataset, e.g. a column from a table, and returns a serialized Theta sketch in string format. This can be written to a table or be fed directly to ds_theat_estimate() that returns the cardinality estimate for that sketch. Similar to the HLL sketch, the primary use-case for the Theta sketch is for counting distinct values as a stream, and then merging multiple sketches together for a total distinct count. For more details about Apache DataSketches' Theta see: https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on tpch25_parquet.lineitem to compare performance with ds_hll_*. HLL and Theta give close estimates except for strings; see IMPALA-10464.
Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions-test.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 11 files changed, 401 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/1 -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 1 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins