[Impala-ASF-CR] IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

2022-07-12 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/18028 )

Change subject: IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..

IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

This patch fixes the data loading problems found when integrating
Apache Hive 3 and switches to the Tez engine.

Adds the HIVE-21569 and HIVE-20038 patches and recompiles the hive-exec
module.
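
As a rough illustration of how the numbered patch files in this change
could be put in a deterministic order before being applied, here is a
minimal sketch. It is not taken from testdata/bin/patch_hive.sh, whose
contents are not shown in this thread. The helper name order_patches and
the naming pattern patch<N>-HIVE-<issue>.diff are assumptions inferred
only from the file list of this change.

```python
import re
from typing import List, Tuple

# Assumed naming scheme, inferred from the file list of this change:
# patch<N>-HIVE-<issue>.diff
PATCH_NAME = re.compile(r"^patch(\d+)-(HIVE-\d+)\.diff$")


def order_patches(filenames: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, issue, filename) tuples sorted by numeric prefix.

    Hypothetical helper: files that do not match the assumed pattern
    (for example README) are skipped.
    """
    parsed = []
    for name in filenames:
        match = PATCH_NAME.match(name)
        if match is None:
            continue  # not a patch file under the assumed naming scheme
        parsed.append((int(match.group(1)), match.group(2), name))
    return sorted(parsed)


if __name__ == "__main__":
    files = [
        "README",
        "patch2-HIVE-20038.diff",
        "patch1-HIVE-21569.diff",
        "patch0-HIVE-21586.diff",
    ]
    for index, issue, name in order_patches(files):
        print(index, issue, name)
```

Sorting on the parsed integer rather than on the raw file name keeps the
order stable if a two-digit patch number is ever added.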

Testing:
- Manually perform data loading steps.

Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/generate-schema-statements.py
M testdata/bin/load_nested.py
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch1-HIVE-21569.diff
A testdata/cluster/hive/patch2-HIVE-20038.diff
M tests/util/test_file_parser.py
11 files changed, 381 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/18028/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-11422: Bump Apache Hive to 3.1.3

2022-07-12 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18716 )


Change subject: IMPALA-11422: Bump Apache Hive to 3.1.3
..

IMPALA-11422: Bump Apache Hive to 3.1.3

This patch bumps the version of Apache Hive to 3.1.3.

Change-Id: I395a99cccf0a8902b3fd47235c32b69c5494f291
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/18716/1
--
To view, visit http://gerrit.cloudera.org:8080/18716

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I395a99cccf0a8902b3fd47235c32b69c5494f291
Gerrit-Change-Number: 18716
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

2022-05-02 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18028 )

Change subject: IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18028/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18028/5//COMMIT_MSG@15
PS5, Line 15: - Manually perform data loading steps.
> We can skip tests that depend on this.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18028

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 02 May 2022 12:49:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

2022-05-02 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/18028 )

Change subject: IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..

IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

This patch fixes the data loading problems found when integrating
Apache Hive 3 and switches to the Tez engine.

Adds the HIVE-21569 and HIVE-20038 patches and recompiles the hive-exec
module.

Testing:
- Manually perform data loading steps.

Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/generate-schema-statements.py
M testdata/bin/load_nested.py
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch1-HIVE-21569.diff
A testdata/cluster/hive/patch2-HIVE-20038.diff
M tests/util/test_file_parser.py
11 files changed, 381 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/18028/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18028

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2022-02-18 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between cdp-hive-3 and apache-hive-3
and are used by front-end code. At build time, one of the two shims is
added as a source directory by the fe/pom.xml build plugin, based on
the environment variable IMPALA_HIVE_DIST_TYPE.

Some code that directly uses Hive 4 APIs must be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.
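
The selection described above can be pictured with a small sketch. It is
only an illustration of choosing one of two shim source directories from
an environment variable, not the actual fe/pom.xml logic, which is not
shown in this thread. The function name shim_source_dir, the value
strings "cdp-hive-3" and "apache-hive-3" as settings of
IMPALA_HIVE_DIST_TYPE, and the fallback to "cdp-hive-3" are assumptions;
the two directory names come from the file list of this change.

```python
import os
from typing import Mapping

# Shim source directories named in the file list of this change.
# The keys are assumed values of IMPALA_HIVE_DIST_TYPE.
SHIM_DIRS = {
    "cdp-hive-3": "fe/src/compat-hive-3/java",
    "apache-hive-3": "fe/src/compat-apache-hive-3/java",
}


def shim_source_dir(env: Mapping[str, str] = os.environ) -> str:
    """Pick the shim source directory for the configured Hive distribution.

    Hypothetical helper: the default of "cdp-hive-3" when the variable is
    unset is an assumption made for this sketch.
    """
    dist_type = env.get("IMPALA_HIVE_DIST_TYPE", "cdp-hive-3")
    if dist_type not in SHIM_DIRS:
        raise ValueError("unsupported IMPALA_HIVE_DIST_TYPE: %r" % dist_type)
    return SHIM_DIRS[dist_type]


if __name__ == "__main__":
    print(shim_source_dir({"IMPALA_HIVE_DIST_TYPE": "apache-hive-3"}))
```

Failing on an unknown value, rather than silently falling back, means a
mistyped distribution type stops the build early.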

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full test suite against HMS-3.
3. Running the full test suite against ASF-HMS-3 needs more work to
support Tez in the mini-cluster (for data loading) and HMS
transactions. This is an ongoing effort; test failures on ASF-Hive-3
will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/jenkins/build-all-flag-combinations.sh
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
R fe/src/main/java/org/apache/impala/util/HiveMetadataFormatUtils.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
27 files changed, 2,514 insertions(+), 1,030 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/16
--
To view, visit http://gerrit.cloudera.org:8080/17774

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 16
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2022-02-16 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between cdp-hive-3 and apache-hive-3
and are used by front-end code. At build time, one of the two shims is
added as a source directory by the fe/pom.xml build plugin, based on
the environment variable IMPALA_HIVE_DIST_TYPE.

Some code that directly uses Hive 4 APIs must be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full test suite against HMS-3.
3. Running the full test suite against ASF-HMS-3 needs more work to
support Tez in the mini-cluster (for data loading) and HMS
transactions. This is an ongoing effort; test failures on ASF-Hive-3
will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/jenkins/build-all-flag-combinations.sh
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
27 files changed, 3,198 insertions(+), 1,026 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/15
--
To view, visit http://gerrit.cloudera.org:8080/17774

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 15
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2022-02-15 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between cdp-hive-3 and apache-hive-3
and are used by front-end code. At build time, one of the two shims is
added as a source directory by the fe/pom.xml build plugin, based on
the environment variable IMPALA_HIVE_DIST_TYPE.

Some code that directly uses Hive 4 APIs must be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full test suite against HMS-3.
3. Running the full test suite against ASF-HMS-3 needs more work to
support Tez in the mini-cluster (for data loading) and HMS
transactions. This is an ongoing effort; test failures on ASF-Hive-3
will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/jenkins/build-all-flag-combinations.sh
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
27 files changed, 3,198 insertions(+), 1,026 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/14
--
To view, visit http://gerrit.cloudera.org:8080/17774

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 14
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2022-02-09 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by front-end code. At build time, one of the two shims is
added as a source directory by the fe/pom.xml build plugin, based on
the environment variable IMPALA_HIVE_DIST_TYPE.

Some code that directly uses Hive 4 APIs must be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full test suite against HMS-3.
3. Running the full test suite against ASF-HMS-3 needs more work to
support Tez in the mini-cluster (for data loading) and HMS
transactions. This is an ongoing effort; test failures on ASF-Hive-3
will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
24 files changed, 3,011 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/13
--
To view, visit http://gerrit.cloudera.org:8080/17774

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 13
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2022-02-08 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by front-end code. At build time, one of the two shims is
added as a source directory by the fe/pom.xml build plugin, based on
the environment variable IMPALA_HIVE_DIST_TYPE.

Some code that directly uses Hive 4 APIs must be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full test suite against HMS-3.
3. Running the full test suite against ASF-HMS-3 needs more work to
support Tez in the mini-cluster (for data loading) and HMS
transactions. This is an ongoing effort; test failures on ASF-Hive-3
will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
25 files changed, 2,998 insertions(+), 261 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/12
--
To view, visit http://gerrit.cloudera.org:8080/17774

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 12
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support

2021-11-29 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..


Patch Set 5:

The hadoop-cos project has added a license and is now under the MIT license:
https://github.com/tencentyun/hadoop-cos/issues/35


--
To view, visit http://gerrit.cloudera.org:8080/17503

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 29 Nov 2021 11:00:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support

2021-11-29 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..

IMPALA-10771: Add Tencent COS support

This patch adds support for COS (Cloud Object Storage). It uses
hadoop-cos, so the implementation is similar to that of other remote
FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.
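
The location rewrite mentioned in the test steps can be sketched as
follows. This is only an illustration of mapping an hdfs location to a
COS bucket; it does not show how the HMS DB was actually updated for
this change. The function name to_cos_location, the "cosn" URI scheme
and the bucket name in the usage example are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cos_location(location: str, bucket: str, scheme: str = "cosn") -> str:
    """Rewrite an hdfs:// table location to point at a COS bucket.

    Hypothetical helper: the "cosn" scheme is an assumption for this
    sketch. Locations that are not on hdfs are returned unchanged.
    """
    parts = urlsplit(location)
    if parts.scheme != "hdfs":
        return location
    # Keep the path; replace the scheme and the namenode authority.
    return urlunsplit((scheme, bucket, parts.path, "", ""))


if __name__ == "__main__":
    # The bucket name below is made up for illustration.
    print(to_cos_location("hdfs://localhost:20500/test-warehouse/alltypes",
                          "example-bucket"))
```

Only the scheme and authority change, so relative layout under the
warehouse directory stays the same.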

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_metastore_service.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
60 files changed, 275 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17503/5
--
To view, visit http://gerrit.cloudera.org:8080/17503

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-26 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by front-end code. At build time, one of the two shims is
added as a source directory by the fe/pom.xml build plugin, based on
the environment variable IMPALA_HIVE_DIST_TYPE.

Some code that directly uses Hive 4 APIs must be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full test suite against HMS-3.
3. Running the full test suite against ASF-HMS-3 needs more work to
support Tez in the mini-cluster (for data loading) and HMS
transactions. This is an ongoing effort; test failures on ASF-Hive-3
will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
25 files changed, 2,993 insertions(+), 256 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/11
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 11
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-25 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by the front-end code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full test suite against ASF-HMS-3 will need more work
to support Tez in the mini-cluster (for data loading) and HMS
transaction support. This will be an ongoing effort, and test failures
on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
22 files changed, 2,765 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/10
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 10
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-25 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17774/9/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java:

http://gerrit.cloudera.org:8080/#/c/17774/9/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java@20
PS9, Line 20:
> can you add a class level comment to describe what goes into this file? Loo
Done


http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/bin/patch_hive.sh
File testdata/bin/patch_hive.sh:

http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/bin/patch_hive.sh@20
PS9, Line 20: # This script is used to repair service startup and task running 
problems that occu
> not sure I understand. Can you be more specific on the purpose of this file
Done


http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/cluster/hive/README
File testdata/cluster/hive/README:

http://gerrit.cloudera.org:8080/#/c/17774/9/testdata/cluster/hive/README@5
PS9, Line 5: Contains only patches for `hive_metastore.thrift` is used to solve 
the problem that the
   : generated cpp file cannot be compiled.
> do you mean that Hive 3.1's thrift file cannot generate compilable cpp code
HIVE-21586 (https://issues.apache.org/jira/browse/HIVE-21586) has fixed this
problem in the 3.2 and 4.0 branches. The patch is applied here as an interim
solution.



--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 10
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 25 Nov 2021 14:06:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

2021-11-21 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18028


Change subject: WIP IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading
..

WIP IMPALA-10871 (part 2): Apache Hive 3: fixes for dataset loading

This patch fixes the data loading problems that arise when integrating
Apache Hive 3 and switches to the Tez engine.

It adds the HIVE-21569 and HIVE-20038 patches and recompiles the
hive-exec module.

Todos:
- The number of tpch_nested_parquet.customer files differs from the
  number generated by CDP
- Need more testing

Testing:
- Manually perform data loading steps.

Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
---
M buildall.sh
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/generate-schema-statements.py
M testdata/bin/load_nested.py
M testdata/cluster/hive/README
A testdata/cluster/hive/patch1-HIVE-21569.diff
A testdata/cluster/hive/patch2-HIVE-20038.diff
M tests/util/test_file_parser.py
8 files changed, 346 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/18028/5
--
To view, visit http://gerrit.cloudera.org:8080/18028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I86a1fdffc70b8d9a3bc97a72b5b939021dc496f1
Gerrit-Change-Number: 18028
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-21 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by the front-end code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full test suite against ASF-HMS-3 will need more work
to support Tez in the mini-cluster (for data loading) and HMS
transaction support. This will be an ongoing effort, and test failures
on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. The patch uses a custom build of Apache Hive deployed in the
mini-cluster. This build has the fixes for HIVE-21569 and HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M bin/rat_exclude_files.txt
M buildall.sh
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
A testdata/bin/patch_hive.sh
A testdata/cluster/hive/README
A testdata/cluster/hive/patch0-HIVE-21586.diff
22 files changed, 2,755 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/9
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA 10954: Make create, drop methods of kudu catalog service public

2021-11-17 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17981 )

Change subject: IMPALA 10954: Make create, drop methods of kudu catalog service public
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17981/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17981/3//COMMIT_MSG@7
PS3, Line 7: IMPALA 10954
IMPALA-10954



--
To view, visit http://gerrit.cloudera.org:8080/17981
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib60142424d8e758031de596831d98aed69d488ef
Gerrit-Change-Number: 17981
Gerrit-PatchSet: 3
Gerrit-Owner: Deepti Sehrawat 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 17 Nov 2021 14:50:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7942: Add query hints for cardinalities and selectivities

2021-11-15 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18023/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18023/1//COMMIT_MSG@21
PS1, Line 21: hint value only valid when table does not have stats or stats is 
corrupt.
nit: line should have 72 or fewer characters
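
The 72-character rule from this comment can be checked mechanically.
The following is a minimal sketch, not the check Gerrit or the Impala
tooling actually runs; long_lines is a hypothetical helper name.

```python
def long_lines(commit_message, limit=72):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(commit_message.splitlines(), start=1)
        if len(line) > limit
    ]
```

An empty result means that every line of the commit message fits
within the limit.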



--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 15 Nov 2021 14:34:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-15 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by the front-end code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full test suite against ASF-HMS-3 will need more work
to support Tez in the mini-cluster (for data loading) and HMS
transaction support, since HMS-3 uses Hive 4 APIs. This will be an
ongoing effort, and test failures on ASF-Hive-3 will be fixed in
additional sub-tasks.

Notes:
1. The patch uses a custom build of Hive deployed in the mini-cluster.
This build has the fixes for HIVE-20038 and HIVE-22717. This hack will
be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
M testdata/bin/generate-schema-statements.py
M testdata/bin/load_nested.py
M tests/util/test_file_parser.py
20 files changed, 2,522 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/8
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

2021-11-13 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
..

IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement a data source for querying RDBMS tables over JDBC. It has
some limitations:
- It is not distributed.
- Only binary predicates with the operators =, !=, <=, >=, <, > can be
  pushed down to the RDBMS.
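
The predicate limitation above can be illustrated with a small sketch.
It is not the code in this patch: split_predicates and build_where are
hypothetical helpers, predicates are modelled as (column, operator,
value) triples, and combining the pushed predicates with AND is an
assumption of the sketch.

```python
# Operators that the commit message lists as pushable to the RDBMS.
SUPPORTED_OPS = ("=", "!=", "<=", ">=", "<", ">")


def split_predicates(predicates):
    """Split (column, op, value) triples into pushable and remaining ones."""
    pushed, remaining = [], []
    for pred in predicates:
        if pred[1] in SUPPORTED_OPS:
            pushed.append(pred)
        else:
            remaining.append(pred)
    return pushed, remaining


def build_where(pushed):
    """Render pushable predicates as a parameterized WHERE clause.

    Returns (sql_fragment, parameters). Column names are inserted as-is;
    a real implementation would need to quote or validate them.
    """
    if not pushed:
        return "", []
    clause = " AND ".join("%s %s ?" % (col, op) for col, op, _ in pushed)
    return " WHERE " + clause, [value for _, _, value in pushed]
```

Predicates that cannot be pushed down stay in the remaining list and
would still have to be evaluated by Impala after the rows are fetched.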

To query the RDBMS tables, follow these steps (note that the existing
data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create the data source table (alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f \
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql
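
Steps 2 to 4 can be collected into a small driver. This is a sketch
only: data_source_setup_commands is a hypothetical helper that merely
builds the command lines quoted above from the values of IMPALA_HOME
and IMPALAD, and it assumes step 1 has already been done by hand.

```python
import posixpath


def data_source_setup_commands(impala_home, impalad):
    """Return the argv lists for steps 2 to 4, in the order they run."""
    bin_dir = posixpath.join(impala_home, "testdata", "bin")
    return [
        # Step 2: copy the data source library into HDFS.
        [posixpath.join(bin_dir, "copy-data-sources.sh")],
        # Step 3: create the `alltypes` table in the postgres database.
        [posixpath.join(bin_dir, "load-data-sources.sh")],
        # Step 4: create the data source table through impala-shell.
        [
            posixpath.join(impala_home, "bin", "impala-shell.sh"),
            "-i", impalad,
            "-f", posixpath.join(bin_dir, "create-data-source-table.sql"),
        ],
    ]
```

Each list could then be passed to subprocess.check_call in order, so
that a failing step stops the sequence before the next one runs.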

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 2,003 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/7
--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-11 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by the front-end code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and Apache-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full test suite against Apache-HMS-3 will need more work
to support Tez in the mini-cluster (for data loading) and HMS
transaction support, since HMS-3 uses Hive 4 APIs. This will be an
ongoing effort, and test failures on Apache-Hive-3 will be fixed in
additional sub-tasks.

Notes:
1. The patch uses a custom build of Hive deployed in the mini-cluster.
This build has the fixes for HIVE-20038 and HIVE-22717. This hack will
be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
M testdata/bin/generate-schema-statements.py
M tests/util/test_file_parser.py
19 files changed, 2,203 insertions(+), 118 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/7
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-11 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17774 )

Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by the front-end code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and Apache-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full test suite against Apache-HMS-3 will need more work
to support Tez in the mini-cluster (for data loading) and HMS
transaction support, since HMS-3 uses Hive 4 APIs. This will be an
ongoing effort, and test failures on Apache-Hive-3 will be fixed in
additional sub-tasks.

Notes:
1. The patch uses a custom build of Hive deployed in the mini-cluster.
This build has the fixes for HIVE-20038 and HIVE-22717. This hack will
be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
M testdata/bin/generate-schema-statements.py
M tests/util/test_file_parser.py
19 files changed, 1,500 insertions(+), 118 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/6
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

2021-11-11 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17774


Change subject: IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
..

IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2

Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. The shim classes
implement the methods that differ between hive-3 and apache-hive-3
and are used by the front-end code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as source by the fe/pom.xml build plugin.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and Apache-HMS-3.
2. Ran the full suite of tests against HMS-3.
3. Running the full test suite against Apache-HMS-3 will need more work
to support Tez in the mini-cluster (for data loading) and HMS
transaction support, since HMS-3 uses Hive 4 APIs. This will be an
ongoing effort, and test failures on Apache-Hive-3 will be fixed in
additional sub-tasks.

Notes:
1. The patch uses a custom build of Hive deployed in the mini-cluster.
This build has the fixes for HIVE-20038 and HIVE-22717. This hack will
be added to the build in subsequent tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
---
M fe/pom.xml
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/metastore/CatalogHmsUtils.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M java/pom.xml
M testdata/bin/generate-schema-statements.py
M tests/util/test_file_parser.py
19 files changed, 1,487 insertions(+), 104 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17774/4
--
To view, visit http://gerrit.cloudera.org:8080/17774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Gerrit-Change-Number: 17774
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

2021-11-03 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
..

WIP IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala and
implements a data source for querying RDBMS tables over JDBC. It has
some limitations:
- It is not distributed.
- Only binary predicates with the operators =, !=, <=, >=, <, > are
pushed down to the RDBMS.
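
The pushdown limitation can be pictured with the following minimal
sketch. It is not the code in QueryConditionUtil.java; the function
names, the (column, operator, value) tuple layout, and the assumption
that the predicates are ANDed conjuncts are all hypothetical.

```python
# Operators the commit message lists as eligible for pushdown.
PUSHABLE_OPS = {"=", "!=", "<=", ">=", "<", ">"}


def split_predicates(predicates):
    """Split (column, op, value) triples into pushed and residual lists."""
    pushed, residual = [], []
    for pred in predicates:
        _, op, _ = pred
        (pushed if op in PUSHABLE_OPS else residual).append(pred)
    return pushed, residual


def build_where_clause(pushed):
    """Render pushed predicates as a parameterized WHERE clause.

    Values are returned separately so that a driver can bind them;
    column names are assumed to be trusted identifiers in this sketch.
    """
    if not pushed:
        return "", []
    clause = " AND ".join("%s %s ?" % (col, op) for col, op, _ in pushed)
    return "WHERE " + clause, [val for _, _, val in pushed]


if __name__ == "__main__":
    preds = [("id", ">=", 10), ("name", "LIKE", "a%"), ("bool_col", "=", True)]
    pushed, residual = split_predicates(preds)
    print(build_where_clause(pushed))
    print(residual)
```

Predicates that are not pushed (the LIKE in the usage example) would
still have to be evaluated after the rows come back from the RDBMS.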

In order to query the RDBMS tables, the following steps should be
followed (note that the existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create the data source table (alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 1,977 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/6
--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

2021-11-02 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17842


Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
..

WIP IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala and
implements a data source for querying RDBMS tables over JDBC. It has
some limitations:
- It is not distributed.
- Only binary predicates with the operators =, !=, <=, >=, <, > are
pushed down to the RDBMS.

In order to query the RDBMS tables, the following steps should be
followed (note that the existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create the data source table (alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 1,975 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/5
--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] Impala-10994: Normalize pip package name

2021-11-01 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17987 )

Change subject: Impala-10994: Normalize pip package name
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17987/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17987/4//COMMIT_MSG@7
PS4, Line 7: Impala-10994
The ticket ID must be uppercase: IMPALA-10994.


http://gerrit.cloudera.org:8080/#/c/17987/4//COMMIT_MSG@8
PS4, Line 8:
Please add a message that is exactly long enough to explain what the problem 
was, and how it was fixed. Each line should have 72 or fewer characters if possible.
see: https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala



--
To view, visit http://gerrit.cloudera.org:8080/17987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864
Gerrit-Change-Number: 17987
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 01 Nov 2021 11:13:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10997: Refactor Java Hive UDF code.

2021-11-01 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17986 )

Change subject: IMPALA-10997: Refactor Java Hive UDF code.
..


Patch Set 1:

(2 comments)

This looks good, I only had some minor comments.

http://gerrit.cloudera.org:8080/#/c/17986/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17986/1//COMMIT_MSG@19
PS1, Line 19: HiveUdfExecutor: Abstract base class that contains code that is 
common to
: the legacy UDF.class and the GenericUDF.class when it is 
eventually created.
: HiveUdfExecutorLegacy: Implementation of the code that is 
UDF.class specific.
nit: each line should have 72 or fewer characters if possible.


http://gerrit.cloudera.org:8080/#/c/17986/1/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
File fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17986/1/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@105
PS1, Line 105:   classLoaderClosed_ = true;
Why not use classLoader_ = null?



--
To view, visit http://gerrit.cloudera.org:8080/17986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1b981aed3021aef08c87e7cdbf7c6af95906754
Gerrit-Change-Number: 17986
Gerrit-PatchSet: 1
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 01 Nov 2021 08:21:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] datasketches: improved merge and memory allocation - avoid overhead of constructing union and getting result from it every time - call destructors of sketch and union objects

2021-10-07 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: datasketches: improved merge and memory allocation - avoid 
overhead of constructing union and getting result from it every time - call 
destructors of sketch and union objects
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17869/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17869/1//COMMIT_MSG@7
PS1, Line 7: datasketches: improved merge and memory allocation
First, you need to create a JIRA ticket for this patch at 
https://issues.apache.org/jira/browse/IMPALA
Second, please write a good, clear commit message, with a short, descriptive 
title and a message that is exactly long enough to explain what the problem 
was, and how it was fixed. Each line should have 72 or fewer characters if 
possible. The first line should have an empty line after it, and it should 
begin with the ticket(s) addressed, followed by a colon and a space, e.g. 
"IMPALA-1234: ".
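
The rules above can be checked mechanically. The following is a minimal
sketch, not an existing Impala or Gerrit tool; the function name, the
problem strings, and the comma-separated form for several tickets are
assumptions.

```python
import re

# Title must start with one or more uppercase ticket IDs, then ": ".
TICKET_PREFIX = re.compile(r"^IMPALA-\d+(, ?IMPALA-\d+)*: \S")
MAX_LINE_LEN = 72


def lint_commit_message(message):
    """Return a list of problems found in a commit message."""
    lines = message.splitlines()
    problems = []
    if not lines or not TICKET_PREFIX.match(lines[0]):
        problems.append('title should start with "IMPALA-1234: "')
    if len(lines) > 1 and lines[1] != "":
        problems.append("title should be followed by an empty line")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LEN:
            problems.append("line %d is longer than %d characters"
                            % (number, MAX_LINE_LEN))
    return problems


if __name__ == "__main__":
    print(lint_commit_message("Impala-10994: Normalize pip package name"))
```

An empty result means the message passes these three checks; it says
nothing about whether the description itself is clear.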



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 1
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Oct 2021 10:19:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-09-27 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9
PS1, Line 9: To enable fine-grained table refreshing, there are three main 
changes in this commit.
nit: each line should have 72 or fewer characters if possible.



--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 27 Sep 2021 09:13:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

2021-09-26 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a 
precision
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722
PS1, Line 1722: nctionCon
> precision is not the best name. I would suggest following the datasketches
Done


http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826
PS1, Line 1826: _cast
> why max here, not the specified precision?
The resulting accuracy of a sketch returned at the end of the unioning process 
will be a function of the smallest of lg_max_k and lg_config_k 
that the union operator has seen.
see: 
https://github.com/apache/datasketches-cpp/blob/master/hll/include/hll.hpp#L404-L407

So that the union does not reduce the accuracy of high-precision sketches 
produced by ds_hll_sketch, lg_max_k takes the maximum value. If necessary, a 
precision parameter can be added to ds_hll_union in a new JIRA.
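
A toy model of the rule quoted above may make the choice clearer. It
does not call the datasketches library; the function name is
hypothetical and the only behavior modelled is "the smallest of
lg_max_k and the lg_config_k values seen".

```python
def union_result_lg_k(lg_max_k, input_lg_config_ks):
    """Effective lg_k of a union result in this toy model.

    lg_max_k is the union's own limit; input_lg_config_ks are the
    lg_config_k values of the sketches merged into it.
    """
    return min([lg_max_k] + list(input_lg_config_ks))


if __name__ == "__main__":
    # A union capped at 12 degrades two precision-16 sketches.
    print(union_result_lg_k(12, [16, 16]))
    # A union capped at the maximum (21) keeps their precision.
    print(union_result_lg_k(21, [16, 16]))
```

With the cap at the maximum, the result is limited only by the least
precise input sketch, which is the behavior the reply argues for.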



--
To view, visit http://gerrit.cloudera.org:8080/17744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sun, 26 Sep 2021 10:42:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

2021-09-26 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a 
precision
..

IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

This patch addresses the current limitation in DS_HLL_SKETCH function by
extending the function to optionally take a secondary argument
called precision.

   DS_HLL_SKETCH(expression [, precision])

The precision value must be between 4 and 21, specified as an integer
literal. The default is 12.
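
The argument check can be sketched as follows. This is not the code in
FunctionCallExpr.java; the helper names are hypothetical. The error
estimate is the textbook HyperLogLog figure of 1.04 / sqrt(m) for m
registers, under the assumption that precision is the base-2 logarithm
of the register count; the DataSketches implementation may behave
differently.

```python
import math

MIN_PRECISION = 4
MAX_PRECISION = 21
DEFAULT_PRECISION = 12


def resolve_precision(precision=None):
    """Return the precision to use, applying the default and the range check."""
    if precision is None:
        return DEFAULT_PRECISION
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an integer literal")
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError("precision must be between %d and %d"
                         % (MIN_PRECISION, MAX_PRECISION))
    return precision


def classic_hll_std_error(precision):
    """Textbook HyperLogLog standard error for 2**precision registers."""
    return 1.04 / math.sqrt(2 ** precision)


if __name__ == "__main__":
    for p in (12, 16, 21):
        print(p, classic_hll_std_error(resolve_precision(p)))
```

Each extra bit of precision doubles the register count and so shrinks
the estimated error by a factor of sqrt(2).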

Here are test results of a typical workload in tpch25.lineitem (#1):
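+-------------+----------------+-----------+-----------+-----------+
| Metric      | Count Distinct | DS_HLL-12 | DS_HLL-16 | DS_HLL-21 |
+-------------+----------------+-----------+-----------+-----------+
| Memory(MB)  | 725.43         | 124.87    | 123.19    | 121.85    |
| Duration(s) | 5.64           | 1.03      | 1.13      | 1.64      |
| ErrorRate   | 0%             | 1.26%     | 0.22%     | 0.05%     |
+-------------+----------------+-----------+-----------+-----------+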

Testing:
1. Ran unit tests against table lineitem in TPC-H in both serial and
   parallel plan settings;
2. Ran "core" tests.

Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
6 files changed, 155 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/2
--
To view, visit http://gerrit.cloudera.org:8080/17744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-09-12 Thread Fucun Chu (Code Review)
Hello Quanlong Huang, Laszlo Gaal, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17793

to look at the new patch set (#8).

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..

IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (HiveServer2 and metastore). The default is
CDP Hive 3.1.3.

Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment that uses the CDP Hive 3.1.3 metastore to one that uses
the Apache Hive 3.1.2 metastore.

In order to start a minicluster which uses Apache Hive 3.1.2 users
should follow the steps below:

1. Make sure that minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain/apache_components directory.

> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
  The above commands work around HIVE-22915.

> bin/create-test-configuration.sh -create_metastore
  The above step should pass "-create_metastore" only the first time
so that a new metastore db is created and the Apache Hive 3.1.2 schema
is initialized.

> testdata/bin/run-all.sh
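
The steps above can be collected into a small dry-run helper. This is
only a sketch: it returns the command strings from this message without
executing anything, and the function name and the first_run flag are
hypothetical.

```python
def apache_hive_minicluster_steps(first_run):
    """Return the shell commands from the steps above, in order."""
    create_conf = "bin/create-test-configuration.sh"
    if first_run:
        # Only needed once, to create the metastore db and its schema.
        create_conf += " -create_metastore"
    return [
        "export USE_APACHE_HIVE=true",
        "source bin/impala-config.sh",
        "bin/bootstrap_toolchain.py",
        "rm $HIVE_HOME/lib/guava-*jar",
        "cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/",
        create_conf,
        "testdata/bin/run-all.sh",
    ]


if __name__ == "__main__":
    for command in apache_hive_minicluster_steps(first_run=True):
        print(command)
```

The commands are meant to be run in one shell session, since the later
steps depend on the exported variable and the sourced configuration.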

Follow-up:
 - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871

Tests:
 - Made sure that the cluster comes up with Apache Hive 3.1.2 when the
   steps above are performed.
 - Made sure that existing scripts work as they do currently when
   argument is not provided.

Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 89 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/8
--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-09-12 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17793 )

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17793/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17793/7//COMMIT_MSG@43
PS7, Line 43: so that a new metastore db is created and the Apache Hive 3.1.2 
schema
> nit: wrap at 72
Done



--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Sep 2021 02:29:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches

2021-08-31 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17818 )

Change subject: IMPALA-10901 cleaner and faster operations with datasketches
..


Patch Set 2:

The following EE test files need to be modified:
testdata/workloads/functional-query/queries/QueryTest/
datasketches-cpc.test
datasketches-hll.test
datasketches-kll.test
datasketches-theta.test
The expected "UDF ERROR: Unable to deserialize sketch" messages need to
include the e.what() information.

Run the above test files using the following commands:
cd tests
impala-py.test query_test/test_datasketches.py

Or use pre-review-test 
(https://jenkins.impala.io/job/pre-review-test/build?delay=0sec) to run the test


--
To view, visit http://gerrit.cloudera.org:8080/17818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 01 Sep 2021 03:05:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-08-30 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17793 )

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..

IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (HiveServer2 and metastore). The default is
CDP Hive 3.1.3.

Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment that uses the CDP Hive 3.1.3 metastore to one that uses
the Apache Hive 3.1.2 metastore.

In order to start a minicluster which uses Apache Hive 3.1.2 users
should follow the steps below:

1. Make sure that minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain/apache_components directory.

> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
  The above commands work around HIVE-22915.

> bin/create-test-configuration.sh -create_metastore
  The above step should pass "-create_metastore" only the first time
so that a new metastore db is created and the Apache Hive 3.1.2 schema is
initialized.

> testdata/bin/run-all.sh

Follow-up:
 - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871

Tests:
 - Made sure that the cluster comes up with Apache Hive 3.1.2 when the
   steps above are performed.
 - Made sure that existing scripts work as they do currently when
   argument is not provided.

Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 89 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/7
--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-08-25 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17793 )

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..

IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (HiveServer2 and metastore). The default is
CDP Hive 3.1.3.

Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment that uses the CDP Hive 3.1.3 metastore to one that uses
the Apache Hive 3.1.2 metastore.

In order to start a minicluster which uses Apache Hive 3.1.2 users
should follow the steps below:

1. Make sure that minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain/apache_components directory.

> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
  The above commands work around HIVE-22915.

> bin/create-test-configuration.sh -create_metastore
  The above step should pass "-create_metastore" only the first time
so that a new metastore db is created and the Apache Hive 3.1.2 schema is
initialized.

> testdata/bin/run-all.sh

Follow-up:
 - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871

Tests:
 - Made sure that the cluster comes up with Apache Hive 3.1.2 when the
   steps above are performed.
 - Made sure that existing scripts work as they do currently when
   argument is not provided.

Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 89 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/6
--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-08-25 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17793 )

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..

IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (HiveServer2 and metastore). The default is
CDP Hive 3.1.3.

Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment that uses the CDP Hive 3.1.3 metastore to one that uses
the Apache Hive 3.1.2 metastore.

In order to start a minicluster which uses Apache Hive 3.1.2 users
should follow the steps below:

1. Make sure that minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain/apache_components directory.

> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
  The above commands work around HIVE-22915.

> bin/create-test-configuration.sh -create_metastore
  The above step should pass "-create_metastore" only the first time
so that a new metastore db is created and the Apache Hive 3.1.2 schema is
initialized.

> testdata/bin/run-all.sh
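
The steps above can be sketched as a small Python helper that only
assembles the command list and does not run anything. The function names
minicluster_commands and as_bash_script are hypothetical and not part of
this patch; the sketch assumes that -create_metastore is passed only on
the first run, as described above.

```python
def minicluster_commands(first_time):
    """Return the shell commands from the steps above, in order."""
    create_cmd = "bin/create-test-configuration.sh"
    if first_time:
        # Only create and initialize the metastore db on the first run.
        create_cmd += " -create_metastore"
    return [
        "export USE_APACHE_HIVE=true",
        "source bin/impala-config.sh",
        "bin/bootstrap_toolchain.py",
        # Workaround for HIVE-22915: swap Hive's guava jar for Hadoop's.
        "rm $HIVE_HOME/lib/guava-*jar",
        "cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/",
        create_cmd,
        "testdata/bin/run-all.sh",
    ]


def as_bash_script(commands):
    """Join the commands into one script so that 'export' and 'source'
    affect the same shell as the commands that follow them."""
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(as_bash_script(minicluster_commands(first_time=True)), end="")
```

Saving the printed script to a file and running it with bash from the
Impala checkout keeps the environment set up by impala-config.sh
available to the later commands.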

Follow-up:
 - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871

Tests:
 - Made sure that the cluster comes up with Apache Hive 3.1.2 when the
   steps above are performed.
 - Made sure that existing scripts work as they do currently when
   argument is not provided.

Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 86 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/5
--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-08-20 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17793 )

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..

IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.

To make sure that existing setups don't break, this is enabled via an
environment variable override in bin/impala-config.sh. When the
environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (HiveServer2 and metastore). The default is
CDP Hive 3.1.3.

Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment which uses the CDP Hive 3.1.3 metastore to another which
uses the Apache Hive 3.1.2 metastore.

In order to start a minicluster which uses Apache Hive 3.1.2, users
should follow the steps below:

1. Make sure that the minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain/apache_components directory.

> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
  The above commands work around HIVE-22915.

> bin/create-test-configuration.sh -create_metastore
  The above step should provide "-create_metastore" only the first time
so that a new metastore db is created and the Apache Hive 3.1.2 schema is
initialized.

> testdata/bin/run-all.sh

Follow-up:
 - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871

Tests:
 - Made sure that the cluster comes up with Apache Hive 3.1.2 when the
   steps above are performed.
 - Made sure that existing scripts work as they do currently when
   argument is not provided.

Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 84 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/4
--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-08-20 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17793 )

Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17793/3/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17793/3/bin/bootstrap_toolchain.py@324
PS3, Line 324: class ApacheComponent(EnvVersionedPackage):
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17793/3/bin/bootstrap_toolchain.py@326
PS3, Line 326:
> flake8: E251 unexpected spaces around keyword / parameter equals
Done


http://gerrit.cloudera.org:8080/#/c/17793/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/17793/3/bin/impala-config.sh@261
PS3, Line 261:   # When USE_APACHE_HIVE is set we use the apache hive version 
to build as well as deploy in
> line too long (92 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 21 Aug 2021 04:22:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

2021-08-20 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17793


Change subject: IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster
..

IMPALA-10870: Add Apache Hive 3.1.2 to the minicluster

This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.

To make sure that existing setups don't break, this is enabled via an
environment variable override in bin/impala-config.sh. When the
environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (HiveServer2 and metastore). The default is
CDP Hive 3.1.3.

Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment which uses the CDP Hive 3.1.3 metastore to another which
uses the Apache Hive 3.1.2 metastore.

In order to start a minicluster which uses Apache Hive 3.1.2, users
should follow the steps below:

1. Make sure that the minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run the following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain/apache_components directory.

> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
  The above commands work around HIVE-22915.

> bin/create-test-configuration.sh -create_metastore
  The above step should provide "-create_metastore" only the first time
so that a new metastore db is created and the Apache Hive 3.1.2 schema is
initialized.

> testdata/bin/run-all.sh

Follow-up:
 - Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871

Tests:
 - Made sure that the cluster comes up with Apache Hive 3.1.2 when the
   steps above are performed.
 - Made sure that existing scripts work as they do currently when
   argument is not provided.

Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 81 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17793/3
--
To view, visit http://gerrit.cloudera.org:8080/17793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Gerrit-Change-Number: 17793
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

2021-07-31 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17744


Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a 
precision
..

IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

This patch addresses the current limitation of the DS_HLL_SKETCH function
by extending it to optionally take a second argument called precision.

   DS_HLL_SKETCH(expression [, precision])

The precision value must be between 4 and 21, specified as an integer
literal. The default is 12.
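
As a rough illustration of what the precision range means, the sketch
below validates a precision value and derives two textbook HyperLogLog
quantities from it. It assumes that precision is the base-2 logarithm of
the number of sketch buckets and uses the classic 1.04 / sqrt(m) error
estimate; neither assumption is taken from this patch, and the helper
names are hypothetical. The actual DataSketches error bounds may differ.

```python
import math

MIN_PRECISION = 4
MAX_PRECISION = 21
DEFAULT_PRECISION = 12


def hll_registers(precision=DEFAULT_PRECISION):
    """Return 2**precision after checking the range stated above."""
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an integer")
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError("precision must be between 4 and 21")
    return 1 << precision


def classic_hll_rse(precision=DEFAULT_PRECISION):
    """Textbook HyperLogLog relative standard error, 1.04 / sqrt(m).

    This is the classic estimate, not necessarily the bound that the
    DataSketches implementation guarantees.
    """
    return 1.04 / math.sqrt(hll_registers(precision))


if __name__ == "__main__":
    for p in (DEFAULT_PRECISION, 16, MAX_PRECISION):
        print(p, hll_registers(p), classic_hll_rse(p))
```

A higher precision trades more buckets for a lower expected error, which
is the trade-off this patch exposes to the user.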

Here are test results of a typical workload in tpch25.lineitem (#1):
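+--------------+----------------+-----------+-----------+-----------+
| Metric       | Count Distinct | DS_HLL-12 | DS_HLL-16 | DS_HLL-21 |
+--------------+----------------+-----------+-----------+-----------+
| Memory (MB)  | 725.43         | 124.87    | 123.19    | 121.85    |
| Duration (s) | 5.64           | 1.03      | 1.13      | 1.64      |
| Error rate   | 0%             | 1.26%     | 0.22%     | 0.05%     |
+--------------+----------------+-----------+-----------+-----------+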

Testing:
1. Ran unit tests against table lineitem in TPC-DS in both serial and
   parallel plan settings;
2. Ran "core" tests.

Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
6 files changed, 155 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/1
--
To view, visit http://gerrit.cloudera.org:8080/17744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI

2021-07-27 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17726 )

Change subject: IMPALA-10779: Print the username closing a session or 
cancelling a query from the WebUI
..


Patch Set 5:

(2 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
File fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java:

http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java@103
PS4, Line 103: execQuery
> nit: rename it to 'execQueryAsync' since we won't wait for results here.
Done


http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
File fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java:

http://gerrit.cloudera.org:8080/#/c/17726/4/fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java@338
PS4, Line 338: // Wait for logs to flush
> Shouldn't we do this after sleep?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
Gerrit-Change-Number: 17726
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 28 Jul 2021 05:25:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI

2021-07-27 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17726 )

Change subject: IMPALA-10779: Print the username closing a session or 
cancelling a query from the WebUI
..

IMPALA-10779: Print the username closing a session or cancelling a query from 
the WebUI

This patch appends the username of the client who made the request to
close a session or cancel a query from the coordinator's debug WebUI.

Tests:
- Added a new fe test for LDAP auth to verify that the new status gets
  printed in runtime profile and coordinator log when a query is
  cancelled in this way.

Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
---
M be/src/kudu/util/web_callback_registry.h
M be/src/service/impala-http-handler.cc
M be/src/util/webserver.cc
M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
M tests/webserver/test_web_pages.py
6 files changed, 96 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/5
--
To view, visit http://gerrit.cloudera.org:8080/17726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
Gerrit-Change-Number: 17726
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI

2021-07-27 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17726 )

Change subject: IMPALA-10779: Print the username closing a session or 
cancelling a query from the WebUI
..

IMPALA-10779: Print the username closing a session or cancelling a query from 
the WebUI

This patch appends the username of the client who made the request to
close a session or cancel a query from the coordinator's debug WebUI.

Tests:
- Added a new fe test for LDAP auth to verify that the new status gets
  printed in runtime profile and coordinator log when a query is
  cancelled in this way.

Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
---
M be/src/kudu/util/web_callback_registry.h
M be/src/service/impala-http-handler.cc
M be/src/util/webserver.cc
M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
M tests/webserver/test_web_pages.py
6 files changed, 96 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/4
--
To view, visit http://gerrit.cloudera.org:8080/17726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
Gerrit-Change-Number: 17726
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI

2021-07-27 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17726 )

Change subject: IMPALA-10779: Print the username closing a session or 
cancelling a query from the WebUI
..


Patch Set 3:

Added a new test for LDAP auth in LdapWebserverTest.java


--
To view, visit http://gerrit.cloudera.org:8080/17726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
Gerrit-Change-Number: 17726
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 28 Jul 2021 01:19:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI

2021-07-27 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17726 )

Change subject: IMPALA-10779: Print the username closing a session or 
cancelling a query from the WebUI
..

IMPALA-10779: Print the username closing a session or cancelling a query from 
the WebUI

This patch appends the username of the client who made the request to
close a session or cancel a query from the coordinator's debug WebUI.

Tests:
- Added a new fe test for LDAP auth to verify that the new status gets
  printed in runtime profile and coordinator log when a query is
  cancelled in this way.

Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
---
M be/src/kudu/util/web_callback_registry.h
M be/src/service/impala-http-handler.cc
M be/src/util/webserver.cc
M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
M tests/webserver/test_web_pages.py
6 files changed, 96 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/3
--
To view, visit http://gerrit.cloudera.org:8080/17726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
Gerrit-Change-Number: 17726
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10779: Print the username closing a session or cancelling a query from the WebUI

2021-07-24 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17726


Change subject: IMPALA-10779: Print the username closing a session or 
cancelling a query from the WebUI
..

IMPALA-10779: Print the username closing a session or cancelling a query from 
the WebUI

This patch appends the username of the client who made the request to
close a session or cancel a query from the coordinator's debug WebUI.

Tests:
- Ran related tests manually with user authentication

Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
---
M be/src/kudu/util/web_callback_registry.h
M be/src/service/impala-http-handler.cc
M be/src/util/webserver.cc
M tests/webserver/test_web_pages.py
4 files changed, 14 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17726/2
--
To view, visit http://gerrit.cloudera.org:8080/17726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I02c92b5caee61d1f9f381cd2906a850e02c54d55
Gerrit-Change-Number: 17726
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support

2021-07-14 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..


Patch Set 4:

An issue has been created for tracking, see: 
https://github.com/tencentyun/hadoop-cos/issues/35.


--
To view, visit http://gerrit.cloudera.org:8080/17503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jul 2021 14:19:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support

2021-06-27 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/impala_test_suite.py@1022
PS3, Line 1022:
> flake8: E501 line too long (96 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/skip.py
File tests/common/skip.py:

http://gerrit.cloudera.org:8080/#/c/17503/3/tests/common/skip.py@131
PS3, Line 131:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17503/3/tests/metadata/test_stale_metadata.py
File tests/metadata/test_stale_metadata.py:

http://gerrit.cloudera.org:8080/#/c/17503/3/tests/metadata/test_stale_metadata.py@22
PS3, Line 22: from tests.common.skip import SkipIfS3, SkipIfGCS
> flake8: F401 'tests.common.skip.SkipIfCOS' imported but unused
Done



--
To view, visit http://gerrit.cloudera.org:8080/17503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 27 Jun 2021 06:01:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support

2021-06-27 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17503 )

Change subject: IMPALA-10771: Add Tencent COS support
..

IMPALA-10771: Add Tencent COS support

This patch adds support for COS (Cloud Object Storage). It uses
hadoop-cos, so the implementation is similar to that of other remote
FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.
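
The location rewrite mentioned in the test setup could look roughly like
the sketch below. It assumes the cosn:// URI scheme used by hadoop-cos;
the function to_cos_location and the bucket name are hypothetical, and
the actual test setup may have updated the HMS DB differently.

```python
from urllib.parse import urlsplit


def to_cos_location(location, bucket):
    """Map an hdfs:// table location to the same path in a COS bucket.

    Locations that are not on HDFS are returned unchanged.
    """
    parts = urlsplit(location)
    if parts.scheme != "hdfs":
        return location
    # Keep the warehouse path and drop the namenode host and port.
    return "cosn://{0}{1}".format(bucket, parts.path)


if __name__ == "__main__":
    print(to_cos_location(
        "hdfs://localhost:20500/test-warehouse/alltypes", "example-bucket"))
```

Only the scheme and authority change, so relative layouts under the
warehouse directory stay the same after the move.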

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_metastore_service.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
62 files changed, 279 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17503/4
--
To view, visit http://gerrit.cloudera.org:8080/17503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10771: Add Tencent COS support

2021-06-26 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17503


Change subject: IMPALA-10771: Add Tencent COS support
..

IMPALA-10771: Add Tencent COS support

This patch adds support for COS (Cloud Object Storage). It uses
hadoop-cos, so the implementation is similar to that of other remote
FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_metastore_service.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
63 files changed, 277 insertions(+), 57 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17503/3
--
To view, visit http://gerrit.cloudera.org:8080/17503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Gerrit-Change-Number: 17503
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10717: Import Tuple functionality from DataSketches

2021-06-05 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17515


Change subject: IMPALA-10717: Import Tuple functionality from DataSketches
..

IMPALA-10717: Import Tuple functionality from DataSketches

This patch imports the functionality needed for Tuple approximate
algorithm from Apache DataSketches. I decided to copy the necessary
files into be/src/thirdparty/datasketches.

Browse the source files here:
https://github.com/apache/datasketches-cpp/tree/3.0.0

Change-Id: If14fc224ee5e767054020c0efcc25e57289f8ac3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/README.md
A be/src/thirdparty/datasketches/array_of_doubles_a_not_b.hpp
A be/src/thirdparty/datasketches/array_of_doubles_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/array_of_doubles_intersection.hpp
A be/src/thirdparty/datasketches/array_of_doubles_intersection_impl.hpp
A be/src/thirdparty/datasketches/array_of_doubles_sketch.hpp
A be/src/thirdparty/datasketches/array_of_doubles_sketch_impl.hpp
A be/src/thirdparty/datasketches/array_of_doubles_union.hpp
A be/src/thirdparty/datasketches/array_of_doubles_union_impl.hpp
A be/src/thirdparty/datasketches/tuple_a_not_b.hpp
A be/src/thirdparty/datasketches/tuple_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/tuple_intersection.hpp
A be/src/thirdparty/datasketches/tuple_intersection_impl.hpp
A be/src/thirdparty/datasketches/tuple_jaccard_similarity.hpp
A be/src/thirdparty/datasketches/tuple_sketch.hpp
A be/src/thirdparty/datasketches/tuple_sketch_impl.hpp
A be/src/thirdparty/datasketches/tuple_union.hpp
A be/src/thirdparty/datasketches/tuple_union_impl.hpp
20 files changed, 2,298 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17515/2
--
To view, visit http://gerrit.cloudera.org:8080/17515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If14fc224ee5e767054020c0efcc25e57289f8ac3
Gerrit-Change-Number: 17515
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10689: Implement ds_cpc_union_f() function.

2021-06-04 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17440 )

Change subject: IMPALA-10689: Implement ds_cpc_union_f() function.
..


Patch Set 4:

Thanks for the reviews.
I re-ran the pre-review-test job and all test cases passed, see: 
https://jenkins.impala.io/job/pre-review-test/969/. Please try re-running the 
gerrit-verify-dryrun job.


--
To view, visit http://gerrit.cloudera.org:8080/17440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib5c616316bf2bf2ff437678e9a44a15339920150
Gerrit-Change-Number: 17440
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 04 Jun 2021 11:14:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10689: Implement ds_cpc_union_f() function.

2021-05-27 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17440 )


Change subject: IMPALA-10689: Implement ds_cpc_union_f() function.
..

IMPALA-10689: Implement ds_cpc_union_f() function.

This function receives two strings that are serialized Apache
DataSketches CPC sketches. It unions the two sketches and returns the
resulting sketch.

Example:
select ds_cpc_estimate(ds_cpc_union_f(sketch1, sketch2))
from sketch_tbl;
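+---------------------------------------------------+
| ds_cpc_estimate(ds_cpc_union_f(sketch1, sketch2)) |
+---------------------------------------------------+
| 15                                                |
+---------------------------------------------------+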
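
As a rough illustration of the semantics only (not the Impala
implementation), the scalar union can be modelled in Python. This is a toy
under stated assumptions: exact sets encoded as JSON strings stand in for
serialized CPC sketches, and the names toy_sketch, toy_union_f and
toy_estimate are hypothetical.

```python
import json


def toy_sketch(values):
    """Serialize the distinct values to a string (stand-in for a CPC sketch)."""
    return json.dumps(sorted(set(values)))


def toy_union_f(sketch1, sketch2):
    """Deserialize both inputs, merge them, and return a new serialized sketch."""
    merged = set(json.loads(sketch1)) | set(json.loads(sketch2))
    return json.dumps(sorted(merged))


def toy_estimate(sketch):
    """Return the distinct count held by a serialized toy sketch."""
    return len(json.loads(sketch))


sketch1 = toy_sketch(range(0, 10))   # 10 distinct values: 0..9
sketch2 = toy_sketch(range(5, 15))   # 10 distinct values: 5..14
union_estimate = toy_estimate(toy_union_f(sketch1, sketch2))
print(union_estimate)
```

The two inputs overlap on the values 5..9, so the union holds fewer distinct
values than the sum of the two individual counts. Both inputs and the result
are plain strings, mirroring how the function takes and returns serialized
sketches.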

Change-Id: Ib5c616316bf2bf2ff437678e9a44a15339920150
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
6 files changed, 140 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/17440/3
--
To view, visit http://gerrit.cloudera.org:8080/17440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib5c616316bf2bf2ff437678e9a44a15339920150
Gerrit-Change-Number: 17440
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10688: Implement ds_cpc_stringify() function

2021-05-18 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..


Patch Set 4:

All tests run with the pre-review-test job passed; the failed test cases were 
not reproduced. See: https://jenkins.impala.io/job/pre-review-test/948/. Could 
the gerrit-verify-dryrun job be re-run? Thanks.


--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 00:57:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10688: Implement ds_cpc_stringify() function

2021-05-18 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..

IMPALA-10688: Implement ds_cpc_stringify() function

This function receives a string that is a serialized Apache
DataSketches CPC sketch and returns its stringified format.

The stringified format looks like the following and contains this data:

select ds_cpc_stringify(ds_cpc_sketch(float_col)) from
functional_parquet.alltypestiny;
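+--------------------------------------------+
| ds_cpc_stringify(ds_cpc_sketch(float_col)) |
+--------------------------------------------+
| ### CPC sketch summary:                    |
|    lg_k           : 11                     |
|    seed hash      : 93cc                   |
|    C              : 2                      |
|    flavor         : 1                      |
|    merged         : true                   |
|    intresting col : 0                      |
|    table entries  : 2                      |
|    window         : not allocated          |
| ### End sketch summary                     |
|                                            |
+--------------------------------------------+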

Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
4 files changed, 59 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/17373/4
--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10688: Implement ds_cpc_stringify function

2021-05-13 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17373 )


Change subject: IMPALA-10688: Implement ds_cpc_stringify function
..

IMPALA-10688: Implement ds_cpc_stringify function

This function receives a string that is a serialized Apache
DataSketches CPC sketch and returns its stringified format.

The stringified format looks like the following and contains this data:

select ds_cpc_stringify(ds_cpc_sketch(float_col)) from
functional_parquet.alltypestiny;
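+--------------------------------------------+
| ds_cpc_stringify(ds_cpc_sketch(float_col)) |
+--------------------------------------------+
| ### CPC sketch summary:                    |
|    lg_k           : 11                     |
|    seed hash      : 93cc                   |
|    C              : 2                      |
|    flavor         : 1                      |
|    merged         : true                   |
|    intresting col : 0                      |
|    table entries  : 2                      |
|    window         : not allocated          |
| ### End sketch summary                     |
|                                            |
+--------------------------------------------+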

Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
4 files changed, 59 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/17373/2
--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function

2021-05-12 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17372 )

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17372/1/testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test:

http://gerrit.cloudera.org:8080/#/c/17372/1/testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test@192
PS1, Line 192: # result as if the whole data was sketched together into a 
single sketch.
> I checked the test above that are run on functional_parquet.alltypessmall,
Done



--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 12 May 2021 14:25:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function

2021-05-12 Thread Fucun Chu (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17372

to look at the new patch set (#4).

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

IMPALA-10687: Implement ds_cpc_union() function

This function receives a set of serialized Apache DataSketches CPC
sketches produced by ds_cpc_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can be
unioned together to get an estimate. E.g.:
  SELECT
  ds_cpc_estimate(ds_cpc_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_cpc_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped
    by l_shipdate and called ds_cpc_union() on those sketches.
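
The per-partition workflow described in this message can be sketched in
Python. This is a toy model under stated assumptions, not the Impala code:
exact sets stand in for CPC sketches, the rows are made-up sample data, and
build_partition_sketches and union_estimate are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical rows of (partition_col, value); not taken from the patch.
rows = [(1, "a"), (1, "b"), (5, "b"), (5, "c"), (7, "d")]


def build_partition_sketches(rows):
    """Build one toy sketch (an exact set) per partition."""
    sketches = defaultdict(set)
    for partition, value in rows:
        sketches[partition].add(value)
    return dict(sketches)


def union_estimate(sketches, wanted_partitions):
    """Merge the sketches of the selected partitions and count distinct values."""
    merged = set()
    for partition in wanted_partitions:
        merged |= sketches.get(partition, set())
    return len(merged)


sketches = build_partition_sketches(rows)
merged_count = union_estimate(sketches, [1, 5])
summed_count = sum(len(sketches[p]) for p in [1, 5])
print(merged_count, summed_count)
```

Merging the sketches for partitions 1 and 5 counts the shared value "b"
once, whereas adding up the per-partition distinct counts would count it
twice. That is why the sketches are stored and unioned rather than the
estimates being stored and summed.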

Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
7 files changed, 177 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17372/4
--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function

2021-05-12 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17372 )

Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

IMPALA-10687: Implement ds_cpc_union() function

This function receives a set of serialized Apache DataSketches CPC
sketches produced by ds_cpc_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can be
unioned together to get an estimate. E.g.:
  SELECT
  ds_cpc_estimate(ds_cpc_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_cpc_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped
    by l_shipdate and called ds_cpc_union() on those sketches.

Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
7 files changed, 169 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17372/3
--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10687: Implement ds_cpc_union() function

2021-05-05 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17372 )


Change subject: IMPALA-10687: Implement ds_cpc_union() function
..

IMPALA-10687: Implement ds_cpc_union() function

This function receives a set of serialized Apache DataSketches CPC
sketches produced by ds_cpc_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can be
unioned together to get an estimate. E.g.:
  SELECT
  ds_cpc_estimate(ds_cpc_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_cpc_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped
    by l_shipdate and called ds_cpc_union() on those sketches.

Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
7 files changed, 170 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17372/1
--
To view, visit http://gerrit.cloudera.org:8080/17372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib94b45ae79efcc11adc077dd9df9b9868ae82cb6
Gerrit-Change-Number: 17372
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-14 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 4:

All tests run with the pre-review-test job passed, see: 
https://jenkins.impala.io/job/pre-review-test/909/. The failed test in the 
gerrit-verify-dryrun job (query_test/test_fetch.py, from: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13658/) did not 
reappear.
How should this situation be handled? Should the gerrit-verify-dryrun job be 
re-run?


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 15 Apr 2021 01:57:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

2021-04-13 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..

IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

These functions can be used to get cardinality estimates of data
using the CPC algorithm from Apache DataSketches. ds_cpc_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized CPC sketch in string format. This can be written to a
table or be fed directly to ds_cpc_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the CPC sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' CPC see:
http://datasketches.apache.org/docs/CPC/CPC.html
For a figures-of-merit comparison of the HLL and CPC sketches see:
https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch_parquet.lineitem to compare performance
   with ndv(). Depending on data characteristics, ndv() appears 2x-3x
   faster. CPC gives a closer estimate than the current ndv(). CPC is
   more accurate than HLL in some cases.
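
The sketch-then-estimate pattern and the merging of sketches can be
illustrated with a much simpler distinct-count sketch. The Python below
implements K-Minimum-Values (KMV), which is a different and less
space-efficient technique than CPC, and it is only a minimal sketch for
illustration. The class name KmvSketch, the default k=256 and the use of
SHA-1 as the hash are assumptions made here, not anything taken from
DataSketches or Impala.

```python
import hashlib


class KmvSketch:
    """K-Minimum-Values distinct counter (a simpler technique than CPC)."""

    def __init__(self, k=256):
        self.k = k
        self.hashes = set()  # the k smallest normalized hashes seen so far

    @staticmethod
    def _hash(value):
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        # Map the first 8 bytes of the digest to a float in [0, 1).
        return int.from_bytes(digest[:8], "big") / 2.0**64

    def update(self, value):
        """Feed one value from the stream into the sketch."""
        self.hashes.add(self._hash(value))
        if len(self.hashes) > self.k:
            self.hashes.remove(max(self.hashes))

    def merge(self, other):
        """Return a new sketch equivalent to sketching both inputs together."""
        merged = KmvSketch(self.k)
        merged.hashes = set(sorted(self.hashes | other.hashes)[: self.k])
        return merged

    def estimate(self):
        """Estimate the number of distinct values fed into the sketch."""
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # exact while under capacity
        return (self.k - 1) / max(self.hashes)


left, right = KmvSketch(), KmvSketch()
for i in range(0, 6000):
    left.update(i)
for i in range(4000, 10000):
    right.update(i)

# Approximates the 10000 distinct values across both streams.
print(round(left.merge(right).estimate()))
```

Merging keeps the k smallest hashes of the two inputs, which is the same
state the sketch would hold if it had seen both streams directly. The result
is an approximation whose error shrinks as k grows, while small inputs that
fit under k are counted exactly.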

Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
12 files changed, 398 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/8
--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-12 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..

IMPALA-10631: Upgrade DataSketches to version 3.0.0

Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version
3.0.0

tests:
 -Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
---
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
M be/src/thirdparty/datasketches/AuxHashMap.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponHashSet.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/CouponList.hpp
M be/src/thirdparty/datasketches/CubicInterpolation.hpp
M be/src/thirdparty/datasketches/HarmonicNumbers.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array.hpp
M be/src/thirdparty/datasketches/Hll6Array-internal.hpp
M be/src/thirdparty/datasketches/Hll6Array.hpp
M be/src/thirdparty/datasketches/Hll8Array-internal.hpp
M be/src/thirdparty/datasketches/Hll8Array.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllArray.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImpl.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/RelativeErrorTables.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp
M be/src/thirdparty/datasketches/cpc_common.hpp
M be/src/thirdparty/datasketches/cpc_compressor.hpp
M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp
M be/src/thirdparty/datasketches/cpc_sketch.hpp
M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp
M be/src/thirdparty/datasketches/cpc_union.hpp
M be/src/thirdparty/datasketches/cpc_union_impl.hpp
M be/src/thirdparty/datasketches/cpc_util.hpp
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/icon_estimator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
M be/src/thirdparty/datasketches/memory_operations.hpp
M be/src/thirdparty/datasketches/theta_a_not_b.hpp
M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_comparators.hpp
A be/src/thirdparty/datasketches/theta_constants.hpp
A be/src/thirdparty/datasketches/theta_helpers.hpp
M be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_base.hpp
A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp
M be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp
M be/src/thirdparty/datasketches/theta_sketch.hpp
M be/src/thirdparty/datasketches/theta_sketch_impl.hpp
M be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_base.hpp
A be/src/thirdparty/datasketches/theta_union_base_impl.hpp
M be/src/thirdparty/datasketches/theta_union_impl.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp
M be/src/thirdparty/datasketches/u32_table.hpp
M be/src/thirdparty/datasketches/u32_table_impl.hpp
66 files changed, 2,646 insertions(+), 1,873 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17294/3
--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

2021-04-08 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..

IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

These functions can be used to get cardinality estimates of data
using the CPC algorithm from Apache DataSketches. ds_cpc_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized CPC sketch in string format. This can be written to a
table or be fed directly to ds_cpc_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the CPC sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' CPC see:
http://datasketches.apache.org/docs/CPC/CPC.html
For a figures-of-merit comparison of the HLL and CPC sketches see:
https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch_parquet.lineitem to compare performance
   with ndv(). Depending on data characteristics, ndv() appears 2x-3x
   faster. CPC gives a closer estimate than the current ndv(). CPC is
   more accurate than HLL in some cases.

Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
12 files changed, 398 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/6
--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-08 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17294 )


Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..

IMPALA-10631: Upgrade DataSketches to version 3.0.0

Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version
3.0.0

tests:
 -Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
---
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
M be/src/thirdparty/datasketches/AuxHashMap.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponHashSet.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/CouponList.hpp
M be/src/thirdparty/datasketches/CubicInterpolation.hpp
M be/src/thirdparty/datasketches/HarmonicNumbers.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array.hpp
M be/src/thirdparty/datasketches/Hll6Array-internal.hpp
M be/src/thirdparty/datasketches/Hll6Array.hpp
M be/src/thirdparty/datasketches/Hll8Array-internal.hpp
M be/src/thirdparty/datasketches/Hll8Array.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllArray.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImpl.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/RelativeErrorTables.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp
M be/src/thirdparty/datasketches/cpc_common.hpp
M be/src/thirdparty/datasketches/cpc_compressor.hpp
M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp
M be/src/thirdparty/datasketches/cpc_sketch.hpp
M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp
M be/src/thirdparty/datasketches/cpc_union.hpp
M be/src/thirdparty/datasketches/cpc_union_impl.hpp
M be/src/thirdparty/datasketches/cpc_util.hpp
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/icon_estimator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
M be/src/thirdparty/datasketches/memory_operations.hpp
M be/src/thirdparty/datasketches/theta_a_not_b.hpp
M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_comparators.hpp
A be/src/thirdparty/datasketches/theta_constants.hpp
A be/src/thirdparty/datasketches/theta_helpers.hpp
M be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_base.hpp
A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp
M be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp
M be/src/thirdparty/datasketches/theta_sketch.hpp
M be/src/thirdparty/datasketches/theta_sketch_impl.hpp
M be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_base.hpp
A be/src/thirdparty/datasketches/theta_union_base_impl.hpp
M be/src/thirdparty/datasketches/theta_union_impl.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp
M be/src/thirdparty/datasketches/u32_table.hpp
M be/src/thirdparty/datasketches/u32_table_impl.hpp
65 files changed, 2,640 insertions(+), 1,867 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17294/2
--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

2021-04-08 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..


Patch Set 5:

DataSketches 3.0 has fixed this problem, so this change needs to wait for 
IMPALA-10631 to complete. It will be updated soon.


--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 09 Apr 2021 02:39:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10632: Update the Theta sketch serialization interface

2021-04-01 Thread Fucun Chu (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17261

to look at the new patch set (#2).

Change subject: IMPALA-10632: Update the Theta sketch serialization interface
..

IMPALA-10632: Update the Theta sketch serialization interface

DataSketches 3.0.0 removes serialization of the Update Theta sketch, so
this change serializes via the Compact Theta sketch for backward compatibility.

tests:
 -Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
3 files changed, 48 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/17261/2
--
To view, visit http://gerrit.cloudera.org:8080/17261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
Gerrit-Change-Number: 17261
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10632 Update the Theta sketch serialization interface

2021-04-01 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17261 )


Change subject: IMPALA-10632 Update the Theta sketch serialization interface
..

IMPALA-10632 Update the Theta sketch serialization interface

DataSketches 3.0.0 removes serialization of the Update Theta sketch, so
this change serializes via the Compact Theta sketch for backward compatibility.

tests:
 -Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
3 files changed, 48 insertions(+), 42 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/17261/1
--
To view, visit http://gerrit.cloudera.org:8080/17261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80470863097a4836ee07fe44babaef0c852f3051
Gerrit-Change-Number: 17261
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function

2021-03-25 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17186 )

Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc@194
PS4, Line 194:   datasketches::compact_theta_sketch sketch = 
intersection_sketch.get_result();
> Please add more comment about the use cases when this could return false. a
Done


http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc@195
PS4, Line 195: riali
> typo: theta
Done


http://gerrit.cloudera.org:8080/#/c/17186/4/be/src/exprs/datasketches-functions-ir.cc@223
PS4, Line 223:   if (serialized_sketch.is_null || serialized_sketch.len == 0) 
return BigIntVal::null();
> This comment is not needed
Done


http://gerrit.cloudera.org:8080/#/c/17186/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17186/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@560
PS4, Line 560: 0
> I miss 2 tests here:
Done



--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 25 Mar 2021 15:19:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function

2021-03-25 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17186 )

Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..

IMPALA-10581: Implement ds_theta_intersect_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the intersection of the two
sketches, which may come from the same or different columns, and
returns the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2))
from sketch_tbl;
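+-----------------------------------------------------------+
| ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) |
+-----------------------------------------------------------+
| 5                                                         |
+-----------------------------------------------------------+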
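
For readers unfamiliar with how intersecting two Theta sketches yields an
estimate, here is a minimal sketch in Python. It is not Impala's or the
DataSketches library's implementation: it assumes a simplified KMV-style
sketch (keep the k smallest hash values, with theta as the cutoff), and the
names hash01, make_sketch, intersect and estimate are hypothetical.

```python
import hashlib


def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) (illustrative hash)."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def make_sketch(items, k=4096):
    """Build a simplified Theta sketch: (theta, set of retained hashes)."""
    hashes = sorted({hash01(x) for x in items})
    if len(hashes) <= k:
        return 1.0, set(hashes)        # nothing discarded, so the sketch is exact
    return hashes[k], set(hashes[:k])  # theta is the (k+1)-th smallest hash


def intersect(a, b):
    """Intersect two sketches: the lower theta wins, keep hashes seen in both."""
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] & b[1] if h < theta}


def estimate(sketch):
    """Estimated distinct count: retained entries scaled by 1/theta."""
    theta, retained = sketch
    return len(retained) / theta


sketch1 = make_sketch(range(1, 11))  # hypothetical column values 1..10
sketch2 = make_sketch(range(6, 16))  # hypothetical column values 6..15
print(estimate(intersect(sketch1, sketch2)))  # the inputs share the values 6..10
```

With fewer than k distinct inputs theta stays at 1.0 and the estimate is
exact; for larger inputs the same code returns an approximation.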

Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
6 files changed, 157 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/17186/5
--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function

2021-03-24 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17186 )

Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..

IMPALA-10581: Implement ds_theta_intersect_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the intersection of the two
sketches, which may come from the same or different columns, and
returns the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2))
from sketch_tbl;
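+-----------------------------------------------------------+
| ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) |
+-----------------------------------------------------------+
| 5                                                         |
+-----------------------------------------------------------+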

Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 123 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/17186/4
--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function

2021-03-24 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17179 )

Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17179/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17179/4/be/src/exprs/datasketches-functions-ir.cc@163
PS4, Line 163: update_sketch_to_theta_unio
> Sorry it was my bad that I had a typo in my suggestion, but this function n
Done



--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 24 Mar 2021 09:48:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function

2021-03-24 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17179 )

Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..

IMPALA-10580: Implement ds_theta_union_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It unions the two sketches and returns
the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2))
from sketch_tbl;
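+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) |
+-------------------------------------------------------+
| 15                                                    |
+-------------------------------------------------------+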

Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 114 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/6
--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function

2021-03-23 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17179 )

Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..


Patch Set 4:

(2 comments)

The ds_theta_union() function has been implemented in IMPALA-10467

http://gerrit.cloudera.org:8080/#/c/17179/3/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17179/3/be/src/exprs/datasketches-functions-ir.cc@167
PS3, Line 167:   if (!DeserializeDsSketch(serialized_sketch, &sketch_ptr)) {
 :   LogSketchDeserializationError(ctx);
 :   return false;
 : }
 : union_sketch.update(*sketch_ptr);
 :   }
 :   return true;
 : }
> This part seems pretty similar to L175-182. Have you considered introducing
Done


http://gerrit.cloudera.org:8080/#/c/17179/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17179/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@436
PS3, Line 436: # Checks that ds_theta_union_f() returns an empty sketch for 
NULL inputs.
> Shouldn't this return null for null inputs? Have you checked the behaviour
ref: 
https://github.com/apache/datasketches-hive/blob/1.1.X-incubating/src/test/java/org/apache/datasketches/hive/theta/UnionSketchUDFTest.java#L36
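
The NULL handling under discussion can be pictured as follows. This is only
an illustration in Python: exact frozensets stand in for serialized Theta
sketches, None stands in for SQL NULL, and union_f is a hypothetical name.
The skip-NULL behaviour is taken from the test comment quoted above, not
verified against the final patch.

```python
from typing import FrozenSet, Optional


def union_f(first: Optional[FrozenSet[int]],
            second: Optional[FrozenSet[int]]) -> FrozenSet[int]:
    """Union two stand-in sketches, skipping NULL (None) inputs."""
    result = set()
    for sketch in (first, second):
        if sketch is not None:  # a NULL input contributes nothing
            result |= sketch
    return frozenset(result)    # stays empty when both inputs are NULL


print(len(union_f(frozenset(range(1, 11)), frozenset(range(6, 16)))))
print(len(union_f(None, None)))
```

Under this reading, a NULL input behaves like an empty sketch, so the result
is never NULL.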



--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 23 Mar 2021 14:00:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function

2021-03-23 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17179 )

Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..

IMPALA-10580: Implement ds_theta_union_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It unions the two sketches and returns
the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2))
from sketch_tbl;
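+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) |
+-------------------------------------------------------+
| 15                                                    |
+-------------------------------------------------------+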

Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 114 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/4
--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10581: Implement ds_theta_intersect_f() function

2021-03-18 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17186 )


Change subject: IMPALA-10581: Implement ds_theta_intersect_f() function
..

IMPALA-10581: Implement ds_theta_intersect_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the intersection of the two
sketches, which may come from the same or different columns, and
returns the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2))
from sketch_tbl;
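+-----------------------------------------------------------+
| ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) |
+-----------------------------------------------------------+
| 5                                                         |
+-----------------------------------------------------------+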

Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 119 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/17186/2
--
To view, visit http://gerrit.cloudera.org:8080/17186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
Gerrit-Change-Number: 17186
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function

2021-03-18 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17179 )

Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..

IMPALA-10580: Implement ds_theta_union_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It unions the two sketches and returns
the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2))
from sketch_tbl;
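+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) |
+-------------------------------------------------------+
| 15                                                    |
+-------------------------------------------------------+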

Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 111 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/3
--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function

2021-03-17 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..


Patch Set 5:

(4 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc
File be/src/exprs/datasketches-common.cc:

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc@54
PS4, Line 54: bool DeserializeDsSketch(
> Could you please comment that this is a specialization of the template Dese
Done


http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc@148
PS4, Line 148:
> nit: I don't think this comment and the one below adds much. The comment ab
Done


http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h@74
PS3, Line 74: sketches. If they ar
> nit: "...sketches. If they are not..."
Done


http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@398
PS4, Line 398: ch.
 : create table ske
> Does this mean that with this test A and B has no common items so the resul
Done



--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 17 Mar 2021 14:20:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function

2021-03-17 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the a-not-b set operation on
the two sketches, which may come from the same or different columns.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
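+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+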
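
Unlike union and intersection, a-not-b is not symmetric in its arguments. The
Python sketch below illustrates this on two hand-built sketches. It assumes
each sketch is a (theta, retained_hashes) pair, which is a simplification
chosen for illustration and not the DataSketches format; a_not_b and estimate
are hypothetical names.

```python
def a_not_b(a, b):
    """Return the sketch of entries in `a` that are absent from `b`."""
    theta = min(a[0], b[0])  # only hashes below both thresholds are comparable
    retained = {h for h in a[1] if h < theta and h not in b[1]}
    return theta, retained


def estimate(sketch):
    """Estimated distinct count: retained entries scaled by 1/theta."""
    theta, retained = sketch
    return len(retained) / theta


a = (0.5, {0.1, 0.2, 0.4})  # hand-picked hash values, all below theta=0.5
b = (0.3, {0.2})            # hand-picked hash value below theta=0.3

print(a_not_b(a, b))  # (0.3, {0.1}): 0.2 is in b, 0.4 is not below 0.3
print(a_not_b(b, a))  # (0.3, set()): the only entry of b is also in a
```

Taking the smaller theta before comparing entries matters: the entry 0.4 in
`a` is dropped because `b` retained nothing above 0.3, so its absence from
`b` carries no information.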

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 169 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/5
--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function

2021-03-15 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the a-not-b set operation on
the two sketches, which may come from the same or different columns.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
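+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+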

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 166 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/4
--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10580: Implement ds_theta_union_f() function

2021-03-14 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17179 )


Change subject: IMPALA-10580: Implement ds_theta_union_f() function
..

IMPALA-10580: Implement ds_theta_union_f() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It unions the two sketches and returns
the resulting sketch.

Example:
select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2))
from sketch_tbl;
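+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) |
+-------------------------------------------------------+
| 15                                                    |
+-------------------------------------------------------+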

Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 103 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17179/2
--
To view, visit http://gerrit.cloudera.org:8080/17179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8329979b81ceeaad739a43fab79768ca9c2916fa
Gerrit-Change-Number: 17179
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function

2021-03-14 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@128
PS2, Line 128:   datasketches::theta_a_not_b a_not_b;
> nit: this comment is not needed as doesn't give extra info
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@131
PS2, Line 131: if (!first_serialized_sketch.is_null && 
first_serialized_sketch.len > 0) {
 : if (!DeserializeDsSketch(first_serialized_sketch, 
&first_sketch_ptr)) {
 :   LogSketchDeserializationError(ctx);
 :   return StringVal::null();
 : }
 :   }
 :   datasketches::theta_sketch::unique_ptr second_sketch_ptr;
 :   if (!second_serialized_sketch.is_null && 
second_serialized_sketch.len > 0) {
 : if (!DeserializeDsSketch(second_serialized_sketch, 
&second_sketch_ptr)) {
 :
> This part seems pretty identical to the section L141-150. Can you move it t
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@155
PS2, Line 155: d::stringstream serialized_input
> I'm not sure I understand the condition in this format :) Could you please
function ref: 
https://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool, usage has 
been modified with reference to the example.


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h@73
PS2, Line 73: 'first_serialized_s
> Could you mention both sketch params?
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@330
PS2, Line 330: When A is empty and B is
> When A is empty and B is null.
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@331
PS2, Line 331: select ds_theta_estimate(ds_theta_exclude(ds_theta_sketch(f2), 
null))
> Could you please add another test where A is null and B is empty? (the oppo
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@332
PS2, Line 332: from functional_parquet.emptytable;
> Another test would be where A and B are both empty.
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@379
PS2, Line 379: i.ti i_ti, i.i i_i, i.bi i_bi, i.f i_f, i.d i_d, i.s i_s, i.c 
i_c, i.v i_v,i.nc i_nc,
> I miss a test where the result of an a-not-b is a non-empty sketch (where t
Done



--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sun, 14 Mar 2021 14:30:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function

2021-03-14 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the a-not-b set operation on
the two sketches, which may come from the same or different columns.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
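+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+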

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 167 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/3
--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10558: Implement ds_theta_exclude() function

2021-03-05 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17153 )


Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. It computes the a-not-b set operation on
the two sketches, which may come from the same or different columns.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
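+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+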

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 125 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/2
--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 


[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function

2021-03-03 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG@14
PS3, Line 14:
> nit: not needed
Done


http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@271
PS3, Line 271: stimation, which is consistent
 : # with direct estimation of these sketches.
> Could you add tests that cover the second part of this sentence so that we
Done



--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Mar 2021 03:24:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function

2021-03-03 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function
..

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table, and intersect the sketches of
the partitions the user is interested in to get an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_theta_intersect() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 182 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/4
--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
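
For readers unfamiliar with Theta sketches, the intersection described in the commit message above can be illustrated with a toy model. What follows is a simplified K-minimum-values sketch written from scratch in Python. It is not the Apache DataSketches implementation and not the Impala code; the names hash01, theta_sketch, theta_intersect and theta_estimate, and the parameter K, are made up for this illustration.

```python
import hashlib

K = 4096  # toy sketch size; an assumption of this illustration


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2.0**53


def theta_sketch(values, k=K):
    """Build a toy sketch: (theta, frozenset of retained hashes below theta)."""
    hashes = sorted({hash01(v) for v in values})
    if len(hashes) <= k:
        return 1.0, frozenset(hashes)  # few distinct values: nothing is dropped
    return hashes[k], frozenset(hashes[:k])  # theta is the (k+1)-th smallest hash


def theta_intersect(sketches):
    """Intersect one or more toy sketches into a single sketch."""
    theta = min(t for t, _ in sketches)
    common = frozenset.intersection(*(retained for _, retained in sketches))
    return theta, frozenset(h for h in common if h < theta)


def theta_estimate(sketch):
    """Estimate the number of distinct values represented by a toy sketch."""
    theta, retained = sketch
    return len(retained) / theta


# Two "partitions" that share the values 30000..59999.
part1 = theta_sketch(range(0, 60000))
part5 = theta_sketch(range(30000, 90000))
estimate = theta_estimate(theta_intersect([part1, part5]))
```

In this model the intersection keeps only the hashes present in every input sketch and below the smallest theta, so `estimate` approximates the number of distinct values that occur in both partitions (30000 here). The SQL example in the commit message plays the same role, with ds_theta_intersect() aggregating the sketches of the selected partitions.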


[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function

2021-02-23 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function
..

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect the sketches of
the partitions the user is interested in to get an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_theta_intersect() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 163 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/3
--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function

2021-02-23 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@271
PS2, Line 271:   static void DsThetaIntersectUpdate(
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@273
PS2, Line 273:   static StringVal DsThetaIntersectSerialize(FunctionContext*, 
const StringVal& src);
> line too long (92 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 23 Feb 2021 11:28:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function

2021-02-22 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17088


Change subject: IMPALA-10520: Implement ds_theta_intersect() function
..

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect the sketches of
the partitions the user is interested in to get an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_theta_intersect() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 161 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/2
--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-18 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and, based on which partitions
the user is interested in, union the relevant sketches together to get
an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_theta_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 152 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/2
--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
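
The per-partition workflow in the commit message above can be illustrated with a toy model of a Theta union. This is a simplified K-minimum-values sketch written from scratch in Python, not the Apache DataSketches implementation and not the Impala code; hash01, theta_sketch, theta_union, theta_estimate and the parameter K are hypothetical names chosen for this illustration.

```python
import hashlib

K = 256  # toy sketch size; an assumption of this illustration


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2.0**53


def theta_sketch(values, k=K):
    """Build a toy sketch: (theta, frozenset of retained hashes below theta)."""
    hashes = sorted({hash01(v) for v in values})
    if len(hashes) <= k:
        return 1.0, frozenset(hashes)  # few distinct values: nothing is dropped
    return hashes[k], frozenset(hashes[:k])  # theta is the (k+1)-th smallest hash


def theta_union(sketches, k=K):
    """Merge toy sketches; the result is again a sketch with at most k hashes."""
    theta = min((t for t, _ in sketches), default=1.0)
    merged = sorted({h for _, retained in sketches for h in retained if h < theta})
    if len(merged) > k:
        theta, merged = merged[k], merged[:k]  # shrink back to k retained hashes
    return theta, frozenset(merged)


def theta_estimate(sketch):
    """Estimate the number of distinct values represented by a toy sketch."""
    theta, retained = sketch
    return len(retained) / theta


# One sketch per "partition", then a union over the partitions of interest.
partitions = {p: theta_sketch(range(p * 1000, p * 1000 + 1500)) for p in range(6)}
selected = theta_union([partitions[1], partitions[5]])
direct = theta_sketch(list(range(1000, 2500)) + list(range(5000, 6500)))
```

In this toy model the union of the per-partition sketches is identical to the sketch built directly over the combined rows (`selected == direct`), which is why sketches can be stored per partition and merged later instead of rescanning the data.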


[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions

2021-02-16 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17008 )

Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theta_estimate() functions
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28
PS2, Line 28:data, the difference is around 1%-10%. ds_hll_estimate() is 
faster
> Did you forget to add this additional section to the commit msg?
Done


http://gerrit.cloudera.org:8080/#/c/17008/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17008/3/be/src/exprs/aggregate-functions-ir.cc@1905
PS3, Line 1905:   if (dst->len == sizeof(datasketches::theta_union)) {
> There is one more thing I don't understand here:
1. theta_union.get_result() returns a compact sketch (compact_theta_sketch), 
which does not support updating and is inconsistent with the initial underlying 
type of dst (update_theta_sketch). This is different from the HLL sketch.
2. Because of that, theta_union is used as the underlying type of dst.
Relevant comments have been added to the code.


http://gerrit.cloudera.org:8080/#/c/17008/3/be/src/exprs/aggregate-functions-ir.cc@1908
PS3, Line 1908:   } else if (dst->len == 
sizeof(datasketches::update_theta_sketch)) {
> A DCHECK would be nice in the else branch to verify that dst->len is sizeof
Done
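
The two comments above concern an aggregation state that can hold one of two types. A loose Python analogy, assuming (as the comments suggest) that dst holds an update_theta_sketch while raw rows are added and a theta_union while partial results are merged: the classes below are stand-ins that use exact Python sets, not the DataSketches C++ classes, and serialize() is a hypothetical helper rather than the Impala function.

```python
class UpdateSketch:
    """Stand-in for datasketches::update_theta_sketch: accepts raw values."""

    def __init__(self):
        self.items = set()

    def update(self, value):
        self.items.add(value)

    def compact(self):
        return frozenset(self.items)  # read-only result


class Union:
    """Stand-in for datasketches::theta_union: accepts other sketches."""

    def __init__(self):
        self.items = set()

    def update(self, compact_sketch):
        self.items |= compact_sketch

    def get_result(self):
        return frozenset(self.items)  # read-only, like a compact sketch


def serialize(state):
    """Produce the compact form regardless of which state type was built."""
    if isinstance(state, Union):
        return state.get_result()
    if isinstance(state, UpdateSketch):
        return state.compact()
    # Plays the role of the DCHECK requested in review: no third state is expected.
    raise AssertionError("unexpected aggregation state: %r" % (state,))
```

The C++ code under review distinguishes the two cases by comparing dst->len with sizeof() of each type; the isinstance() checks here only mirror that branching.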



--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 16 Feb 2021 08:07:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions

2021-02-16 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17008 )

Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theta_estimate() functions
..

IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions

These functions can be used to get cardinality estimates of data
using the Theta algorithm from Apache DataSketches. ds_theta_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized Theta sketch in string format. This can be written to a
table or be fed directly to ds_theta_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the Theta sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' Theta see:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch25_parquet.lineitem to compare performance
   with ds_hll_*. ds_theta_* is faster than ds_hll_* on the original
   data, the difference is around 1%-10%. ds_hll_estimate() is faster
   than ds_theta_estimate() on existing sketches. HLL and Theta give
   close estimates except for strings, see IMPALA-10464.

Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions-test.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
11 files changed, 447 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/4
--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
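
As background for the commit message above, the idea behind a Theta sketch and its estimate can be shown with a toy K-minimum-values model. The Python below is written from scratch for illustration only; it is not the Apache DataSketches implementation and not the Impala code, and hash01, theta_sketch, theta_estimate and K are hypothetical names.

```python
import hashlib

K = 1024  # toy sketch size; an assumption of this illustration


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2.0**53


def theta_sketch(values, k=K):
    """Build a toy sketch: (theta, frozenset of retained hashes below theta)."""
    hashes = sorted({hash01(v) for v in values})  # duplicates hash identically
    if len(hashes) <= k:
        return 1.0, frozenset(hashes)  # exact mode: every distinct hash is kept
    return hashes[k], frozenset(hashes[:k])  # theta is the (k+1)-th smallest hash


def theta_estimate(sketch):
    """Estimate the distinct count: retained hashes scaled up by 1 / theta."""
    theta, retained = sketch
    return len(retained) / theta


# Small input: fewer distinct values than K, so the estimate is exact.
small = theta_estimate(theta_sketch(["a", "b", "a", "c"]))

# Larger input: 100000 distinct values, each appearing three times.
large = theta_estimate(theta_sketch(i % 100000 for i in range(300000)))
```

This also shows why the automated tests mentioned under "Testing" can expect correct results on small datasets: as long as the number of distinct values does not exceed the sketch size, nothing is discarded and theta stays at 1. On the larger input the estimate is only approximate.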


[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

2021-02-10 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..


Patch Set 4:

For a performance comparison between the ds_hll_* and ds_cpc_* functions, see: 
https://issues.apache.org/jira/browse/IMPALA-10500


--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 11 Feb 2021 06:58:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

2021-02-10 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..


Patch Set 4:

The test is still running; I will update the document once it completes.


--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 10 Feb 2021 14:07:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions

2021-02-10 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17008 )

Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theta_estimate() functions
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@7
PS2, Line 7: ds_theta_estimate
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@13
PS2, Line 13: ds_theta_estimate
> nit: same typo
Done


http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28
PS2, Line 28:see IMPALA-10464.
> I'd also include some highlights from that perf measurement doc into the co
Done


http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1646
PS2, Line 1646: SerializeDsThetaSketch(
> In contrast with HLL as I see Theta doesn't compact the sketch just seriali
Done


http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1899
PS2, Line 1899:   or dst->len == sizeof(datasketches::theta_union));
> I'm a bit lost here. Could you help me understand why is it needed to conve
Previously, this was implemented on the assumption that the size of dst stays 
unchanged; it is better to return union_sketch.


http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc@110
PS2, Line 110: return 0;
> HLL returns a null here. Have you checked the behaviour in Hive to be in sy
Comparing the Hive test cases for HLL and Theta shows that the results differ.
Theta:
https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/theta/EstimateSketchUDFTest.java#L34
HLL:
https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/hll/SketchToEstimateUDFTest.java#L31


http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@138
PS2, Line 138: # Check that ds_theta_estimate returns error for strings that 
are not serialized sketches.
> Please add a test when ds_theta_estimate() is used on an HLL sketch. I gues
Done



--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 10 Feb 2021 13:38:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions

2021-02-10 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17008 )

Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theta_estimate() functions
..

IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions

These functions can be used to get cardinality estimates of data
using the Theta algorithm from Apache DataSketches. ds_theta_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized Theta sketch in string format. This can be written to a
table or be fed directly to ds_theta_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the Theta sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' Theta see:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch25_parquet.lineitem to compare performance
   with ds_hll_*. HLL and Theta give close estimates except for strings,
   see IMPALA-10464.

Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions-test.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
11 files changed, 445 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/3
--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-09 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17048


Change subject: IMPALA-10467: Implement ds_theta_union() function
..

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and, based on which partitions
the user is interested in, union the relevant sketches together to get
an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_theta_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 162 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/1
--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions

2021-02-09 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17008 )

Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theat_estimate() functions
..

IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions

These functions can be used to get cardinality estimates of data
using the Theta algorithm from Apache DataSketches. ds_theta_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized Theta sketch in string format. This can be written to a
table or be fed directly to ds_theat_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the Theta sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' Theta see:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch25_parquet.lineitem to compare performance
   with ds_hll_*. HLL and Theta give close estimates except for strings,
   see IMPALA-10464.

Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions-test.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
11 files changed, 399 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/2
--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions

2021-01-29 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17008


Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theat_estimate() functions
..

IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions

These functions can be used to get cardinality estimates of data
using the Theta algorithm from Apache DataSketches. ds_theta_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized Theta sketch in string format. This can be written to a
table or be fed directly to ds_theat_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the Theta sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' Theta see:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch25_parquet.lineitem to compare performance
   with ds_hll_*. HLL and Theta give close estimates except for strings,
   see IMPALA-10464.

Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions-test.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
11 files changed, 401 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/1
--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 

