[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-10677: Set selectivity of Not-equal .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8644/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Apr 2021 04:25:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7103/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 27 Apr 2021 04:07:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-10677: Set selectivity of Not-equal .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7102/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Apr 2021 04:06:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-10677: Set selectivity of Not-equal .. IMPALA-10677: Set selectivity of Not-equal Calculate the selectivity of a not-equal binary predicate if one of the children is a slotref and the other children are all constants, e.g. something like "col != 5", but not "2 * col != 10": selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() in ExprCardinalityTest.java, changing the expected -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 12 files changed, 60 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/1 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao
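A minimal Python sketch of the estimate in the commit message, not the BinaryPredicate.java code. It assumes that an NDV of 0 or less means "unknown" and that -1 is returned for unknown selectivity (the value the old test expected); `ne_selectivity` is a hypothetical name.

```python
def ne_selectivity(ndv):
    """Estimate the selectivity of `col != <constant>` from the column's NDV.

    Hypothetical helper: returns -1.0 (unknown) when the NDV is not available.
    """
    if ndv <= 0:
        return -1.0
    # Assuming values are uniformly distributed, exactly one of the ndv
    # distinct values matches the constant, so 1/ndv of the rows are removed.
    return 1.0 - 1.0 / ndv


# Example: 1000 input rows and 4 distinct values in `col`.
rows = 1000
estimated_rows = round(rows * ne_selectivity(4))
```

With 4 distinct values, about three quarters of the rows are expected to survive `col != 5`, so `estimated_rows` is 750.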
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 23: Code-Review+2 > Patch Set 23: > > (7 comments) > > Thanks a lot, Quanlong, for the detailed analysis! > > I added more conversions, and now test_shell_interactive.py passes with the > non-accelerated protocol. > > I like the code less and less, though, and am becoming unsure about the > no_utf8strings option. When reading thrift structures, it makes sense, as we > can avoid unnecessary decode + encode pairs if we expect the result in utf8. > But when writing, it would be better to convert every 'unicode' to utf8, as it is > too much hassle to do this in the caller. > > I think that ideally Thrift would always encode when writing but return > string during read based on some option from the protocol, and do this > consistently in both accelerated and normal protocol. Yeah, I think the hassle comes from http://gerrit.cloudera.org:8080/15524 (IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3). Starting with that patch, we changed our internal string type from 'str' to 'unicode' in python2: from __future__ import unicode_literals At that point we expected to get 'unicode' from thrift. Now we switch the thrift py module to be compiled with no_utf8strings, so we are getting 'str' from thrift. This breaks code that expects 'unicode' values and requires additional conversion code. To finish the python3 compatibility work in impala-shell, I think we still need to insist on importing unicode_literals. I have some thoughts on future items (these need further discussion). * Use the thrift py module without no_utf8strings in Impyla; then Impyla may be able to drop its dependency on thriftpy2 in Python3. * Impyla can provide an option controlling whether it returns 'str' or 'unicode' values in python2, and then do the necessary conversion at the boundary. In our tests, we'd like Impyla to return 'str' values. 
* Finally, we can get rid of the no_utf8strings option in impala-shell and will no longer need the conversion code added in this patch. The current patch set LGTM. Thanks for addressing the comments! -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 23 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 27 Apr 2021 02:14:56 + Gerrit-HasComments: No
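The "convert at the boundary" idea can be sketched in Python 3 terms, where `bytes` plays the role of Python 2 `str` and `str` plays the role of `unicode`. This is an illustration only: `convert_strings` and its `want_text` switch are hypothetical and are not part of Impyla or Thrift.

```python
def convert_strings(value, want_text=True, encoding="utf-8"):
    """Recursively convert string values returned by a thrift client.

    want_text=True decodes bytes to str; want_text=False encodes str to bytes.
    Non-string leaves (ints, None, ...) are returned unchanged.
    """
    if want_text and isinstance(value, bytes):
        return value.decode(encoding)
    if not want_text and isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, list):
        return [convert_strings(v, want_text, encoding) for v in value]
    if isinstance(value, tuple):
        return tuple(convert_strings(v, want_text, encoding) for v in value)
    if isinstance(value, dict):
        return {convert_strings(k, want_text, encoding):
                convert_strings(v, want_text, encoding)
                for k, v in value.items()}
    return value
```

Doing the conversion once, at the point where rows leave the client library, is what would let callers such as impala-shell avoid scattered per-call-site conversion code.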
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. IMPALA-10656: Fire insert events before commit Before this fix, Impala committed an insert first, then reloaded the table from HMS, and generated the insert events based on the difference between the two snapshots (e.g. which files were not present in the old snapshot but are present in the new one). Hive replication expects the insert events before the commit, so this ordering could lead to issues there. The solution is to collect the new files in the backend during the insert, and send the insert events based on this file set. This wasn't very hard to do, as we were already collecting the files in some cases: - to move them from the staging dir to their final location in the case of non-partitioned tables - to write the file list to snapshot files in the case of Iceberg tables This patch unifies the paths above and collects all information about the created files regardless of the table type. 
Testing: - no new tests, insert events were already covered in test_event_processing.py and MetastoreEventsProcessorTest.java - ran core tests Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Reviewed-on: http://gerrit.cloudera.org:8080/17313 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/output-partition.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M common/protobuf/control_service.proto M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 247 insertions(+), 226 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
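A rough Python sketch of the difference between the two approaches described in the commit message. `events_from_snapshot_diff`, the `CreatedFiles` collector and the dict-shaped event payload are all hypothetical and are not Impala's DmlExecState or the HMS event API. The point is that the old approach needs the reloaded, post-commit snapshot, while the new one has everything it needs before the commit.

```python
from collections import defaultdict


def events_from_snapshot_diff(old_snapshot, new_snapshot):
    """Old approach: diff the pre-insert snapshot against the reloaded one.

    Snapshots map partition name -> list of file paths. The new snapshot is
    only available after the commit and the reload from HMS.
    """
    events = []
    for partition, files in sorted(new_snapshot.items()):
        added = sorted(set(files) - set(old_snapshot.get(partition, ())))
        if added:
            events.append({"partition": partition, "files": added})
    return events


class CreatedFiles:
    """New approach (hypothetical): record each file as the writer creates it."""

    def __init__(self):
        self._by_partition = defaultdict(list)

    def record(self, partition, path):
        self._by_partition[partition].append(path)

    def insert_events(self):
        # Built purely from what the insert wrote, so it can run before commit.
        return [{"partition": p, "files": sorted(fs)}
                for p, fs in sorted(self._by_partition.items())]
```

For a single writer both functions yield the same events; the collector simply removes the dependency on the post-commit reload.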
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 27 Apr 2021 00:41:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17282 ) Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated .. IMPALA-10644: RangerAuthorizationFactory cannot be instantiated Earlier, when the GBN was bumped to 11920537 in commit 1ab1143, some of the Solr dependencies were excluded. This causes RangerAuthorizationFactory to fail with initialization errors. This patch reverts the dependency exclusion to fix the problem. Testing: - Passes core job Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Reviewed-on: http://gerrit.cloudera.org:8080/17282 Tested-by: Impala Public Jenkins Reviewed-by: Joe McDonnell --- M fe/pom.xml 1 file changed, 6 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Verified Joe McDonnell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Gerrit-Change-Number: 17282 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17282 ) Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Gerrit-Change-Number: 17282 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 22:01:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17282 ) Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Gerrit-Change-Number: 17282 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 21:56:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8643/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 8 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 20:03:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 8: Thanks. The changes look good to me. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 8 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 20:02:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG@9 PS5, Line 9: For transactional tables, catalogd already guarantees consitent table : metadata reads > nit, Can you please reformat this commit msg to 72 line width as per the co Ack http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2966 PS7, Line 2966: n > I found that removing the table directly from catalog_ doesn't take the met Sure. http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py File tests/custom_cluster/test_metastore_service.py: http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@439 PS7, Line 439: invalid > nit, s/removed/invalidated Ack http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@442 PS7, Line 442: removed > nit, s/removed/invalidated For drop case, we remove (and not invalidate) from the cache. 
-- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 8 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 19:43:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17298 to look at the new patch set (#8). Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis For transactional tables, catalogd already guarantees consistent table metadata reads based on the writeIdList passed in the request. For non-transactional tables, the reads are eventually consistent, as the event processor thread processes HMS events for the table in the background and updates its metadata. In this patch, to ensure strong consistency guarantees for external tables, we invalidate the table metadata in the cache when HMS DDL APIs such as alter/drop table/partition are called through catalogd's metastore server. As a result, any subsequent get_table request fetches the table from HMS and loads it into the cache. This ensures that any get_table/get_partition requests after DDL operations on the same table return updated table metadata. This behavior has a performance penalty, since loading metadata into the cache takes time, especially for large tables. The behavior is controlled by the catalogd flag invalidate_hms_cache_on_ddls, which is enabled by default. The flag should be turned off if it causes a performance bottleneck. 
Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/custom_cluster/test_metastore_service.py 6 files changed, 517 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/8 -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 8 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar
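A toy Python model of the behavior described above, assuming a plain dict stands in for HMS. `CatalogCache` and its methods are hypothetical and are not the MetastoreServiceHandler.java code. Following the review discussion, alter invalidates the cached entry (it is reloaded lazily on the next read), while drop removes the entry outright.

```python
class CatalogCache:
    """Toy cache in front of an HMS-like dict (hypothetical)."""

    def __init__(self, hms, invalidate_hms_cache_on_ddls=True):
        self.hms = hms        # table name -> metadata dict, stand-in for HMS
        self.tables = {}      # table name -> loaded copy, or None if invalidated
        self.invalidate_on_ddls = invalidate_hms_cache_on_ddls

    def get_table(self, name):
        if self.tables.get(name) is None:
            # Not loaded yet, or invalidated by a DDL: (re)load from HMS.
            self.tables[name] = dict(self.hms[name])
        return self.tables[name]

    def alter_table(self, name, **changes):
        self.hms[name].update(changes)
        if self.invalidate_on_ddls and name in self.tables:
            self.tables[name] = None  # invalidate; next read reloads

    def drop_table(self, name):
        del self.hms[name]
        if self.invalidate_on_ddls:
            self.tables.pop(name, None)  # remove rather than invalidate
```

With the flag off, a read after a DDL can return the stale cached copy until something else (in Impala, the event processor) refreshes it, which is the eventual-consistency behavior the patch tightens.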
[native-toolchain-CR] IMPALA-10674: Update toolchain ORC libary for better Iceberg support
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17342 ) Change subject: IMPALA-10674: Update toolchain ORC libary for better Iceberg support .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17342 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72625f4bd6ff3e83ffaaa2c83d31b8ee29c0c35a Gerrit-Change-Number: 17342 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 26 Apr 2021 19:31:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10676: Improve start/stop scripts for Hiveserver and Metastore
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17340 ) Change subject: IMPALA-10676: Improve start/stop scripts for Hiveserver and Metastore .. Patch Set 1: Code-Review+2 (1 comment) Left a non-blocking comment below. http://gerrit.cloudera.org:8080/#/c/17340/1/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/17340/1/testdata/bin/run-hive-server.sh@145 PS1, Line 145: 30020 nit: It would be good to mention in the commit message that we are now exposing debug port 30020 for HS2. -- To view, visit http://gerrit.cloudera.org:8080/17340 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9208efdf49f383c5cfb10cd9881272847405a05 Gerrit-Change-Number: 17340 Gerrit-PatchSet: 1 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 19:31:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7101/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 18:57:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 18:57:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8642/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 14 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 26 Apr 2021 18:26:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses a weakness in populating min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and setting its 'always_true_' flag to true when the filter is found not to be useful. Once the flag is set, insertion into such a filter skips all steps, from evaluating the value from a row to checking the value against the min/max range. This optimization is also supported in LLVM codegen. In addition, a new flag 'is_min_max_value_present' is added to TRuntimeFilterTargetDesc to indicate whether the min/max column stats are present in the query plan. The flag eliminates the need to check for the presence of min/max stats for every row at runtime. The Insert() methods are optimized with branch prediction compiler hints, which yield a 4% to 7% improvement for common SQL integer types. Testing: 1. Ran core tests; 2. Ran performance test (TBD). 
Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java 12 files changed, 370 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/14 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 14 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
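A toy Python model of the bail-out idea, not Impala's C++ MinMaxFilter. It assumes that "usefulness" is measured as how much of the probe column's min/max stats range the filter's range overlaps; the overlap metric, `check_every` and `threshold` are all illustrative assumptions. The stats-presence check is decided once in the constructor, mirroring the purpose of 'is_min_max_value_present'.

```python
class MinMaxFilter:
    """Toy numeric min/max filter with an 'always true' bail-out (hypothetical)."""

    def __init__(self, col_min=None, col_max=None, check_every=1024, threshold=0.9):
        # Decided once from the plan instead of per row.
        self.stats_present = col_min is not None and col_max is not None
        self.col_min, self.col_max = col_min, col_max
        self.check_every = check_every
        self.threshold = threshold
        self.min = None
        self.max = None
        self.inserted = 0
        self.always_true = False

    def insert(self, value):
        if self.always_true:
            return  # filter judged useless: skip all per-row work
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value
        self.inserted += 1
        if self.stats_present and self.inserted % self.check_every == 0:
            self._check_usefulness()

    def _check_usefulness(self):
        span = self.col_max - self.col_min
        if span <= 0:
            return
        lo = max(self.min, self.col_min)
        hi = min(self.max, self.col_max)
        overlap = max(0, hi - lo) / span
        # A range covering most of the probe column's values prunes almost nothing.
        if overlap >= self.threshold:
            self.always_true = True
```

Once `always_true` is set, later inserts return immediately, so the filter's range stops changing; this mirrors the commit message's point that all per-row steps are skipped.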
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8641/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 18:03:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 13: Code-Review+2 +1 and carrying forward Zoltan's +1 from earlier. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 17:49:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876 PS12, Line 4876: These ACID_WRITE events are collected by HMS and become :* visible during commit > Removed this sentence. I definitely don't want to become a source of false Thanks. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 17:49:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/17262/7/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/17262/7/be/src/exec/parquet/hdfs-parquet-table-writer.cc@453 PS7, Line 453: parquet_bloom_filter_bytes_ = parent->parquet_bloom_filter_col_sizes_[column_name()]; line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 17:45:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py File tests/query_test/test_parquet_bloom_filter.py: http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py@28 PS6, Line 28: p > flake8: E126 continuation line over-indented for hanging indent Done http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py@126 PS6, Line 126: s > flake8: E226 missing whitespace around arithmetic operator Done http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py@145 PS6, Line 145: w > flake8: F841 local variable 'bloom_filter_header' is assigned to but never Done -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 17:44:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types This change adds support for writing Parquet Bloom filters for the types for which read support was added in IMPALA-10640. Writing of Parquet Bloom filters can be controlled by the 'parquet_bloom_filter_write' query option, which has the following possible values: NEVER - never write Parquet Bloom filters; IF_NO_DICT - write Parquet Bloom filters if specified in the table properties AND if the row group is not fully dictionary encoded; ALWAYS - always write Parquet Bloom filters if specified in the table properties, even if the row group is fully dictionary encoded. Introduced the 'parquet.bloom.filter.columns' table property. It is a comma-separated list of 'col_name:bytes' pairs. The 'bytes' part is the size of the Bloom filter's bitset and is optional. If the size is not given, the maximum Bloom filter size (ParquetBloomFilter::MAX_BYTES) is used. Example: 'col1:1024,col2,col4:100'. Testing: - Added a test in tests/query_test/test_parquet_bloom_filter.py that uses Impala to write the same table as in the test file 'testdata/data/parquet-bloom-filtering.parquet' and checks whether the Parquet Bloom filter header and bitset are identical. - TODO: Test falling back from dict encoding to plain and using Bloom filters. 
Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/hdfs-parquet-table-writer.h M be/src/exec/parquet/parquet-bloom-filter-util.cc M be/src/exec/parquet/parquet-bloom-filter-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/dict-encoding.h M be/src/util/parquet-bloom-filter-test.cc M be/src/util/parquet-bloom-filter.cc M be/src/util/parquet-bloom-filter.h M common/thrift/DataSinks.thrift M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M tests/query_test/test_parquet_bloom_filter.py 20 files changed, 584 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/7 -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
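A minimal Python sketch of how the 'parquet.bloom.filter.columns' format and the three 'parquet_bloom_filter_write' values described in this patch set fit together. The functions parse_bloom_filter_columns and should_write_bloom_filter are hypothetical illustrations, not Impala's C++ code; None stands in for "use ParquetBloomFilter::MAX_BYTES" (its actual value is not assumed), and skipping empty entries between commas is an assumption.

```python
# Hypothetical sketch, not Impala's implementation: parse the
# 'parquet.bloom.filter.columns' table property and apply the
# 'parquet_bloom_filter_write' query option for one column of a row group.

VALID_OPTIONS = ('NEVER', 'IF_NO_DICT', 'ALWAYS')


def parse_bloom_filter_columns(prop):
    """Parse e.g. 'col1:1024,col2,col4:100' into {column: bitset bytes or None}.

    None means no size was given, so the maximum Bloom filter size
    (ParquetBloomFilter::MAX_BYTES) would be used. No range checks are done.
    """
    result = {}
    for entry in prop.split(','):
        entry = entry.strip()
        if not entry:
            continue  # assumption: tolerate stray commas such as 'a,,b'
        name, sep, size = entry.partition(':')
        name = name.strip()
        if not name:
            raise ValueError('empty column name in %r' % entry)
        # int() raises ValueError for non-numeric sizes.
        result[name] = int(size) if sep else None
    return result


def should_write_bloom_filter(option, column, columns, fully_dict_encoded):
    """Return True if a Bloom filter should be written for 'column'."""
    if option not in VALID_OPTIONS:
        raise ValueError('unknown option %r' % option)
    # Bloom filters are only written for columns listed in the table property.
    if option == 'NEVER' or column not in columns:
        return False
    if option == 'ALWAYS':
        return True
    # IF_NO_DICT: skip row groups that are fully dictionary encoded.
    return not fully_dict_encoded
```

For the example value in the commit message, parse_bloom_filter_columns('col1:1024,col2,col4:100') yields a 1024-byte bitset for col1, the maximum size for col2, and 100 bytes for col4; with IF_NO_DICT, a fully dictionary encoded row group gets no Bloom filter.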
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 23: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8640/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 23 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 16:45:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8639/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 22 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 16:39:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 23: (7 comments) Thanks a lot, Quanlong, for the detailed analysis! I added more conversions, and now test_shell_interactive.py passes with the non-accelerated protocol. I like the code less and less, though, and have become unsure about the no_utf8strings option. When reading Thrift structures it makes sense, as we can avoid unnecessary decode + encode pairs if we expect the result in utf8. But when writing, it would be better if every 'unicode' were converted to utf8 automatically; it is too much hassle to do this in the caller. I think that ideally Thrift would always encode when writing, but decide which string type to return during reads based on some option of the protocol, and would do this consistently in both the accelerated and the normal protocol. http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell File shell/impala-shell: http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell@29 PS21, Line 29: 0.1 > stale comment Done http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@85 PS21, Line 85: # Helper to decode utf8 encoded str to unicode type in Python 2. NOOP in Python 3. > While calling this on all string fields from thrift, I think we also need t Done http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@735 PS21, Line 735: > I think we need to encode this into 'str' when it's 'unicode' in python2. T Done http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@736 PS21, Line 736: ngImpalaHS2Service rpc is ide > This also contains unicodes, which could lead to an error in ImpalaHttpClie Done http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@1120 PS21, Line 1120: _service( > I think we need to encode this too, if it's unicode in python2. 
Done http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@85 PS22, Line 85: # Helper to decode utf8 encoded str to unicode type in Python 2. NOOP in Python 3. > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@91 PS22, Line 91: > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 23 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 16:34:04 + Gerrit-HasComments: Yes
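A minimal sketch of what the two helpers discussed in this review (utf8_decode_if_needed and utf8_encode_if_needed, whose signatures appear in the flake8 comments on patch set 22) might look like. It assumes their only job is the Python 2 str/unicode conversion described in the comments and that both are no-ops on Python 3; it is an illustration, not the actual code in shell/impala_client.py.

```python
import sys

# Hypothetical sketch of the helpers named in the review; not the actual
# shell/impala_client.py code.
PY2 = sys.version_info[0] == 2


def utf8_decode_if_needed(val):
    """Decode a UTF-8 encoded str to unicode on Python 2. No-op on Python 3."""
    if PY2 and isinstance(val, str):
        return val.decode('utf-8')
    return val


def utf8_encode_if_needed(val):
    """Encode unicode to a UTF-8 str on Python 2. No-op on Python 3."""
    # 'unicode' only exists on Python 2; the PY2 check short-circuits first.
    if PY2 and isinstance(val, unicode):  # noqa: F821
        return val.encode('utf-8')
    return val
```

Keeping the version check inside the helpers lets callers apply them unconditionally to every string field, which matches the "NOOP in Python 3" comment quoted above.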
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Hello Quanlong Huang, Tamas Mate, Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17170 to look at the new patch set (#23). Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch, Impala mainly used Thrift 0.9.3, but it was already possible to compile the Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of a patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins the Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases of Impyla and thrift_sasl that use Thrift 0.11.0. Notable side effects: - the old logic to compile Thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated, as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following takes ~0.22s instead of ~0.25s: shell/impala_shell.py -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the Thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generates .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. 
Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 --- M CMakeLists.txt M be/src/benchmarks/network-perf-benchmark.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-service-client-wrapper.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalogd-main.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/hs2-http-test.cc M be/src/rpc/thrift-client.h M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-service-client-wrapper.h M be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/in-process-servers.h M be/src/transport/THttpServer.cpp M be/src/transport/THttpServer.h M be/src/transport/THttpTransport.cpp M be/src/transport/THttpTransport.h M be/src/transport/TSaslClientTransport.cpp M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.cpp M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.cpp M be/src/transport/TSaslTransport.h M be/src/util/parquet-reader.cc M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/impala-shell.sh M bin/set-pythonpath.sh M common/thrift/CMakeLists.txt M infra/python/deps/requirements.txt M java/pom.xml M shell/ext-py/thrift_sasl-0.4.2/setup.py M shell/impala-shell M shell/impala_client.py M shell/impala_shell.py M shell/make_shell_tarball.sh M shell/packaging/make_python_package.sh M shell/shell_output.py M tests/beeswax/impala_beeswax.py M 
tests/conftest.py M tests/query_test/test_observability.py M tests/shell/util.py 58 files changed, 258 insertions(+), 310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/23 -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 23 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 22: (2 comments) http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@85 PS22, Line 85: def utf8_decode_if_needed(val): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@91 PS22, Line 91: def utf8_encode_if_needed(val): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 22 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 16:21:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Hello Quanlong Huang, Tamas Mate, Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17170 to look at the new patch set (#22). Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch, Impala mainly used Thrift 0.9.3, but it was already possible to compile the Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of a patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins the Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases of Impyla and thrift_sasl that use Thrift 0.11.0. Notable side effects: - the old logic to compile Thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated, as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following takes ~0.22s instead of ~0.25s: shell/impala_shell.py -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the Thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generates .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. 
Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 --- M CMakeLists.txt M be/src/benchmarks/network-perf-benchmark.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-service-client-wrapper.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalogd-main.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/hs2-http-test.cc M be/src/rpc/thrift-client.h M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-service-client-wrapper.h M be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/in-process-servers.h M be/src/transport/THttpServer.cpp M be/src/transport/THttpServer.h M be/src/transport/THttpTransport.cpp M be/src/transport/THttpTransport.h M be/src/transport/TSaslClientTransport.cpp M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.cpp M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.cpp M be/src/transport/TSaslTransport.h M be/src/util/parquet-reader.cc M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/impala-shell.sh M bin/set-pythonpath.sh M common/thrift/CMakeLists.txt M infra/python/deps/requirements.txt M java/pom.xml M shell/ext-py/thrift_sasl-0.4.2/setup.py M shell/impala-shell M shell/impala_client.py M shell/impala_shell.py M shell/make_shell_tarball.sh M shell/packaging/make_python_package.sh M shell/shell_output.py M tests/beeswax/impala_beeswax.py M 
tests/conftest.py M tests/query_test/test_observability.py M tests/shell/util.py 58 files changed, 256 insertions(+), 310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/22 -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 22 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17282 ) Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8638/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Gerrit-Change-Number: 17282 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 16:08:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17282 ) Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7100/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Gerrit-Change-Number: 17282 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Apr 2021 15:58:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
Joe McDonnell has uploaded a new patch set (#4) to the change originally created by Vihang Karajgaonkar. ( http://gerrit.cloudera.org:8080/17282 ) Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated .. IMPALA-10644: RangerAuthorizationFactory cannot be instantiated Earlier, when the GBN was bumped up to 11920537 in commit 1ab1143, some of the Solr dependencies were excluded. This causes initialization errors in RangerAuthorizationFactory. This patch reverts the dependency exclusion to fix the problem. Testing: - Passes core job Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 --- M fe/pom.xml 1 file changed, 6 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/17282/4 -- To view, visit http://gerrit.cloudera.org:8080/17282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41 Gerrit-Change-Number: 17282 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar
[native-toolchain-CR] IMPALA-10674: Update toolchain ORC libary for better Iceberg support
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17342 ) Change subject: IMPALA-10674: Update toolchain ORC libary for better Iceberg support .. IMPALA-10674: Update toolchain ORC libary for better Iceberg support We need the following fixes/features from the ORC library: * ORC-763: Fix timestamp inconsistencies with Java * ORC-784: Support setting timezone to timestamp column * ORC-666: Support timestamp with local timezone (this corresponds to the Iceberg TIMESTAMPTZ type) * ORC-781: Make type annotations available from C++ (this is needed for Iceberg column resolution via field ids) This commit adds the above via formatted patches. Testing: * executed the tests of the ORC library Change-Id: I72625f4bd6ff3e83ffaaa2c83d31b8ee29c0c35a --- M buildall.sh A source/orc/orc-1.6.2-patches/0008-ORC-763-C-Fix-ORC-timestamp-inconsistencies-with-Jav.patch A source/orc/orc-1.6.2-patches/0009-ORC-784-C-Support-setting-timezone-to-timestamp-colu.patch A source/orc/orc-1.6.2-patches/0010-ORC-666-C-Support-timestamp-with-local-timezone.patch A source/orc/orc-1.6.2-patches/0011-ORC-781-C-Make-type-annotations-available-from-C.patch 5 files changed, 1,516 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/42/17342/1 -- To view, visit http://gerrit.cloudera.org:8080/17342 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I72625f4bd6ff3e83ffaaa2c83d31b8ee29c0c35a Gerrit-Change-Number: 17342 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8637/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 14:19:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876 PS12, Line 4876: :* 2. If the table is no > I see that you added this. But I am not sure if this is correct. HMS metada Removed this sentence. I definitely don't want to become a source of false information, and I agree that these are implementation details. It would be best if there were a public document somewhere that describes these concepts, so that we could link to it in situations like this. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 14:02:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/17313/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17313/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4880 PS13, Line 4880:* https://github.com/apache/hive/blob/25892ea409/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3251 line too long (114 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 14:01:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17313 to look at the new patch set (#13). Change subject: IMPALA-10656: Fire insert events before commit .. IMPALA-10656: Fire insert events before commit Before this fix, Impala committed an insert first, then reloaded the table from HMS, and generated the insert events based on the difference between the two snapshots (e.g. which files were not present in the old snapshot but are present in the new one). Hive replication expects the insert events before the commit, so this may lead to issues there. The solution is to collect the new files during the insert in the backend and send the insert events based on this file set. This wasn't very hard to do, as we were already collecting the files in some cases: - to move them from the staging dir to their final location in case of non-partitioned tables - to write the file list to snapshot files in case of Iceberg tables This patch unifies the paths above and collects all information about the created files regardless of the table type. 
Testing: - no new tests, insert events were already covered in test_event_processing.py and MetastoreEventsProcessorTest.java - ran core tests Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 --- M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/output-partition.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M common/protobuf/control_service.proto M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 247 insertions(+), 226 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/13 -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
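A toy Python simulation of the ordering change described in this patch set, assuming a single writer and an in-memory table. FakeTable, insert_old_way and insert_new_way are hypothetical stand-ins, not Impala, HMS, or Hive replication APIs; the sketch only shows the old flow (commit, then derive the event from a snapshot diff) next to the new flow (fire the event from the file list collected during the insert, then commit).

```python
# Hypothetical simulation; none of these names are Impala or HMS APIs.


class FakeTable(object):
    def __init__(self, files):
        self.committed_files = set(files)
        self.log = []  # ordered record of insert events and commits

    def fire_insert_event(self, new_files):
        self.log.append(('insert_event', tuple(sorted(new_files))))

    def commit(self, new_files):
        self.committed_files.update(new_files)
        self.log.append(('commit',))


def insert_old_way(table, new_files):
    """Commit first, then derive the event by diffing two snapshots."""
    before = set(table.committed_files)
    table.commit(new_files)
    after = set(table.committed_files)  # stands in for reloading from HMS
    table.fire_insert_event(after - before)


def insert_new_way(table, new_files):
    """Use the file list collected by the writers; fire before committing."""
    created = list(new_files)  # collected during the insert, one per file
    table.fire_insert_event(created)
    table.commit(created)
```

Both flows report the same files here; what differs is that the new flow emits the event before the commit and does not need a second snapshot of the table.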
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 13:35:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. IMPALA-10631: Upgrade DataSketches to version 3.0.0 Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version 3.0.0 tests: -Ran the tests from tests/query_test/test_datasketches.py Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Reviewed-on: http://gerrit.cloudera.org:8080/17294 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/datasketches-test.cc M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp M be/src/thirdparty/datasketches/AuxHashMap.hpp M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp M be/src/thirdparty/datasketches/CouponHashSet.hpp M be/src/thirdparty/datasketches/CouponList-internal.hpp M be/src/thirdparty/datasketches/CouponList.hpp M be/src/thirdparty/datasketches/CubicInterpolation.hpp M be/src/thirdparty/datasketches/HarmonicNumbers.hpp M be/src/thirdparty/datasketches/Hll4Array-internal.hpp M be/src/thirdparty/datasketches/Hll4Array.hpp M be/src/thirdparty/datasketches/Hll6Array-internal.hpp M be/src/thirdparty/datasketches/Hll6Array.hpp M be/src/thirdparty/datasketches/Hll8Array-internal.hpp M be/src/thirdparty/datasketches/Hll8Array.hpp M be/src/thirdparty/datasketches/HllArray-internal.hpp M be/src/thirdparty/datasketches/HllArray.hpp M be/src/thirdparty/datasketches/HllSketch-internal.hpp M be/src/thirdparty/datasketches/HllSketchImpl.hpp M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp M be/src/thirdparty/datasketches/HllUnion-internal.hpp M be/src/thirdparty/datasketches/HllUtil.hpp M be/src/thirdparty/datasketches/MurmurHash3.h M be/src/thirdparty/datasketches/README.md M be/src/thirdparty/datasketches/RelativeErrorTables.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp A 
be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp M be/src/thirdparty/datasketches/cpc_common.hpp M be/src/thirdparty/datasketches/cpc_compressor.hpp M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp M be/src/thirdparty/datasketches/cpc_sketch.hpp M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp M be/src/thirdparty/datasketches/cpc_union.hpp M be/src/thirdparty/datasketches/cpc_union_impl.hpp M be/src/thirdparty/datasketches/cpc_util.hpp M be/src/thirdparty/datasketches/hll.hpp M be/src/thirdparty/datasketches/icon_estimator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp M be/src/thirdparty/datasketches/kll_sketch.hpp M be/src/thirdparty/datasketches/kll_sketch_impl.hpp M be/src/thirdparty/datasketches/memory_operations.hpp M be/src/thirdparty/datasketches/theta_a_not_b.hpp M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp A be/src/thirdparty/datasketches/theta_comparators.hpp A be/src/thirdparty/datasketches/theta_constants.hpp A be/src/thirdparty/datasketches/theta_helpers.hpp M be/src/thirdparty/datasketches/theta_intersection.hpp A be/src/thirdparty/datasketches/theta_intersection_base.hpp A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp M be/src/thirdparty/datasketches/theta_intersection_impl.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp M be/src/thirdparty/datasketches/theta_sketch.hpp M be/src/thirdparty/datasketches/theta_sketch_impl.hpp M be/src/thirdparty/datasketches/theta_union.hpp A be/src/thirdparty/datasketches/theta_union_base.hpp A be/src/thirdparty/datasketches/theta_union_base_impl.hpp M be/src/thirdparty/datasketches/theta_union_impl.hpp A 
be/src/thirdparty/datasketches/theta_update_sketch_base.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp M be/src/thirdparty/datasketches/u32_table.hpp M be/src/thirdparty/datasketches/u32_table_impl.hpp 66 files changed, 2,646 insertions(+), 1,873 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double conversion. .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8636/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 12:50:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double conversion. .. Patch Set 5: > (4 comments) > > The change looks great. A test with Parquet would be nice, other > than that I only found nitpicks. > > When you upload a new PS please reply to the comments. Most of the > time clicking on "Done" is enough. This way we'll know we won't > left anything open. I did reply to the comments, but I didn't know that replies are saved as drafts and have to be posted explicitly. Sorry about that. I have figured that out now, so you might also receive some old replies. -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 12:36:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double conversion. .. Patch Set 6: (20 comments) http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h File be/src/thirdparty/fast_double_parser/fast_double_parser.h: http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@13 PS6, Line 13: #if (defined(sun) || defined(__sun)) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@17 PS6, Line 17: #if defined(__CYGWIN__) || defined(__MINGW32__) || defined(__MINGW64__) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@22 PS6, Line 22: * Determining whether we should import xlocale.h or not is line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@25 PS6, Line 25: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || defined(FAST_DOUBLE_PARSER_CYGWIN) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@67 PS6, Line 67: #endif // defined(FAST_DOUBLE_PARSER_SOLARIS) || defined(FAST_DOUBLE_PARSER_CYGWIN) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@84 PS6, Line 84: * However, we have that line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@86 PS6, Line 86: * Thus it is possible for a number of the form w * 10^-342 where line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@94 PS6, Line 94: * Any number of form w * 10^309 where w>= 
1 is going to be line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@153 PS6, Line 153: // credit: https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication line too long (110 > 90) http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@154 PS6, Line 154: really_inline uint64_t Emulate64x64to128(uint64_t& r_hi, const uint64_t x, const uint64_t y) { line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@159 PS6, Line 159: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@224 PS6, Line 224: */ line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@958 PS6, Line 958: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@960 PS6, Line 960: // The exponent is 1024 + 63 + power line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@976 PS6, Line 976: // The 65536 is (1<<16) and corresponds to line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@979 PS6, Line 979: // ((152170 * power ) >> 16) is equal to line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@980 PS6, Line 980: // floor(log(5**power)/log(2)) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@982 PS6, Line 982: // Note that this is not magic: 152170/(1<<16) is line has trailing whitespace 
http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@984 PS6, Line 984: // The 1<<16 value is a power of two; we could use a line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@1097 PS6, Line 1097: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || defined(FAST_DOUBLE_PARSER_CYGWIN) line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Ge
[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.
Amogh Margoor has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double conversion. .. IMPALA-10654: Fix precision loss in DecimalValue to double conversion. The original approach to converting a DecimalValue (the internal representation of decimals) to double was not accurate. It was: static_cast<double>(value_) / pow(10.0, scale). However, only integers in the range −2^53 to 2^53 are all exactly representable as doubles; outside that range, the cast itself can round value_. Hence, the conversion would not work for numbers like -0.43149576573887316. For a DecimalValue representing -0.43149576573887316, value_ would be -43149576573887316 and scale would be 17. As value_ < -2^53, the result would not be accurate. The new approach uses the third-party library https://github.com/lemire/fast_double_parser, which handles the above scenario in a performant manner. Testing: 1. Added end-to-end tests covering the following scenarios: a. A test to show the precision limitation of 16 in the write path b. DecimalValue's value_ between -2^53 and 2^53 c. value_ outside the above range but abs(value_) < UINT64_MAX d. abs(value_) > UINT64_MAX - covers DecimalValue<__int128_t> 2.
Ran the existing backend and end-to-end tests completely Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 --- M be/src/runtime/decimal-value.inline.h A be/src/thirdparty/fast_double_parser/LICENSE A be/src/thirdparty/fast_double_parser/LICENSE.BSL A be/src/thirdparty/fast_double_parser/README.md A be/src/thirdparty/fast_double_parser/fast_double_parser.h M bin/rat_exclude_files.txt M testdata/workloads/functional-query/queries/QueryTest/values.test M tests/query_test/test_insert_parquet.py 8 files changed, 1,579 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17303/6 -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
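A minimal C++ sketch of the precision problem described in the commit message above, under stated assumptions. `NaiveToDouble` mirrors the original `static_cast<double>(value_) / pow(10.0, scale)` formula. `CorrectlyRoundedToDouble` is a hypothetical alternative, not the patch's code: instead of fast_double_parser it formats the unscaled value as a decimal string and parses it with `std::strtod`, which mainstream C libraries such as glibc round correctly (an assumption about the platform). All function names here are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Mirrors the original formula: for |value| > 2^53 the cast itself may round,
// and the division can round a second time.
double NaiveToDouble(int64_t value, int scale) {
  return static_cast<double>(value) / std::pow(10.0, scale);
}

// Hypothetical helper: renders an unscaled decimal as text, e.g.
// (-43149576573887316, 17) -> "-0.43149576573887316".
std::string ToDecimalString(int64_t value, int scale) {
  assert(scale >= 0);
  const bool negative = value < 0;
  // Negate in unsigned arithmetic so that INT64_MIN does not overflow.
  const uint64_t magnitude =
      negative ? 0 - static_cast<uint64_t>(value) : static_cast<uint64_t>(value);
  std::string digits = std::to_string(magnitude);
  const size_t frac_digits = static_cast<size_t>(scale);
  // Left-pad with zeros so at least one digit precedes the decimal point.
  if (digits.size() <= frac_digits) {
    digits = std::string(frac_digits - digits.size() + 1, '0') + digits;
  }
  if (frac_digits > 0) digits.insert(digits.size() - frac_digits, ".");
  return negative ? "-" + digits : digits;
}

// Hypothetical alternative: one correctly rounded decimal-to-binary conversion
// (assuming the C library's strtod rounds correctly, as glibc's does).
double CorrectlyRoundedToDouble(int64_t value, int scale) {
  const std::string text = ToDecimalString(value, scale);
  return std::strtod(text.c_str(), nullptr);
}

// True if value lies in [-2^53, 2^53], where every integer is exact in a double.
bool FitsExactlyInDouble(int64_t value) {
  const int64_t kLimit = int64_t{1} << 53;
  return value >= -kLimit && value <= kLimit;
}
```

For the example from the commit message, `static_cast<double>(-43149576573887316)` cannot return the exact integer on an IEEE 754 platform: doubles between 2^55 and 2^56 in magnitude are spaced 8 apart, and 43149576573887316 is not a multiple of 8. Going through a decimal string is simple but slow; a library such as fast_double_parser exists precisely to do the correctly rounded conversion faster.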
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 29: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8635/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 29 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Apr 2021 12:13:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 29: (182 comments) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h File be/src/thirdparty/xxhash/xxhash.h: http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@70 PS29, Line 70: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735 line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@92 PS29, Line 92: * https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@113 PS29, Line 113: # elif defined (__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */) line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@243 PS29, Line 243: # define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_64bits_reset_withSecret) line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@253 PS29, Line 253: # define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSeed) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@254 PS29, Line 254: # define XXH3_128bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret) line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@270 PS29, Line 270: #define XXH_VERSION_NUMBER (XXH_VERSION_MAJOR *100*100 + XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE) line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@429 PS29, Line 429: * @param 
statePtr A pointer to an @ref XXH32_state_t allocated with @ref XXH32_createState(). line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@441 PS29, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, const XXH32_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@476 PS29, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@628 PS29, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, const XXH64_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@631 PS29, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@700 PS29, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@724 PS29, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (120 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@743 PS29, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, const XXH3_state_t* src_state); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@756 PS29, Line 756: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@766 PS29, Line 
766: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t secretSize); line too long (121 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@768 PS29, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* statePtr, const void* input, size_t length); line too long (107 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@791 PS29, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@792 PS29, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (122 > 90)
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Daniel Becker has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. IMPALA-10640: Support reading Parquet Bloom filters - most common types This change adds read support for Parquet Bloom filters for the types that can reasonably be supported in Impala. Other types, such as CHAR(N), would be very difficult to support: the length may differ between Parquet and Impala, which results in truncation or padding; that changes the hash and makes the Bloom filter unusable. Write support will be added in a later change. The supported Parquet type - Impala type pairs are the following:
| Parquet type | Impala type            |
|--------------|------------------------|
| INT32        | TINYINT, SMALLINT, INT |
| INT64        | BIGINT                 |
| FLOAT        | FLOAT                  |
| DOUBLE       | DOUBLE                 |
| BYTE_ARRAY   | STRING                 |
The following types are not supported for the given reasons:
| Impala type | Problem                                           |
|-------------|---------------------------------------------------|
| VARCHAR(N)  | truncation can change hash                        |
| CHAR(N)     | padding / truncation can change hash              |
| DECIMAL     | multiple encodings supported                      |
| TIMESTAMP   | multiple encodings supported, timezone conversion |
| DATE        | not considered yet                                |
Support may be added for these types later, see IMPALA-10641. If a Bloom filter is available for a column that is fully dictionary encoded, the Bloom filter is not used, because the dictionary gives exact results when filtering. Testing: - Added tests/query_test/test_parquet_bloom_filter.py, which tests that Parquet Bloom filtering works for the supported types and that row groups are not incorrectly discarded for the unsupported type VARCHAR. The Parquet file used in the test was generated with an external tool.
- Added unit tests for ParquetBloomFilter in be/src/util/parquet-bloom-filter-test.cc - Made a minor, unrelated change in be/src/util/bloom-filter-test.cc: the MakeRandom() function had return type uint64_t and its documentation claimed it returned a 64-bit random number, but it actually produces only 32 random bits, which is what the tests intend. The return type and documentation have been corrected to 32 bits. Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 --- M LICENSE.txt M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h A be/src/exec/parquet/parquet-bloom-filter-util.cc A be/src/exec/parquet/parquet-bloom-filter-util.h M be/src/exprs/expr-value.h M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/service/query-options.cc M be/src/service/query-options.h A be/src/thirdparty/xxhash/README.md A be/src/thirdparty/xxhash/xxhash.h M be/src/util/CMakeLists.txt M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/impala-bloom-filter-buffer-allocator.cc A be/src/util/impala-bloom-filter-buffer-allocator.h A be/src/util/parquet-bloom-filter-avx2.cc A be/src/util/parquet-bloom-filter-test.cc A be/src/util/parquet-bloom-filter.cc A be/src/util/parquet-bloom-filter.h M bin/jenkins/critique-gerrit-review.py M bin/rat_exclude_files.txt M bin/run_clang_tidy.sh M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M common/thrift/parquet.thrift M testdata/data/README A testdata/data/parquet-bloom-filtering.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter-disabled.test A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test A tests/query_test/test_parquet_bloom_filter.py 37 files changed, 7,410 insertions(+), 
127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/29 -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 29 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-N
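To illustrate why the hashed bytes must match exactly between writer and reader, here is a minimal C++ sketch of a split block Bloom filter, following my reading of the layout in the Parquet format specification: 256-bit blocks of eight 32-bit words, eight salt constants, and one bit set per word. It is not Impala's ParquetBloomFilter. The specification hashes values with xxHash64 (seed 0); because xxHash is not in the standard library, the sketch substitutes FNV-1a as a stand-in, so its bits will not match real Parquet files. `SplitBlockBloomFilter` and `Fnv1a64` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Salt constants from the Parquet split block Bloom filter specification.
constexpr std::array<uint32_t, 8> kSalt = {
    0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
    0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// One 256-bit block: eight 32-bit words.
using Block = std::array<uint32_t, 8>;

class SplitBlockBloomFilter {
 public:
  // num_blocks must be positive; all blocks start zeroed.
  explicit SplitBlockBloomFilter(size_t num_blocks) : blocks_(num_blocks) {}

  void Insert(uint64_t hash) {
    Block& block = blocks_[BlockIndex(hash)];
    const Block mask = MakeMask(static_cast<uint32_t>(hash));
    for (size_t i = 0; i < block.size(); ++i) block[i] |= mask[i];
  }

  // false means "definitely absent"; true means "possibly present".
  bool MayContain(uint64_t hash) const {
    const Block& block = blocks_[BlockIndex(hash)];
    const Block mask = MakeMask(static_cast<uint32_t>(hash));
    for (size_t i = 0; i < block.size(); ++i) {
      if ((block[i] & mask[i]) == 0) return false;
    }
    return true;
  }

 private:
  // The upper 32 bits of the hash select a block, scaled into [0, num_blocks).
  size_t BlockIndex(uint64_t hash) const {
    return static_cast<size_t>(((hash >> 32) * blocks_.size()) >> 32);
  }

  // The lower 32 bits, multiplied by each salt, pick one bit in each word.
  static Block MakeMask(uint32_t key) {
    Block mask{};
    for (size_t i = 0; i < mask.size(); ++i) {
      mask[i] = 1U << ((key * kSalt[i]) >> 27);
    }
    return mask;
  }

  std::vector<Block> blocks_;
};

// Stand-in 64-bit FNV-1a hash; the Parquet specification uses xxHash64, seed 0.
uint64_t Fnv1a64(std::string_view bytes) {
  uint64_t hash = 14695981039346656037ULL;
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 1099511628211ULL;
  }
  return hash;
}
```

Because a probe recomputes the hash from the value's bytes, a writer that hashed "ab" and a reader that probes with a space-padded "ab  " compute different hashes, so the filter may wrongly report the value as absent and the row group could be skipped incorrectly. That is the CHAR(N)/VARCHAR(N) problem the change avoids by not using Bloom filters for those types.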
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. Patch Set 4: Let's try re-running the job. We'll see. -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 07:45:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7099/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 07:45:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 26 Apr 2021 07:45:25 + Gerrit-HasComments: No