[Impala-ASF-CR] IMPALA-10483(part-1): Refactor table mask resolving
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17184 ) Change subject: IMPALA-10483(part-1): Refactor table mask resolving .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/17184/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17184/1//COMMIT_MSG@10 PS1, Line 10: based nit: 'base' instead of 'based' here and few other places. http://gerrit.cloudera.org:8080/#/c/17184/1//COMMIT_MSG@23 PS1, Line 23: This patch refactor the table mask resolving logic to not apply table One effect of not applying table mask by default is that now each of the clauses (SELECT, FROM, WHERE etc.) needs to explicitly call setDoTableMasking(). If there's a clause that is missing this implementation or if the programmer did not invoke this method on a clause for some reason, the flag will remain false and we could end up not applying the table mask. But as long as we have good test coverage on the queries, this may be ok. http://gerrit.cloudera.org:8080/#/c/17184/1/fe/src/main/java/org/apache/impala/analysis/WithClause.java File fe/src/main/java/org/apache/impala/analysis/WithClause.java: http://gerrit.cloudera.org:8080/#/c/17184/1/fe/src/main/java/org/apache/impala/analysis/WithClause.java@66 PS1, Line 66: public void setDoTableMasking(boolean doTableMasking) { I would think this would be an @Override of a corresponding method in base class but the base class does not have this method..is that intentional ? -- To view, visit http://gerrit.cloudera.org:8080/17184 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia191928fb179b0b0632235c1fff4c18647e5802f Gerrit-Change-Number: 17184 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 16 Mar 2021 06:21:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10553: External Frontend CTAS support .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 8 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 16 Mar 2021 03:34:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 21: (2 comments) http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG@12 PS21, Line 12: join builders and Parquet scanners Is it fair to say that the main value proposition of the column min/max statistics is in the join builder ? The Parquet scanners already have access to the row group's min/max stats per column, so it seems to me the stats coming from HMS will not add additional value there. But for the HJ builder, yes it helps by figuring out whether after the build phase let's say you have the range [10, 50] and the min/max stats fetched from HMS are [60, 100] then we can quickly say that the runtime min/max filter will exclude all row groups. But what happens if the stats are out-of-date ? Since these stats are getting uses not just for ACID tables but for external tables as well, the stats are not bound to the table's valid write id (applicable only for ACID tables). This can lead to incorrect overlap calculation. Any thoughts ? http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@221 PS21, Line 221: QUERY For the tests, it would be good to add a 3 table join case as well. My understanding is that if you have a fact table F and 2 dimension tables D1, D2 and suppose F is joined to D1, D2 as follows: F. a1 = D1.a1 AND F.a1 = D2.a1 then suppose the first join's range is [10, 50] and the second join's range is [20, 60] then the filter seen at scan of F will be [20, 50]. Are such scenario already tested elsewhere ? If so, feel free to point me to those and mark this resolved. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Mar 2021 03:07:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17172 ) Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign .. IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign ImpalaHS2Client._open_session() has a 'retry_msg' variable which was not initialized in the code-path where retry was disabled. If an exception was hit with retry disabled, a compile time error was generated. The fix is to initialize 'retry_msg' in the non retry code-path. Testing: - Forced exception in ImpalaHS2Client._open_session() and verified that proper error message was generated. - Ran impala-shell e2e and custom cluster tests. Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Reviewed-on: http://gerrit.cloudera.org:8080/17172 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M shell/impala_client.py 1 file changed, 2 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Gerrit-Change-Number: 17172 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17172 ) Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Gerrit-Change-Number: 17172 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 16 Mar 2021 01:14:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 ) Change subject: IMPALA-10577: Add retrying of AdmitQuery .. Patch Set 1: hadn't noticed Wenzhe posted comments earlier -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 16 Mar 2021 00:53:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Bikramjeet Vig has removed a vote on this change. Change subject: IMPALA-10577: Add retrying of AdmitQuery .. Removed Code-Review+2 by Bikramjeet Vig -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 ) Change subject: IMPALA-10577: Add retrying of AdmitQuery .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 16 Mar 2021 00:45:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 ) Change subject: IMPALA-10577: Add retrying of AdmitQuery .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc File be/src/scheduling/remote-admission-control-client.cc: http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40 PS1, Line 40: admission_max_retry_time_s I am wondering if it's good to set maximum number of times for retrying RPC call, or set maximum times for retrying RPC call? What's typical time for an admissiond ready to accept request after it's restarted? http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1385 PS1, Line 1385: assert result.data == ["730"] Could you add another test case for which sleep more than admission_max_retry_time_s before restart admissiond so that retry will fail? -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 16 Mar 2021 00:33:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 ) Change subject: IMPALA-10577: Add retrying of AdmitQuery .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8373/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 15 Mar 2021 22:14:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10553: External Frontend CTAS support .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8372/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 8 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 22:09:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 ) Change subject: IMPALA-10577: Add retrying of AdmitQuery .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1369 PS1, Line 1369: flake8: E203 whitespace before ':' -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 15 Mar 2021 21:55:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10553: External Frontend CTAS support .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6971/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 8 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 21:54:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17188 Change subject: IMPALA-10577: Add retrying of AdmitQuery .. IMPALA-10577: Add retrying of AdmitQuery This patch adds retries of the AdmitQuery rpc by coordinators. This helps to ensure that if an admissiond goes down and is restarted or is temporarily unreachable, queries won't fail. The retries are done with backoff and jitter to avoid overloading the admissiond in these scenarios. A new flag, --admission_max_retry_time_s, is added to control how long queries will continue retrying before giving up. The AdmitQuery rpc is made idempotent - if a query is submitted with the same query id as one the admissiond already knows about, AdmitQuery will return OK without submitting the query to be scheduled again. Testing: - Added a custom cluster test that checks that queries won't fail when the admissiond goes down. Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a --- M be/src/scheduling/admission-control-service.cc M be/src/scheduling/remote-admission-control-client.cc M be/src/scheduling/remote-admission-control-client.h M common/protobuf/admission_control_service.proto M tests/custom_cluster/test_admission_controller.py 5 files changed, 126 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17188/1 -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10553: External Frontend CTAS support .. Patch Set 8: Code-Review+2 Looks good -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 8 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 21:51:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10553: External Frontend CTAS support .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc@257 PS7, Line 257: // When an external FE has provided a staging directory we use that directly. : // We are trusting that the external frontend implementation has done appropriate : // authorization checks on the external output directory. > Nit: Update comment to call it an external output directory rather than a s Done http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@250 PS7, Line 250: if (writeId_ != -1) hdfsTableSink.setWrite_id(writeId_); : TTableSink tTableSink = new TTableSink(DescriptorTable.TABLE_SINK_ID, : > Nit: If the line hasn't changed, avoid style-only changes. Done -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 8 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 21:50:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17145 to look at the new patch set (#8). Change subject: IMPALA-10553: External Frontend CTAS support .. IMPALA-10553: External Frontend CTAS support - Adds the concept of an external staging dir to HdfsTableSink - This allows an external to specify the destination of the sink - When this is set, the external frontend is responsible for for moving and managing the results - Any DDL related operations are assumed to be handled by the external frontend - External frontends may optionally supply a partition depth which acts as a hint to skip a certain number of partitions while creating directories for partitions. This is for when the external frontend has pre-created a certain number of the directories in staging (usually the static portion of a partition specification)/ - Modifies delta/base naming to include 0 prefix padding to match Hive for dynamic partitioning detection - External frontends are responsible for doing authorization checks against the staging directory and it is assumed the external frontend service is not exposed directly to users. Co-authored-by: Kurt Deschler Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Reviewed-by: Kurt Deschler --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java 5 files changed, 120 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/8 -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 8 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 7: Code-Review+2 > > (1 comment) > > > > Apart from the interaction with retry_failed_queries, this looks > > good to me. > > > > If you want to split off the retry_failed_queries piece into a > > separate change, I am ok with that. I think this change is ok if > > the external frontend guarantees not to pass in retry_failed_queries=true > > when using a result sink. > > I think it would be better to follow up. I've created > https://issues.apache.org/jira/browse/IMPALA-10586 to track that > work. Ok, then this looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 7 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 21:39:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 7: > (1 comment) > > Apart from the interaction with retry_failed_queries, this looks > good to me. > > If you want to split off the retry_failed_queries piece into a > separate change, I am ok with that. I think this change is ok if > the external frontend guarantees not to pass in retry_failed_queries=true > when using a result sink. I think it would be better to follow up. I've created https://issues.apache.org/jira/browse/IMPALA-10586 to track that work. -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 7 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 21:37:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10518: Add ImpalaServer interface to retrieve executor membership.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17181 ) Change subject: IMPALA-10518: Add ImpalaServer interface to retrieve executor membership. .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.h@260 PS3, Line 260: class TUpdateExecutorMembershipRequest; Its standard in Impala to put all forward declarations at the top of the file - in this case, immediately after the "namespace impala {" line http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.cc@57 PS3, Line 57: impala:: You can just put this function in the "namespace impala{" block below and remove this http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/service/impala-hs2-server.cc@1230 PS3, Line 1230: PopulateExecutorMembershipRequest(membership_snapshot, : return_val.executor_membership); nit: I think this can fit on a single line More generally, you've got a few minor formatting errors, could you run this through clang-format-diff as described here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 http://gerrit.cloudera.org:8080/#/c/17181/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/17181/3/common/thrift/ImpalaService.thrift@870 PS3, Line 870: // Returns the executor membership information Note that this is only supported for the external fe service http://gerrit.cloudera.org:8080/#/c/17181/3/tests/hs2/test_hs2.py File tests/hs2/test_hs2.py: http://gerrit.cloudera.org:8080/#/c/17181/3/tests/hs2/test_hs2.py@750 PS3, Line 750: assert get_executor_membership_resp.executor_membership.num_executors == 3 I'm confused as to how we would get the correct value here if the response was an error, as checked in the previous line -- To view, visit http://gerrit.cloudera.org:8080/17181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie89b71f4555c368869ee7b9d6341756c60af12b5 Gerrit-Change-Number: 17181 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 21:07:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17061 ) Change subject: IMPALA-10503: testdata load hits hive memory limit errors during hive inserts .. IMPALA-10503: testdata load hits hive memory limit errors during hive inserts Changed the following hive settings to avoid hitting Hive container limit errors: hive.tez.container.size: 2048 hive.tez.java.opts: -Xmx1700m With these settings, testdata load completes without errors on a 32GB host. Reviewed-by: Fang-Yu Rao Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Reviewed-on: http://gerrit.cloudera.org:8080/17061 Tested-by: Impala Public Jenkins Reviewed-by: Aman Sinha --- M fe/src/test/resources/hive-site.xml.py M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 2 files changed, 138 insertions(+), 137 deletions(-) Approvals: Impala Public Jenkins: Verified Aman Sinha: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Gerrit-Change-Number: 17061 Gerrit-PatchSet: 13 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler
[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17061 ) Change subject: IMPALA-10503: testdata load hits hive memory limit errors during hive inserts .. Patch Set 12: Carry previous code review +2 -- To view, visit http://gerrit.cloudera.org:8080/17061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Gerrit-Change-Number: 17061 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Mon, 15 Mar 2021 20:55:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17061 ) Change subject: IMPALA-10503: testdata load hits hive memory limit errors during hive inserts .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Gerrit-Change-Number: 17061 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Mon, 15 Mar 2021 20:54:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10553: External Frontend CTAS support .. Patch Set 7: Code-Review+2 (2 comments) This looks good to me, only a couple style-only nits. Thanks for changing the naming for this. http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc@257 PS7, Line 257: // When an external FE has provided a staging directory we use that directly. : // We are trusting that the external frontend implementation has done appropriate : // authorization checks on the external staging directory. Nit: Update comment to call it an external output directory rather than a staging directory. http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@250 PS7, Line 250: if (writeId_ != -1) { : hdfsTableSink.setWrite_id(writeId_); : } Nit: If the line hasn't changed, avoid style-only changes. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 7 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 19:56:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17172 ) Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Gerrit-Change-Number: 17172 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Mar 2021 19:29:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17172 ) Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6970/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Gerrit-Change-Number: 17172 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Mar 2021 19:29:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 7: (1 comment) Apart from the interaction with retry_failed_queries, this looks good to me. If you want to split off the retry_failed_queries piece into a separate change, I am ok with that. I think this change is ok if the external frontend guarantees not to pass in retry_failed_queries=true when using a result sink. http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802 PS6, Line 802: RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, FLAGS_hostname)); > I agree that retry_failed_queries might be problematic and we might want to The retrying happens at a level above the Coordinator object (at the QueryDriver level). Each new retry would have its own ClientRequestState object (which has its own Coordinator object). Here is a comment covering some of the concepts: https://github.com/apache/impala/blob/master/be/src/runtime/query-driver.h#L75 The pieces of code in Coordinator that would trigger a retry are the TryQueryRetry() calls in this file. I believe we did not intend for the retries to apply to DMLs, but I tried it out and we do actually retry DMLs right now. So, at the moment, there is nothing disabling retries for something with a result sink. I will file a JIRA for disabling retry_failed_queries for DMLs. I think that was unintentional. -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 7 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 15 Mar 2021 18:49:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 21: Thinking about min-max filters in the context of outer joins, I would expect they are _not_ pushed to the non-null producing side of an outer join (we have similar restriction for the runtime bloom filters). Is that handled somewhere (maybe in another patch) ? -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Mar 2021 18:40:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 21: (5 comments) I haven't analyzed how exactly the overlap of min-max filters works..so will go through the design doc and maybe the original commit. In the meantime, sending some initial comments. http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/catalog-op-executor.cc@316 PS19, Line 316: or non-string : // columns. Update this since min/max stats are not getting computed for Decimal types. Also, are we sure that allowing float/double min-max filters is ok ? These are not precise data types. http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc@474 PS19, Line 474: case PrimitiveType::TYPE_TIMESTAMP: The commit message says the min-max column stats is used for integer, float and double so I was not expecting to see timestamp and date types also considered here. Shouldn't they just to the default case and return false ? http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/incr-stats-util.cc File be/src/exec/incr-stats-util.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/incr-stats-util.cc@260 PS19, Line 260: apache::hive::service::cli::thrift::TColumnValue It would be good to create a typedef for the hive version of TColumnValue. Also, the package name looks different from what I see in the source code here: https://github.com/apache/hive/blob/master/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TColumnValue.java#L7 http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc@957 PS19, Line 957: VLOG(3) << "The filter is set to always true."; nit: Would be useful to print that this was a min-max filter and probably the filter id. http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@10 PS19, Line 10: CREATE TABLE unique_database.lineitem_orderkey_only(l_orderkey bigint) Creating large tables during a test run adds unacceptable latency to the testing time. Can the db and table creation steps not be moved into testdata/datasets/tpch/ as a schema template for min-max filters ? -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Mar 2021 18:14:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write id for ACID tables
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17081 ) Change subject: IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write id for ACID tables .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3145 PS3, Line 3145: addHmsPartitionsInTransaction Do we need similar changes in the alterTableRecoverPartitions path. What about dynamically created partitions? -- To view, visit http://gerrit.cloudera.org:8080/17081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad247008b7c206db00516326c1447bd00a9b34bd Gerrit-Change-Number: 17081 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Mar 2021 17:45:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8371/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Mar 2021 17:24:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write id for ACID tables
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17081 ) Change subject: IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write id for ACID tables .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/catalog/Catalog.java@78 PS3, Line 78: ACID_USER > I've added an abstract method getAcidUserId() to this class. Thanks. Looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/17081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad247008b7c206db00516326c1447bd00a9b34bd Gerrit-Change-Number: 17081 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Mar 2021 17:20:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17171 ) Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/17171/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17171/4//COMMIT_MSG@15 PS4, Line 15: dead nit, I think using "infinite loop" is a more readable. http://gerrit.cloudera.org:8080/#/c/17171/4//COMMIT_MSG@18 PS4, Line 18: thrown nit, throw http://gerrit.cloudera.org:8080/#/c/17171/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17171/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@847 PS4, Line 847: next = baseIterator_.next(); Does this mean that if the file which is present inside a directory and which is removed after the RemoteIterator was created, we will start seeing FileNotFoundException which can cause Table loading errors? -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 15 Mar 2021 17:17:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Qifan Chen has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. IMPALA-10494: Making use of the min/max column stats to improve min/max filters This patch adds the functionality to compute the minimal and the maximal value for a column of type integers, float or double for parquet tables, and to make use of the new stats to discard min/max filters, in both hash join builders and Parquet scanners, whose coverage are too close to the actual range defined by the column min and max. The computation and dislay of the new column min/max stats are done for Parquet tables only and can be controlled by two new Boolean query options (default to false): 1. compute_column_minmax_stats 2. show_column_minmax_stats Usage examples. set compute_column_minmax_stats=true; compute stats tpcds_parquet.store_sales; set show_column_minmax_stats=true; show column stats tpcds_parquet.store_sales; +---+--+-...---+-+-+ | Column| Type | #Falses | Min | Max | +---+--+-...---+-+-+ | ss_sold_time_sk | INT | -1 | 28800 | 75599 | | ss_item_sk| BIGINT | -1 | 1 | 18000 | | ss_customer_sk| INT | -1 | 1 | 10 | | ss_cdemo_sk | INT | -1 | 15 | 1920797 | | ss_hdemo_sk | INT | -1 | 1 | 7200| | ss_addr_sk| INT | -1 | 1 | 5 | | ss_store_sk | INT | -1 | 1 | 10 | | ss_promo_sk | INT | -1 | 1 | 300 | | ss_ticket_number | BIGINT | -1 | 1 | 24 | | ss_quantity | INT | -1 | 1 | 100 | | ss_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_discount_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_tax| DECIMAL(7,2) | -1 | -1 | -1 | | ss_coupon_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid_inc_tax | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_profit | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sold_date_sk | INT | -1 | 2450816 | 2452642 | +---+--+-...---+-+-+ Only the min/max values for non-partition columns are stored in HMS. The min/max values for partition columns are computed in coordinator. The min-max filters, in C++ class or protobuf form, are augmented to deal with the always true state better. Once always true is set, the actual min and max values in the filter are no longer populated. Testing: - Added new compute/show stats tests for integers, float and double column data types in compute-stats-column-minmax.test; - Added new tests in overlap_min_max_filters.test to demonstrate the usefulness of column stats to quickly disable useless filters in both hash join builder and Parquet scanner; - Added tests in min-max-filter-test.cc to demonstrate method Or(), ToProtobuf() and constructor can deal with always true flag well; - core tests. TODO: 1. Test compute stats for timestamp and date columns; 2. Enable the feature for Iceberg tables with Parquet data files. Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df --- M be/src/exec/catalog-op-executor.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-scanner.h M be/src/exec/incr-stats-util-test.cc M be/src/exec/incr-stats-util.cc M be/src/exec/incr-stats-util.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/service/hs2-util.cc M be/src/service/hs2-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/CatalogObjects.thrift M common/thrift/Frontend.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java M fe/src/main/java/org/apac
[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17187 ) Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory config. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8370/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e Gerrit-Change-Number: 17187 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 15 Mar 2021 15:55:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17187 Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory config. .. IMPALA-10583: Fix bug on unbounded result spooling memory config. Two bugs happening on result spooling when we set two of its query options to 0 (unbounded). The first bug happens if max_spilled_result_spooling_mem = 0 (unbounded). max_unpinned_bytes_ in SpillableRowBatchQueue will be set to 0, SpillableRowBatchQueue::IsFull() then will always return true, and the query hang. This patch fix it by setting max_unpinned_bytes_ to INT64_MAX if max_spilled_result_spooling_mem = 0. The second bug happens if we set max_result_spooling_mem = 0 (unbounded). PlanRootSink.java will peg maxMemReservationBytes to always equal to minMemReservationBytes. This patch fixes this by reverting to the default max_result_spooling_mem (100MB). Testing: - Add test_unbounded_result_spooling_mem. - Pass core tests. Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/query_test/test_result_spooling.py 4 files changed, 43 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17187/1 -- To view, visit http://gerrit.cloudera.org:8080/17187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e Gerrit-Change-Number: 17187 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10552: Support external frontends supplying timeline for profile
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17183 ) Change subject: IMPALA-10552: Support external frontends supplying timeline for profile .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG@59 PS1, Line 59: Add Kurt as co-author -- To view, visit http://gerrit.cloudera.org:8080/17183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec Gerrit-Change-Number: 17183 Gerrit-PatchSet: 1 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Comment-Date: Mon, 15 Mar 2021 15:20:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8369/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Mar 2021 15:04:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. IMPALA-10494: Making use of the min/max column stats to improve min/max filters This patch adds the functionality to compute the minimal and the maximal value for a column of type integers, float or double for parquet tables, and to make use of the new stats to discard min/max filters, in both hash join builders and Parquet scanners, whose coverage are too close to the actual range defined by the column min and max. The computation and dislay of the new column min/max stats are done for Parquet tables only and can be controlled by two new Boolean query options (default to false): 1. compute_column_minmax_stats 2. show_column_minmax_stats Usage examples. set compute_column_minmax_stats=true; compute stats tpcds_parquet.store_sales; set show_column_minmax_stats=true; show column stats tpcds_parquet.store_sales; +---+--+-...---+-+-+ | Column| Type | #Falses | Min | Max | +---+--+-...---+-+-+ | ss_sold_time_sk | INT | -1 | 28800 | 75599 | | ss_item_sk| BIGINT | -1 | 1 | 18000 | | ss_customer_sk| INT | -1 | 1 | 10 | | ss_cdemo_sk | INT | -1 | 15 | 1920797 | | ss_hdemo_sk | INT | -1 | 1 | 7200| | ss_addr_sk| INT | -1 | 1 | 5 | | ss_store_sk | INT | -1 | 1 | 10 | | ss_promo_sk | INT | -1 | 1 | 300 | | ss_ticket_number | BIGINT | -1 | 1 | 24 | | ss_quantity | INT | -1 | 1 | 100 | | ss_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_discount_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_tax| DECIMAL(7,2) | -1 | -1 | -1 | | ss_coupon_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid_inc_tax | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_profit | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sold_date_sk | INT | -1 | 2450816 | 2452642 | +---+--+-...---+-+-+ Only the min/max values for non-partition columns are stored in HMS. The min/max values for partition columns are computed in coordinator. The min-max filters, in C++ class or protobuf form, are augmented to deal with the always true state better. Once always true is set, the actual min and max values in the filter are no longer populated. Testing: - Added new compute/show stats tests for integers, float and double column data types in compute-stats-column-minmax.test; - Added new tests in overlap_min_max_filters.test to demonstrate the usefulness of column stats to quickly disable useless filters in both hash join builder and Parquet scanner; - Added tests in min-max-filter-test.cc to demonstrate method Or(), ToProtobuf() and constructor can deal with always true flag well; - core tests. TODO: 1. Test compute stats for timestamp and date columns; Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df --- M be/src/exec/catalog-op-executor.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-scanner.h M be/src/exec/incr-stats-util-test.cc M be/src/exec/incr-stats-util.cc M be/src/exec/incr-stats-util.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/service/hs2-util.cc M be/src/service/hs2-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/CatalogObjects.thrift M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/ja
[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8368/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Mar 2021 08:04:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17153 ) Change subject: IMPALA-10558: Implement ds_theta_exclude() function .. IMPALA-10558: Implement ds_theta_exclude() function This function receives two strings that are serialized Apache DataSketches Theta sketches. Computes the a-not-b set operation given two sketches of same or different column. Example: select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) from sketch_tbl; +---+ | ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) | +---+ | 5 | +---+ Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 --- M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test 5 files changed, 166 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/4 -- To view, visit http://gerrit.cloudera.org:8080/17153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3 Gerrit-Change-Number: 17153 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9234: Support Ranger row filtering policies
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16976 ) Change subject: IMPALA-9234: Support Ranger row filtering policies .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I580517be241225ca15e45686381b78890178d7cc Gerrit-Change-Number: 16976 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 15 Mar 2021 07:00:12 + Gerrit-HasComments: No