[Impala-ASF-CR] IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21444 ) Change subject: IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21444 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0 Gerrit-Change-Number: 21444 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 05:32:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12216: Print timestamp for impala-shell errors
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21426 )

Change subject: IMPALA-12216: Print timestamp for impala-shell errors ..

Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21426/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21426/1//COMMIT_MSG@9
PS1, Line 9: This change will print timestamp of exception occurring during executing a query
nit: we prefer 72 characters as the max width in commit messages

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_client.py@763
PS1, Line 763: print("Warning: close session RPC failed: {0}, {1}".format(str(e), type(e)))
This also needs a timestamp.

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_client.py@1154
PS1, Line 1154: print('Caught exception {0}, type={1} in {2}. {3}'
Let's also add timestamps in this method.

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_client.py@1606
PS1, Line 1606: print('Caught exception {0}, type={1}'.format(str(e), type(e)), file=sys.stderr)
This also needs a timestamp.

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_shell.py@1109
PS1, Line 1109: print("%s: %s, %s" %
              : (self.ERROR_CONNECTING_MESSAGE, type(e).__name__, e), file=sys.stderr)
Please also add a timestamp for this. We've seen errors like

  Error connecting: DisconnectedException, Error communicating with impalad: TSocket read 0 bytes

It'd be nice to see when the connection fails.

http://gerrit.cloudera.org:8080/#/c/21426/1/shell/impala_shell.py@1460
PS1, Line 1460:
nit: redundant space

-- To view, visit http://gerrit.cloudera.org:8080/21426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4abbd02aa9f61210b0333495bf191e72c22a5944 Gerrit-Change-Number: 21426 Gerrit-PatchSet: 1 Gerrit-Owner: Saurabh Katiyal Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 22 May 2024 02:39:16 + Gerrit-HasComments: Yes
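For illustration only (this is not code from the patch): the review asks for a timestamp on every error print. One way to avoid repeating the formatting at each call site is a small helper. The names timestamped and print_error, and the timestamp format, are assumptions made for this sketch and are not taken from impala-shell.

```python
import sys
from datetime import datetime


def timestamped(message, now=None):
    """Return `message` prefixed with a timestamp.

    Hypothetical helper: `now` can be passed in so the output is
    deterministic in tests; by default the current local time is used.
    """
    if now is None:
        now = datetime.now()
    return "[{0}] {1}".format(now.strftime("%Y-%m-%d %H:%M:%S"), message)


def print_error(message, file=None):
    """Print a timestamped message, defaulting to stderr."""
    print(timestamped(message), file=file if file is not None else sys.stderr)
```

A call such as print_error("Error connecting: DisconnectedException, ...") would then show when the connection failed, which is what the comment on impala_shell.py@1109 asks for.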
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 01:44:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13040: (addendum) Inject larger delay for sanitized build
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21439 )

Change subject: IMPALA-13040: (addendum) Inject larger delay for sanitized build ..

IMPALA-13040: (addendum) Inject larger delay for sanitized build

TestLateQueryStateInit has been flaky in sanitized builds because the largest delay injection time is fixed at 3 seconds. This patch fixes the issue by setting the largest delay injection time equal to RUNTIME_FILTER_WAIT_TIME_MS, which is 3 seconds for a regular build and 10 seconds for a sanitized build.

Testing:
- Loop and pass test_runtime_filter_aggregation.py 10 times in ASAN build and 50 times in UBSAN build.

Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19
Reviewed-on: http://gerrit.cloudera.org:8080/21439
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M tests/custom_cluster/test_runtime_filter_aggregation.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- To view, visit http://gerrit.cloudera.org:8080/21439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Gerrit-Change-Number: 21439 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Riza Suminto
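For illustration only (not the contents of test_runtime_filter_aggregation.py): the idea in the commit message above is that the largest injected delay should follow the runtime filter wait time instead of being hard-coded. The function names and the is_sanitized_build flag below are hypothetical; only the 3 second and 10 second values come from the commit message.

```python
# Wait times stated in the commit message, in milliseconds.
REGULAR_WAIT_MS = 3000     # regular build
SANITIZED_WAIT_MS = 10000  # sanitized (e.g. ASAN/UBSAN) build


def runtime_filter_wait_time_ms(is_sanitized_build):
    """Hypothetical stand-in for RUNTIME_FILTER_WAIT_TIME_MS."""
    return SANITIZED_WAIT_MS if is_sanitized_build else REGULAR_WAIT_MS


def largest_injected_delay_ms(is_sanitized_build):
    """Tie the largest injected delay to the wait time.

    Before the patch this was effectively a fixed 3000 ms, which the
    commit message identifies as the cause of the flakiness on
    sanitized builds.
    """
    return runtime_filter_wait_time_ms(is_sanitized_build)
```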
[Impala-ASF-CR] IMPALA-13040: (addendum) Inject larger delay for sanitized build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21439 ) Change subject: IMPALA-13040: (addendum) Inject larger delay for sanitized build .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Gerrit-Change-Number: 21439 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 01:40:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 )

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ..

Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java:

http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@127
PS5, Line 127: double sel = Math.max(0.0, (double) diff / table.getNumRows());
> I think test against unique_with_nulls returns wrong cardinality estimate.
Changed to divide by table.getNumRows() instead. All checks up to L110 should be sufficient to determine that the column is unique.

-- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 01:23:31 + Gerrit-HasComments: Yes
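For illustration only: a small numeric sketch of why the choice of denominator matters when a unique column also contains NULLs. The table shape used here (20 rows, 10 distinct non-null values, 10 NULLs) and the range width of 3 are made-up numbers, not the actual contents of functional_parquet.unique_with_nulls.

```python
def between_selectivity(diff, denominator):
    """Mirror of the expression on line 127: max(0.0, diff / denominator)."""
    return max(0.0, float(diff) / denominator)


# Hypothetical table: 20 rows, of which 10 are NULL and 10 hold distinct values.
num_rows = 20
num_distinct_values = 10
diff = 3  # assumed number of discrete values covered by the BETWEEN range

sel_over_ndv = between_selectivity(diff, num_distinct_values)
sel_over_rows = between_selectivity(diff, num_rows)

print(sel_over_ndv, sel_over_rows)
```

With these assumed numbers, dividing by the distinct-value count gives a selectivity twice as large as dividing by the row count. Since scan cardinality is selectivity times row count, the first variant would estimate 6 rows for a range that can match at most 3.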
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Hello Aman Sinha, Kurt Deschler, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21377 to look at the new patch set (#6).

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ..

IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column

The Impala frontend cannot evaluate a BETWEEN/NOT BETWEEN predicate directly. It needs to transform a BetweenPredicate into a CompoundPredicate consisting of upper-bound and lower-bound BinaryPredicates through BetweenToCompoundRule.java. The BinaryPredicates can then be pushed down or rewritten into another form by another expression rewrite rule. However, the selectivity of the BetweenPredicate or its derivatives remains unassigned and often collapses with other unknown-selectivity predicates into a collective selectivity equal to Expr.DEFAULT_SELECTIVITY (0.1).

This patch adds a narrow optimization of BetweenPredicate selectivity when the following criteria are met:

1. The BetweenPredicate is bound to a slot reference of a single column of a table.
2. The column type is discrete, such as INTEGER or DATE.
3. The column stats are available.
4. The column is sufficiently unique based on available stats.
5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value <= upper bound value).
6. The final calculated selectivity is less than or equal to Expr.DEFAULT_SELECTIVITY.

If any of these criteria is unmet, the Planner reverts to the old behavior, which leaves the selectivity unassigned.

Since this patch only targets BetweenPredicate over a unique column, the following query will still have the default scan selectivity (0.1):

  select count(*) from tpch.customer c
  where c.c_custkey >= 1234 and c.c_custkey <= 2345;

While this equivalent query written with a BETWEEN predicate will have lower scan selectivity:

  select count(*) from tpch.customer c
  where c.c_custkey between 1234 and 2345;

This patch calculates the BetweenPredicate selectivity during transformation at BetweenToCompoundRule.java. The selectivity is piggy-backed onto the resulting CompoundPredicate and BinaryPredicate as the betweenSelectivity_ field, separate from the selectivity_ field. Analyzer.getBoundPredicates() is modified to prioritize the derived BinaryPredicate over an ordinary BinaryPredicate in its return value, to prevent the derived BinaryPredicate from being eliminated by a matching ordinary BinaryPredicate.

Testing:
- Add table functional_parquet.unique_with_nulls.
- Add FE tests in ExprCardinalityTest#testBetweenSelectivity, ExprCardinalityTest#testNotBetweenSelectivity, and PlannerTest#testScanCardinality.
- Pass core tests.

Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q40.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q77.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q92.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q94.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q95.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test 27 files changed, 3,908 insertions(+), 3,502 deletions(-) git pu
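For illustration only (a Python sketch, not the Java in BetweenToCompoundRule.java): the criteria listed in the commit message above can be read as a chain of guards that either produce a selectivity or leave it unassigned. Everything specific here is an assumption made for the sketch: the function name, the min_unique_ratio threshold standing in for "sufficiently unique", the inclusive range width hi - lo + 1, and the use of None for "unassigned". Criterion 1 (binding to a single column slot) and NOT BETWEEN are not modeled.

```python
DEFAULT_SELECTIVITY = 0.1  # Expr.DEFAULT_SELECTIVITY, per the commit message


def estimate_between_selectivity(lo, hi, num_rows, ndv, num_nulls,
                                 is_discrete=True, min_unique_ratio=0.9):
    """Return a BETWEEN selectivity, or None to leave it unassigned."""
    # Criterion 2: only discrete column types qualify.
    if not is_discrete:
        return None
    # Criterion 3: column stats must be available.
    if ndv is None or num_rows is None or num_rows <= 0:
        return None
    # Criterion 4: column must be "sufficiently unique" (threshold assumed).
    num_not_nulls = num_rows - (num_nulls or 0)
    if num_not_nulls <= 0 or ndv < min_unique_ratio * num_not_nulls:
        return None
    # Criterion 5: bounds must be in good form.
    if lo > hi:
        return None
    # Assumed inclusive range width over a discrete domain.
    diff = hi - lo + 1
    sel = max(0.0, float(diff) / num_rows)
    # Criterion 6: only keep estimates no larger than the default.
    if sel > DEFAULT_SELECTIVITY:
        return None
    return sel
```

As a usage example, with a hypothetical 150,000-row table whose key column is fully unique, estimate_between_selectivity(1234, 2345, 150000, 150000, 0) yields 1112 / 150000, well below 0.1, while a range covering most of the table returns None and so falls back to the old behavior.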
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16201/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 01:00:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 00:56:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off ..

Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS8, Line 2465: boolean isEpActiveOrDisabled = isEventProcessin
> I'm still not sure about this. What if I have a cluster with only Impala as
Thanks. I just realized that isEventProcessingActive already checks for that (metastoreEventProcessor_ instanceof MetastoreEventsProcessor).
Done.

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469
PS8, Line 2469: metastoreEventProcessor_ instanceo
> I think it is better to be verbose here by printing the actual EventProcess
Done

http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508
PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus()
Just a question, because I'm a little bit confused: can this ever return EventProcessorStatus.DISABLED? I think it is impossible, but please correct me if I'm wrong.

-- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 00:56:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16200/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 00:50:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 )

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ..

Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG@18
PS2, Line 18: This patch adds a narrow optimization of BetweenPredicate selectivity
> If the user's query has a predicate unique_col >=5 AND unique_col <= 10 in
Correct. This is mainly because the analysis happens at BetweenToCompoundRule.java. A more general approach would require analyzing all range BinaryPredicates, and it quickly becomes complicated when they overlap with each other. I left a TODO at PlanNode.java to think about a more general approach to this problem.

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2626
PS2, Line 2626:
> The 'prioritization' part was not clear.. why exactly is it needed ? .. cou
Added comment.

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1856
PS2, Line 1856:*/
> nit: in the comments above, could you add a note about the acceptDate param
Done

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@759
PS2, Line 759:*lower selectivity if analyzed as a pair.
> nit: this comment needs updating to reflect the between selectivity.
Added as issue number 3.

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@774
PS2, Line 774:
> nit: have 'been' assigned ..
Done

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104
PS2, Line 104: hasNumDist
> Looking at the hasStats() method:
It is enough to check for hasNumDistinctValues() only. Updated this code accordingly.

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107
PS2, Line 107: numNotNulls
> Is there any test that covers the case where the null count made a differen
Added tests against functional_parquet.unique_with_nulls.

http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java:

http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@127
PS5, Line 127: double sel = Math.max(0.0, (double) diff / stats.getNumDistinctValues());
I think the test against unique_with_nulls returns a wrong cardinality estimate. It should be half of what it is now. I'll check this line again.

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@544
PS2, Line 544:* something like 0.33^2.
> nit: this last sentence should be updated for the new behavior.
Done

-- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 00:44:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Hello Aman Sinha, Kurt Deschler, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21377 to look at the new patch set (#5).

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ..

IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column

The Impala frontend cannot evaluate a BETWEEN/NOT BETWEEN predicate directly. It needs to transform a BetweenPredicate into a CompoundPredicate consisting of upper-bound and lower-bound BinaryPredicates through BetweenToCompoundRule.java. The BinaryPredicates can then be pushed down or rewritten into another form by another expression rewrite rule. However, the selectivity of the BetweenPredicate or its derivatives remains unassigned and often collapses with other unknown-selectivity predicates into a collective selectivity equal to Expr.DEFAULT_SELECTIVITY (0.1).

This patch adds a narrow optimization of BetweenPredicate selectivity when the following criteria are met:

1. The BetweenPredicate is bound to a slot reference of a single column of a table.
2. The column type is discrete, such as INTEGER or DATE.
3. The column stats are available.
4. The column is sufficiently unique based on available stats.
5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value <= upper bound value).
6. The final calculated selectivity is less than or equal to Expr.DEFAULT_SELECTIVITY.

If any of these criteria is unmet, the Planner reverts to the old behavior, which leaves the selectivity unassigned.

Since this patch only targets BetweenPredicate over a unique column, the following query will still have the default scan selectivity (0.1):

  select count(*) from tpch.customer c
  where c.c_custkey >= 1234 and c.c_custkey <= 2345;

While this equivalent query written with a BETWEEN predicate will have lower scan selectivity:

  select count(*) from tpch.customer c
  where c.c_custkey between 1234 and 2345;

This patch calculates the BetweenPredicate selectivity during transformation at BetweenToCompoundRule.java. The selectivity is piggy-backed onto the resulting CompoundPredicate and BinaryPredicate as the betweenSelectivity_ field, separate from the selectivity_ field. Analyzer.getBoundPredicates() is modified to prioritize the derived BinaryPredicate over an ordinary BinaryPredicate in its return value, to prevent the derived BinaryPredicate from being eliminated by a matching ordinary BinaryPredicate.

Testing:
- Add table functional_parquet.unique_with_nulls.
- Add FE tests in ExprCardinalityTest#testBetweenSelectivity, ExprCardinalityTest#testNotBetweenSelectivity, and PlannerTest#testScanCardinality.
- Pass core tests.

Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q40.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q77.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q92.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q94.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q95.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test 27 files changed, 3,908 insertions(+), 3,502 deletions(-) git pu
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Hello Aman Sinha, Kurt Deschler, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21377 to look at the new patch set (#4).

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ..

IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column

The Impala frontend cannot evaluate a BETWEEN/NOT BETWEEN predicate directly. It needs to transform a BetweenPredicate into a CompoundPredicate consisting of upper-bound and lower-bound BinaryPredicates through BetweenToCompoundRule.java. The BinaryPredicates can then be pushed down or rewritten into another form by another expression rewrite rule. However, the selectivity of the BetweenPredicate or its derivatives remains unassigned and often collapses with other unknown-selectivity predicates into a collective selectivity equal to Expr.DEFAULT_SELECTIVITY (0.1).

This patch adds a narrow optimization of BetweenPredicate selectivity when the following criteria are met:

1. The BetweenPredicate is bound to a slot reference of a single column of a table.
2. The column type is discrete, such as INTEGER or DATE.
3. The column stats are available.
4. The column is sufficiently unique based on available stats.
5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value <= upper bound value).
6. The final calculated selectivity is less than or equal to Expr.DEFAULT_SELECTIVITY.

If any of these criteria is unmet, the Planner reverts to the old behavior, which leaves the selectivity unassigned.

Since this patch only targets BetweenPredicate over a unique column, the following query will still have the default scan selectivity (0.1):

  select count(*) from tpch.customer c
  where c.c_custkey >= 1234 and c.c_custkey <= 2345;

While this equivalent query written with a BETWEEN predicate will have lower scan selectivity:

  select count(*) from tpch.customer c
  where c.c_custkey between 1234 and 2345;

This patch calculates the BetweenPredicate selectivity during transformation at BetweenToCompoundRule.java. The selectivity is piggy-backed onto the resulting CompoundPredicate and BinaryPredicate as the betweenSelectivity_ field, separate from the selectivity_ field. Analyzer.getBoundPredicates() is modified to prioritize the derived BinaryPredicate over an ordinary BinaryPredicate in its return value, to prevent the derived BinaryPredicate from being eliminated by a matching ordinary BinaryPredicate.

Testing:
- Add table functional_parquet.unique_with_nulls.
- Add FE tests in ExprCardinalityTest#testBetweenSelectivity, ExprCardinalityTest#testNotBetweenSelectivity, and PlannerTest#testScanCardinality.
- Pass core tests.

Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q40.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q77.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q92.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q94.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q95.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test 27 files changed, 3,908 insertions(+), 3,502 deletions(-) git pu
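To make the contrast between the two queries in the commit message above concrete, here is a minimal sketch of how a range selectivity over a unique integer column could be estimated. The formula, the function name `between_selectivity`, and the 150,000-row table size are assumptions for illustration only; the actual computation lives in BetweenToCompoundRule.java and may well differ. Only the 0.1 default is taken from the commit message.

```python
DEFAULT_SCAN_SELECTIVITY = 0.1  # default quoted in the commit message


def between_selectivity(lo, hi, row_count):
    """Hypothetical estimate for `col BETWEEN lo AND hi` on a unique integer column.

    Assumes each value in [lo, hi] matches at most one row, so at most
    (hi - lo + 1) rows can qualify. This is an illustrative formula, not
    necessarily the one implemented by the patch.
    """
    if row_count <= 0 or hi < lo:
        return 0.0
    matching = hi - lo + 1
    return min(1.0, matching / row_count)


# Assumed table size, for illustration only.
rows = 150_000
estimate = between_selectivity(1234, 2345, rows)
# Under these assumptions the estimate falls below the 0.1 default.
print(estimate < DEFAULT_SCAN_SELECTIVITY)
```

Under these assumptions the BETWEEN form yields an estimate well below the default, which matches the direction of the change described in the commit message.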
[Impala-ASF-CR] IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21444 ) Change subject: IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10658/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21444 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0 Gerrit-Change-Number: 21444 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 00:21:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21444 ) Change subject: IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds .. Patch Set 2: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10656/ -- To view, visit http://gerrit.cloudera.org:8080/21444 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0 Gerrit-Change-Number: 21444 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 00:12:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16199/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 00:07:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#9).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

When the event processor is turned off, inserting values into a partitioned table can lead to a NullPointerException if the partition is deleted outside Impala (e.g. HMS). Since the event processor is turned off, Impala is unaware of the metadata changes to the table. Currently, Impala always reuses the metadata when reloading a table. This can lead to data inconsistency, especially when the event processor is turned off. This patch addresses the issue by reusing metadata only when the event processor state is active. If it is not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor.
- Create a partitioned table and add a partition from Impala.
- Drop the same partition from Hive.
- From Impala, insert values into the partition (the expectation is that if the partition doesn't exist, a new one is created).

Testing:
- Verified manually that the NullPointerException is avoided with this patch.
- Added end-to-end tests to verify the above scenario for external and managed tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 34 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/9 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-13040: (addendum) Inject larger delay for sanitized build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21439 ) Change subject: IMPALA-13040: (addendum) Inject larger delay for sanitized build .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10657/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Gerrit-Change-Number: 21439 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 May 2024 20:34:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 1: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10655/ -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 May 2024 19:54:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13040: (addendum) Inject larger delay for sanitized build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21439 ) Change subject: IMPALA-13040: (addendum) Inject larger delay for sanitized build .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10654/ -- To view, visit http://gerrit.cloudera.org:8080/21439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Gerrit-Change-Number: 21439 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 May 2024 19:25:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21444 ) Change subject: IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10656/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21444 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0 Gerrit-Change-Number: 21444 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 May 2024 19:05:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471
PS5, Line 2471: f no validWriteIdList is
> L2471 and L2483 are two if conditions and I don't see event processor switc
Done

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS8, Line 2465: boolean isEpActive = isEventProcessingActive();
I'm still not sure about this. What if I have a cluster with only Impala as the query engine (no Hive, no Spark), and I turn off the Event Processor entirely because I don't need it (metastoreEventProcessor_ is a NoOpEventProcessor)? Will Catalogd be forced to reload the table all the time? I think it should be:

  boolean isEpActiveOrDisabled = ...

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469
PS8, Line 2469: isEpActive ? "ACTIVE" : "INACTIVE"
I think it is better to be verbose here by printing the actual EventProcessorStatus enum, or "NONE" if metastoreEventProcessor_ is not a MetastoreEventsProcessor.

-- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 18:10:56 + Gerrit-HasComments: Yes
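The two suggestions in the review above (treat a never-enabled event processor like an active one, and log the real status rather than a two-way ACTIVE/INACTIVE label) can be sketched as follows. This is a simplified model, not the Java code under review: the enum members listed here are an illustrative subset, `None` stands in for the NoOpEventProcessor case, and both function names are hypothetical.

```python
from enum import Enum


class EventProcessorStatus(Enum):
    # Illustrative subset of states; assumed for this sketch.
    ACTIVE = "ACTIVE"
    STOPPED = "STOPPED"
    ERROR = "ERROR"


def status_label(status):
    """Return the enum name, or "NONE" when there is no real event processor."""
    return "NONE" if status is None else status.name


def can_reuse_metadata(status):
    """Reuse cached metadata when the EP is active or was never enabled.

    `None` models the NoOpEventProcessor case raised in the review comment:
    with no event processor configured at all, forcing a reload on every
    access would be unnecessary.
    """
    return status is None or status is EventProcessorStatus.ACTIVE


for s in (None, EventProcessorStatus.ACTIVE, EventProcessorStatus.STOPPED):
    print(status_label(s), can_reuse_metadata(s))
```

Separating the label from the decision keeps the log line informative even if the reuse rule changes later.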
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16198/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:48:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16197/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:47:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16196/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:46:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#8).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

When the event processor is turned off, inserting values into a partitioned table can lead to a NullPointerException if the partition is deleted outside Impala (e.g. HMS). Since the event processor is turned off, Impala is unaware of the metadata changes to the table. Currently, Impala always reuses the metadata when reloading a table. This can lead to data inconsistency, especially when the event processor is turned off. This patch addresses the issue by reusing metadata only when the event processor state is active. If it is not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor.
- Create a partitioned table and add a partition from Impala.
- Drop the same partition from Hive.
- From Impala, insert values into the partition (the expectation is that if the partition doesn't exist, a new one is created).

Testing:
- Verified manually that the NullPointerException is avoided with this patch.
- Added end-to-end tests to verify the above scenario for external and managed tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 32 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/8 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1267 PS5, Line 1267: > Turn this into a parameter of function verify_partition below. Ack -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:24:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS5, Line 2465: boolean isEpActive = isEventProcessingActive();
: if (LOG.isTraceEnabled()) {
> Add trace LOG about EP status here, whether tbl is loaded, and whether tbl
Ack

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471
PS5, Line 2471: f no validWriteIdList is
> What happens if isEventProcessingActive() changes between L2471 and L2483?
L2471 and L2483 are two if conditions, and I don't see the event processor switching state between them. But from a code readability perspective, it makes sense to store the result in a boolean variable, as we need it in the log.trace() statement above.

--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Sai Hemanth Gantasala
Gerrit-Comment-Date: Tue, 21 May 2024 17:24:05 +
Gerrit-HasComments: Yes
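The point discussed in this exchange, calling the status check once and reusing the stored value, can be illustrated with a small sketch. `FlippingProcessor` is a hypothetical stand-in that deliberately changes its answer on every call, which exaggerates the race the reviewer asked about; nothing here is taken from the Impala code.

```python
class FlippingProcessor:
    """Hypothetical stand-in whose status flips on every call."""

    def __init__(self):
        self._active = False

    def is_active(self):
        self._active = not self._active
        return self._active


def decide_twice(ep):
    # Two separate calls: the two conditions may observe different answers.
    first = ep.is_active()
    second = ep.is_active()
    return first, second


def decide_once(ep):
    # One call, cached in a local: both conditions (and any log line) agree.
    is_ep_active = ep.is_active()
    return is_ep_active, is_ep_active


print(decide_twice(FlippingProcessor()))
print(decide_once(FlippingProcessor()))
```

Even when a state change between the two checks is unlikely in practice, the cached form reads more clearly and gives the log statement the same value the branches used.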
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#6).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

When the event processor is turned off, inserting values into a partitioned table can lead to a NullPointerException if the partition is deleted outside Impala (e.g. HMS). Since the event processor is turned off, Impala is unaware of the metadata changes to the table. Currently, Impala always reuses the metadata when reloading a table. This can lead to data inconsistency, especially when the event processor is turned off. This patch addresses the issue by reusing metadata only when the event processor state is active. If it is not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor.
- Create a partitioned table and add a partition from Impala.
- Drop the same partition from Hive.
- From Impala, insert values into the partition (the expectation is that if the partition doesn't exist, a new one is created).

Testing:
- Verified manually that the NullPointerException is avoided with this patch.
- Added end-to-end tests to verify the above scenario for external and managed tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 33 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/6 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#7).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

When the event processor is turned off, inserting values into a partitioned table can lead to a NullPointerException if the partition is deleted outside Impala (e.g. HMS). Since the event processor is turned off, Impala is unaware of the metadata changes to the table. Currently, Impala always reuses the metadata when reloading a table. This can lead to data inconsistency, especially when the event processor is turned off. This patch addresses the issue by reusing metadata only when the event processor state is active. If it is not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor.
- Create a partitioned table and add a partition from Impala.
- Drop the same partition from Hive.
- From Impala, insert values into the partition (the expectation is that if the partition doesn't exist, a new one is created).

Testing:
- Verified manually that the NullPointerException is avoided with this patch.
- Added end-to-end tests to verify the above scenario for external and managed tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 32 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/7 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py@1268 PS7, Line 1268: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py@1268 PS7, Line 1268: flake8: E251 unexpected spaces around keyword / parameter equals -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:25:32 + Gerrit-HasComments: Yes
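For readers unfamiliar with the flake8 warning reported above, E251 flags whitespace around the `=` of a keyword argument or parameter default. The snippet below is a minimal illustration; the `verify_partition` signature shown is hypothetical and only the spacing matters.

```python
def verify_partition(table_name, exists=True):
    """Hypothetical helper; only the call style matters in this sketch."""
    return (table_name, exists)


# E251 would be reported for a call written as:
#     verify_partition("t", exists = False)
# Compliant form, with no spaces around the keyword '=':
result = verify_partition("t", exists=False)
print(result)
```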
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py@1269 PS6, Line 1269: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py@1269 PS6, Line 1269: flake8: E251 unexpected spaces around keyword / parameter equals -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:24:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11
PS1, Line 11: NullPointerException
> Ack. The issue happens with transactional tables as well.
Done

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS5, Line 2465: LOG.trace("table {} exits in cache, last synced id {}", tbl.getFullName(),
: tbl.getLastSyncedEventId());
Add a trace LOG about the EP status here, whether tbl is loaded, and whether tbl is transactional or not. Also wrap it with if (LOG.isTraceEnabled()) { ... }

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471
PS5, Line 2471: isEventProcessingActive()
What happens if isEventProcessingActive() changes between L2471 and L2483? Should it be called once and stored in a boolean variable instead?

http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266
PS4, Line 1266: test_no_ep_metadata_reload_for_insert
> Good catch!! The issue happens with transactional tables as well.
Done http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1267 PS5, Line 1267: test_table Turn this into a parameter of function verify_partition below. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 16:07:02 + Gerrit-HasComments: Yes
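The request in this review to wrap the trace statement in a level check follows a common logging pattern: skip building the message when the level is disabled. A rough Python analogue is sketched below. Python's standard `logging` module has no TRACE level, so DEBUG stands in for it here; the logger name, function names, and message fields are assumptions modelled on the review comment, not the Java code itself.

```python
import logging

LOG = logging.getLogger("catalog_sketch")  # hypothetical logger name


def describe_table(name, last_synced_id, ep_status, loaded, transactional):
    """Build the log message; the field set follows the review request."""
    return ("table {} exists in cache, last synced id {}, EP status {}, "
            "loaded={}, transactional={}").format(
                name, last_synced_id, ep_status, loaded, transactional)


def log_table_state(name, last_synced_id, ep_status, loaded, transactional):
    # Guard so the message is only formatted when the level is enabled.
    if LOG.isEnabledFor(logging.DEBUG):
        LOG.debug(describe_table(name, last_synced_id, ep_status,
                                 loaded, transactional))
        return True
    return False


LOG.setLevel(logging.WARNING)
print(log_table_state("db.t", 7, "ACTIVE", True, False))
```

The boolean return value exists only so the sketch can show whether the guarded branch ran; a real logging helper would not need it.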
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has removed a vote on this change. Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10655/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 May 2024 14:53:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13040: (addendum) Inject larger delay for sanitized build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21439 ) Change subject: IMPALA-13040: (addendum) Inject larger delay for sanitized build .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Gerrit-Change-Number: 21439 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 May 2024 14:17:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13040: (addendum) Inject larger delay for sanitized build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21439 ) Change subject: IMPALA-13040: (addendum) Inject larger delay for sanitized build .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10654/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Gerrit-Change-Number: 21439 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 May 2024 14:17:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21429 ) Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10653/ -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 21 May 2024 14:16:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12935: First pass on Calcite planner functions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21357 ) Change subject: IMPALA-12935: First pass on Calcite planner functions .. Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/21357/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21357/11//COMMIT_MSG@10 PS11, Line 10: Only basic functions will work "basic" is a bit vague here - is my understanding correct that all Impala builtin scalar functions will work with this commit if the argument types match completely? As ImpalaOperatorTable will do a lookup in the builtin db and RexCallConverter will do the same through FunctionResolver, I don't see why a function would not work. http://gerrit.cloudera.org:8080/#/c/21357/11//COMMIT_MSG@18 PS11, Line 18: paresr typo http://gerrit.cloudera.org:8080/#/c/21357/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedNullLiteral.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedNullLiteral.java: http://gerrit.cloudera.org:8080/#/c/21357/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedNullLiteral.java@28 PS11, Line 28: L typo http://gerrit.cloudera.org:8080/#/c/21357/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaOperatorTable.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaOperatorTable.java: http://gerrit.cloudera.org:8080/#/c/21357/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaOperatorTable.java@72 PS11, Line 72: operatorList.size() == 1 What is the expectation if this is false? Do we expect operatorList to be empty, or can there be multiple elements? If operatorList is expected to contain 1 or 0 elements, then there could be a check for this.
http://gerrit.cloudera.org:8080/#/c/21357/11/testdata/workloads/functional-query/queries/QueryTest/calcite.test File testdata/workloads/functional-query/queries/QueryTest/calcite.test: http://gerrit.cloudera.org:8080/#/c/21357/11/testdata/workloads/functional-query/queries/QueryTest/calcite.test@122 PS11, Line 122: no need to be extensive about testing in this file. It would be nice to add basic coverage for types, so for each type T a function that returns T and a function that gets T as argument. Is this feasible with the current functions set? -- To view, visit http://gerrit.cloudera.org:8080/21357 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88 Gerrit-Change-Number: 21357 Gerrit-PatchSet: 11 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Tue, 21 May 2024 12:56:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 ) Change subject: IMPALA-13102: Normalize invalid column stats in HMS .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16195/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21445 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a Gerrit-Change-Number: 21445 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 May 2024 11:34:24 + Gerrit-HasComments: No
[Impala-ASF-CR](branch-3.4.2) IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files.
Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21410 ) Change subject: IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files. .. IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files. Independent linux packaging related content to package/CMakeLists.txt to make it more clearly. This patch also add LICENSE and NOTICE file in the final package. Testing: - Manually deploy package on Ubuntu22.04 and verify it. Backport note for 3.4.x: - Resolved conflicts in CMakeLists.txt and modified package/CMakeLists.txt accordingly. Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b Reviewed-on: http://gerrit.cloudera.org:8080/20263 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/21410 Reviewed-by: Xiang Yang Reviewed-by: Zihao Ye Tested-by: Quanlong Huang --- M .gitignore M CMakeLists.txt M NOTICE.txt A package/CMakeLists.txt 4 files changed, 136 insertions(+), 101 deletions(-) Approvals: Xiang Yang: Looks good to me, but someone else must approve Zihao Ye: Looks good to me, approved Quanlong Huang: Verified -- To view, visit http://gerrit.cloudera.org:8080/21410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: branch-3.4.2 Gerrit-MessageType: merged Gerrit-Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b Gerrit-Change-Number: 21410 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Zihao Ye
[Impala-ASF-CR](branch-3.4.2) IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21410 ) Change subject: IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files. .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: branch-3.4.2 Gerrit-MessageType: comment Gerrit-Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b Gerrit-Change-Number: 21410 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 21 May 2024 11:11:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21445 ) Change subject: IMPALA-13102: Normalize invalid column stats in HMS .. IMPALA-13102: Normalize invalid column stats in HMS Column stats like numDVs, numNulls in HMS could have arbitrary values. Impala expects them to be non-negative or -1 for unknown. So loading tables with invalid stats values (<-1) will fail. This patch adds logic to normalize the stats values. If the value < -1, use -1 for it and add corresponding warning logs. Also refactor some redundant codes in ColumnStats. Tests: - Add e2e test Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a --- M fe/src/main/java/org/apache/impala/analysis/AlterTableSetColumnStats.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M tests/metadata/test_compute_stats.py 5 files changed, 147 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/21445/1 -- To view, visit http://gerrit.cloudera.org:8080/21445 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a Gerrit-Change-Number: 21445 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
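The normalization rule in the commit message above (values below -1 become -1, with a warning) can be sketched as follows. This is only an illustration in Python, not the patch itself: the actual change is in the Java frontend (ColumnStats.java and related files), and the names normalize_stat, LOG and UNKNOWN are hypothetical.

```python
import logging

# Hypothetical logger name; the real patch logs from the Java frontend.
LOG = logging.getLogger("column_stats_sketch")

# -1 is the "unknown" marker described in the commit message.
UNKNOWN = -1


def normalize_stat(name, value):
    """Return value unchanged if it is >= -1, otherwise warn and return -1."""
    if value < UNKNOWN:
        LOG.warning("Invalid %s value %d, using %d instead", name, value, UNKNOWN)
        return UNKNOWN
    return value
```

Under this sketch, valid values (including -1 itself) pass through untouched, so only values that would previously have failed table loading are rewritten.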
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 ) Change subject: IMPALA-13102: Normalize invalid column stats in HMS .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py File tests/metadata/test_compute_stats.py: http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@453 PS1, Line 453: a flake8: W504 line break after binary operator http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@454 PS1, Line 454: v flake8: E131 continuation line unaligned for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/21445 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a Gerrit-Change-Number: 21445 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 May 2024 11:11:02 + Gerrit-HasComments: Yes
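For context on the two flake8 findings above: W504 flags a line break placed after a binary operator, and E131 flags a continuation line that is not aligned for a hanging indent. The flagged lines from test_compute_stats.py are not quoted in the message, so the snippet below is a hypothetical example of a layout that avoids both codes, assuming W503 (line break before a binary operator) is not also enabled in the project's flake8 configuration.

```python
def stats_are_valid(num_dvs, num_nulls):
    """Hypothetical helper: True if both stats are -1 (unknown) or non-negative."""
    # The break comes before "and", so W504 does not apply. The continuation
    # line is aligned with the first token after the opening parenthesis,
    # which is a visual indent rather than an unaligned hanging indent (E131).
    return (num_dvs >= -1
            and num_nulls >= -1)
```

Keeping the whole expression on one line, where it fits within the line-length limit, sidesteps both checks.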
[Impala-ASF-CR] IMPALA-13034: Add logs and counters for HTTP profile requests blocking client fetches
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21412 ) Change subject: IMPALA-13034: Add logs and counters for HTTP profile requests blocking client fetches .. Patch Set 3: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/21412/3/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/21412/3/be/src/service/client-request-state.h@506 PS3, Line 506: void UpdateClientFetchLockWaitTime(int64_t lock_wait_time_ns) { another review uses "AddFetchLockWaitTime" for similar purpose. https://gerrit.cloudera.org/#/c/20850/16/be/src/service/client-request-state.h "add" seems clearer to me in this case than "update" http://gerrit.cloudera.org:8080/#/c/21412/3/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: http://gerrit.cloudera.org:8080/#/c/21412/3/be/src/service/impala-beeswax-server.cc@352 PS3, Line 352: VLOG(1) << "Error in get_log: " << status.GetDetail(); Is this supposed to be a long term log? It could contain more information, e.g. 
"Error in get_log, could not get query handle:" http://gerrit.cloudera.org:8080/#/c/21412/3/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/21412/3/be/src/service/impala-hs2-server.cc@1099 PS3, Line 1099: VLOG(1) << "Error in GetLog: " << status.GetDetail(); Same as in impala-beeswax-server.cc -- To view, visit http://gerrit.cloudera.org:8080/21412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I538ebe914f70f460bc8412770a8f7a1cc8b505dc Gerrit-Change-Number: 21412 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 21 May 2024 11:03:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21440 ) Change subject: IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant .. IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant IMPALA-13079 added a test in iceberg-metadata-tables.test that included assertions about values that can change across builds, e.g. file sizes, which caused test failures. This commit fixes it by doing two things: 1. narrowing down the result set of the query to the column that the test is really about - this removes some of the problematic values 2. using regexes for the remaining problematic values. Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Reviewed-on: http://gerrit.cloudera.org:8080/21440 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 1 file changed, 3 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Verified Riza Suminto: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Gerrit-Change-Number: 21440 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Riza Suminto
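The second part of the fix described above, matching build-dependent values such as file sizes with a regex instead of a constant, can be illustrated with a small Python sketch. The helper row_matches and the sample rows are hypothetical and are not taken from iceberg-metadata-tables.test; the sketch only shows why a pattern is more robust than a literal expected value.

```python
import re


def row_matches(expected_pattern, actual_row):
    """Return True if the whole actual row matches the expected regex."""
    return re.fullmatch(expected_pattern, actual_row) is not None


# Hypothetical expected row: the file size may differ across builds, so it is
# matched with \d+ while the stable columns are still compared literally.
EXPECTED_ROW = r"'PARQUET',\d+,1"
```

With this pattern, rows such as `'PARQUET',1234,1` and `'PARQUET',98765,1` both match, while a row whose stable columns differ still fails the comparison.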
[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21429 ) Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10653/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 21 May 2024 09:12:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21429 ) Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 21 May 2024 07:32:32 + Gerrit-HasComments: No