[Impala-ASF-CR] IMPALA-13468: Fix various aggregation issues in aggregation.test
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21961 ) Change subject: IMPALA-13468: Fix various aggregation issues in aggregation.test .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@45 PS3, Line 45: return ImpalaTypeConverter.getRelDataType(Type.BOOLEAN); > https://github.com/apache/impala/blob/master/testdata/workloads/functional- Ok, for NULL constants, I wasn't aware that Impala's MIN/MAX returns boolean. Marking this resolved. (as an aside, I checked on Postgres .. it returns 'text' type for MIN(NULL).) -- To view, visit http://gerrit.cloudera.org:8080/21961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I038127d6a2f228ae8d263e983b1906e99ae05f77 Gerrit-Change-Number: 21961 Gerrit-PatchSet: 5 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Thu, 24 Oct 2024 01:06:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13465: Trace TupleId further to reduce Agg cardinality
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21955 ) Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21955/3/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21955/3/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@322 PS3, Line 322: referrence nit: typo: "reference" -- To view, visit http://gerrit.cloudera.org:8080/21955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944 Gerrit-Change-Number: 21955 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 23 Oct 2024 23:06:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13468: Fix various aggregation issues in aggregation.test
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21961 ) Change subject: IMPALA-13468: Fix various aggregation issues in aggregation.test .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAggOperator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAggOperator.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAggOperator.java@32 PS3, Line 32: ImpalaAggOpera > Bad cut and paste. No, ImpalaAggOperator needs to extend SqlAggFunction fo Done http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@45 PS3, Line 45: return ImpalaTypeConverter.getRelDataType(Type.BOOLEAN); > Having NULL return boolean allows a specific test to pass. I still didn't get this and why this is needed for the MIN/MAX function to return a boolean. Can you point to the test that expects this ? http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@58 PS3, Line 58: SqlCallBinding callBinding, > The inferReturnType for timestamp is able to return a timestamp. I'd imagin Done -- To view, visit http://gerrit.cloudera.org:8080/21961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I038127d6a2f228ae8d263e983b1906e99ae05f77 Gerrit-Change-Number: 21961 Gerrit-PatchSet: 5 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Wed, 23 Oct 2024 17:36:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13468: Fix various aggregation issues in aggregation.test
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21961 ) Change subject: IMPALA-13468: Fix various aggregation issues in aggregation.test .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/21961/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21961/3//COMMIT_MSG@20 PS3, Line 20: min(NULL) and min(char types) nit: to be precise, this is both min and max http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAggOperator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAggOperator.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAggOperator.java@32 PS3, Line 32: ImpalaOperator This says ImpalaOperator but the actual class is ImpalaAggOperator. Is ImpalaAggOperator supposed to be a derived class of ImpalaOperator ? http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAvgAggFunction.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAvgAggFunction.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaAvgAggFunction.java@31 PS3, Line 31: TIMESTMP typo: TIMESTAMP http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaGroupingIdFunction.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaGroupingIdFunction.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaGroupingIdFunction.java@21 PS3, Line 21: import com.google.common.base.Preconditions; Most of these imported classes are not used in this file. http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaGroupingIdFunction.java@67 PS3, Line 67: /** : */ nit: remove empty comments http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java: http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@31 PS3, Line 31: SqlAvgAggFunction Confused by the mention of SqlAvgAgg since the class is doing MinMax, not Avg. http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@31 PS3, Line 31: TIMESTMP Typo: TIMESTAMP http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@45 PS3, Line 45: return ImpalaTypeConverter.getRelDataType(Type.BOOLEAN); Why would the Min/Max of a NULL type return a Boolean ? Is this what Impala expects ? http://gerrit.cloudera.org:8080/#/c/21961/3/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaMinMaxAggFunction.java@58 PS3, Line 58: return callBinding.getOperandType(0).getSqlTypeName().equals(SqlTypeName.TIMESTAMP) It wasn't clear why this is checking for TIMESTAMP even though the inferReturnType is mainly concerned with string types and null type. -- To view, visit http://gerrit.cloudera.org:8080/21961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I038127d6a2f228ae8d263e983b1906e99ae05f77 Gerrit-Change-Number: 21961 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Tue, 22 Oct 2024 20:52:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13465: Trace TupleId further to reduce Agg cardinality
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21955 ) Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality .. Patch Set 2: Code-Review+2 This LGTM. Carry over Michael's +1. -- To view, visit http://gerrit.cloudera.org:8080/21955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944 Gerrit-Change-Number: 21955 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 21 Oct 2024 05:48:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21860 ) Change subject: IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality .. Patch Set 13: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21860/11/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21860/11/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@368 PS11, Line 368: // (in other words, the lowest PlanNode in query plan tree). > Thank you Aman for the counter example. I craft one planner test under agg- Yes, we can use a memoization approach in the future. The test with nested views looks good. Marking this resolved. -- To view, visit http://gerrit.cloudera.org:8080/21860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd589ab5f7ba9566a0d35784f61f5ffaef5696e7 Gerrit-Change-Number: 21860 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 18 Oct 2024 00:03:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13459: Handle duplicate table in same query
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21945 ) Change subject: IMPALA-13459: Handle duplicate table in same query .. Patch Set 3: > Patch Set 3: > > There is already a Jira that views in general don't work, but there is > nothing specific about duplicate views within a query. Do you think I should > file it now if there is a general "views don't work" jira? I suppose you could handle the view aliases referring to the same view later when views are supported, so I am good with this. -- To view, visit http://gerrit.cloudera.org:8080/21945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9997bc642c320c2e26294d7d02a05bccbba6a0d Gerrit-Change-Number: 21945 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Thu, 17 Oct 2024 23:55:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13459: Handle duplicate table in same query
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21945 ) Change subject: IMPALA-13459: Handle duplicate table in same query .. Patch Set 3: Does this unique alias work for views as well ? If not, can we create a separate jira for that ? -- To view, visit http://gerrit.cloudera.org:8080/21945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9997bc642c320c2e26294d7d02a05bccbba6a0d Gerrit-Change-Number: 21945 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Thu, 17 Oct 2024 21:09:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21860 ) Change subject: IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/21860/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21860/11//COMMIT_MSG@12 PS11, Line 12: AggragationNode nit: typo "AggragationNode" -> AggregationNode http://gerrit.cloudera.org:8080/#/c/21860/11/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21860/11/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@368 PS11, Line 368: List postOrderNodes = planNode.getNodesPostOrder(); Just thinking about a more complex plan where we have nested subqueries or views. In this, there will be repeated post order traversals of the sub trees for each level of the group-by. For example: create view v1 as select a, b, c, count(*) from t1 inner join t2 on t1.x = t2.y group by a, b, c; create view v2 as select a, b, count(*) from v1 group by a, b; select a, count(*) from v2 group by a; For doing the estimate of the outermost query, it would do a traversal of v2 and v1, although v1 would have already been traversed when v2's cardinality was estimated. For TPC-DS, this may not be a concern but just thinking about other scenarios. The tradeoff of extra compilation time may be acceptable but just want to bring up for discussion. -- To view, visit http://gerrit.cloudera.org:8080/21860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd589ab5f7ba9566a0d35784f61f5ffaef5696e7 Gerrit-Change-Number: 21860 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 17 Oct 2024 17:57:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13430: Too many RelNodes created for "IN" literals
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21911 ) Change subject: IMPALA-13430: Too many RelNodes created for "IN" literals .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21911/4/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java: http://gerrit.cloudera.org:8080/#/c/21911/4/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java@148 PS4, Line 148: private boolean isCoercedProjectForValues(ParentPlanRelContext context) { > I don't feel too comfortable keeping RelNode information in a global contex Marking this resolved. -- To view, visit http://gerrit.cloudera.org:8080/21911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc3d84c70af9cd4db44359c4ab7f0c9eb70738f5 Gerrit-Change-Number: 21911 Gerrit-PatchSet: 4 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Mon, 14 Oct 2024 00:58:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13430: Too many RelNodes created for "IN" literals
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21911 ) Change subject: IMPALA-13430: Too many RelNodes created for "IN" literals .. Patch Set 4: Code-Review+1 (1 comment) I have a suggestion below about improving the code. Doing a +1 for now. Depending on your feedback on this, can bump it up. http://gerrit.cloudera.org:8080/#/c/21911/4/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java: http://gerrit.cloudera.org:8080/#/c/21911/4/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java@148 PS4, Line 148: private boolean isCoercedProjectForValues(ParentPlanRelContext context) { > Line 151 looks specifically for the Values node pattern and returns right a Ok, yeah that limits it to only the Project-Values plan; although the second part of my comment was actually more relevant. I understand this current implementation is a hack but it seems a simpler thing would be to just store the Project that was created in CoerceNode.processValues() in a List or a Set in a planner context class and just do a lookup here. The iterating through all the project exprs seems redundant if we have full control over what happens in CoerceNode. -- To view, visit http://gerrit.cloudera.org:8080/21911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc3d84c70af9cd4db44359c4ab7f0c9eb70738f5 Gerrit-Change-Number: 21911 Gerrit-PatchSet: 4 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Sun, 13 Oct 2024 19:31:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13430: Too many RelNodes created for "IN" literals
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21911 ) Change subject: IMPALA-13430: Too many RelNodes created for "IN" literals .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21911/4/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java: http://gerrit.cloudera.org:8080/#/c/21911/4/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java@148 PS4, Line 148: private boolean isCoercedProjectForValues(ParentPlanRelContext context) { The logic in this method is looking for a specific pattern for project expressions but what's the guarantee that the Values rel is the only case where such a pattern occurs ? The number of project expressions could be quite large, so in the worst case, this could be iterating over all of them. Is there a better way to just annotate a Project (or a derived class of Project) sitting above a Values rel ? -- To view, visit http://gerrit.cloudera.org:8080/21911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc3d84c70af9cd4db44359c4ab7f0c9eb70738f5 Gerrit-Change-Number: 21911 Gerrit-PatchSet: 4 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Sat, 12 Oct 2024 22:07:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13430: Too many RelNodes created for "IN" literals
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21911 ) Change subject: IMPALA-13430: Too many RelNodes created for "IN" literals .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/21911/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21911/3//COMMIT_MSG@13 PS3, Line 13: within one Values RelNode. For large IN-lists, Calcite supports the LogicalValues relnode (derived from Values) and converts it to a join [see comments in [1]). This would be the most efficient representation but we would need a physical representation of this as well, so something to consider for future work. [1] https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L6470 http://gerrit.cloudera.org:8080/#/c/21911/3//COMMIT_MSG@18 PS3, Line 18: limitation prevents the string literal to be of type "string" but : instead is of type "char(x)" For an IN list which has string literals of different lengths , 'ab', 'abcd' etc. would Calcite treat each one as different length char ? Curious why it would not just treat it as a VARCHAR instead. Can we look at the place it does this coercion to validate the behavior ? -- To view, visit http://gerrit.cloudera.org:8080/21911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc3d84c70af9cd4db44359c4ab7f0c9eb70738f5 Gerrit-Change-Number: 21911 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Fri, 11 Oct 2024 19:29:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 6: Code-Review+2 (2 comments) I reviewed the TPC-DS plan changes and they are along expected lines due to the change in selectivity estimate, mostly for the between predicate on date_dim table. Bumping to +2. A note on the range predicate selectivity in general: this is normally done by histograms but in its absence in Impala, this patch is solving a narrower scope of this estimation. Future improvements in stats could subsume this patch. http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test: http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test@202 PS6, Line 202: | | tuple-ids=19,21 row-size=32B cardinality=9.65M cost=611185704 > Bulk of HashJoin memory is for the builder side. And in this case, I think Makes sense. http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test: http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test@123 PS6, Line 123: 40.26M > Yes, I think it is lower due to selective partition filter RF002 that comes Done -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 24 May 2024 00:07:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test: http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test@202 PS6, Line 202: | | tuple-ids=19,21 row-size=32B cardinality=9.65M cost=611185704 The estimate for the rows in the build side of the hashjoin dropped from 7.3K to 16, so the output cardinality reduction of the hash join makes sense. Also, the row size reduced by half. However, the overall memory estimate on line 134 didn't change. Any thoughts on that ? (Although on a different query , q16, it does affect the the fragment's per-instance memory estimate). http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test: http://gerrit.cloudera.org:8080/#/c/21377/6/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test@123 PS6, Line 123: 40.26M This scan's cardinality estimate dropped but there's no BETWEEN predicate on this table. Is the change coming from the RT filters ? -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 23 May 2024 23:17:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 6: Code-Review+1 (2 comments) Updated changes Lgtm. Doing a +1. I had done a cursory review of the tpc-ds plan changes. Once I review that a little more closely, will bump up the vote. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104 PS2, Line 104: hasNumDist > It is enough to check for hasNumDistinctValues() only. Updated this code ac Done http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107 PS2, Line 107: numNotNulls > Added tests against functional_parquet.unique_with_nulls. Thx for adding this new table. -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 22 May 2024 21:23:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 ) Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG@18 PS2, Line 18: This patch adds a narrow optimization of BetweenPredicate selectivity If the user's query has a predicate unique_col >=5 AND unique_col <= 10 instead of the BETWEEN, then this patch is not applying the selectivity calculation even though BETWEEN gets converted to the first representation. Is the reason mainly to keep this narrow in scope ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2626 PS2, Line 2626: // Prioritize members of 'betweenPredicates' ahead of 'result'. The 'prioritization' part was not clear.. why exactly is it needed ? .. could you elaborate a bit on this ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1856 PS2, Line 1856: public long evalToInteger(Analyzer analyzer, String name, boolean acceptDate) nit: in the comments above, could you add a note about the acceptDate parameter ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@759 PS2, Line 759:* The first issue is addressed by using a single default selectivity that is nit: this comment needs updating to reflect the between selectivity. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@774 PS2, Line 774: assigned nit: have 'been' assigned .. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104 PS2, Line 104: hasStats() Looking at the hasStats() method: public boolean hasStats() { return numNulls_ != -1 || numDistinctValues_ != -1; } It can return true if either of the 2 stats are present. But the computation below depends on both being present. So it would be good to check for both explicitly. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107 PS2, Line 107: numNotNulls Is there any test that covers the case where the null count made a difference to the Between selectivity ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@544 PS2, Line 544:* something like 0.33^2. nit: this last sentence should be updated for the new behavior. -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 20 May 2024 06:44:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12291 impala checks hdfs ranger policy
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20221 ) Change subject: IMPALA-12291 impala checks hdfs ranger policy .. Patch Set 9: > Patch Set 9: > > Hi all, based on our discussion so far, there are 2 approaches discussed. Any > any better alternative is also appreciated! > - Set the access level of a loaded table to both READ and WRITE as long as > Ranger is used as the authorization provider. > - Introduce a startup flag to allow the administrator to decide whether to > skip the file system permissions check during table loading. > > There are some implementation details to consider for each approach. > - How to determine whether Ranger is enabled (corresponding to the 1st > approach)? It seems checking the value of the key > 'dfs.namenode.inode.attributes.provider.class' in core-site.xml via the > instance of Configuration as done in the patch set 9 could not be verified > easily via a new test due to HDFS Ranger plug-in not being configured > correctly. To be more specific, if we try to add the following configuration > via > https://github.com/apache/impala/blob/master/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py, > the name node of HDFS could not be started. More plumbing has to be done to > set up the plug-in. On the other hand, using the instance of BackendConfig > allows us to add an end-to-end test to briefly verify Impala's behavior after > the patch more easily. > > > dfs.namenode.inode.attributes.provider.class > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer > > > - What should be the default behavior with respect to setting the access > level of a loaded table after this patch (corresponding to the 2nd approach)? > It looks like making Impala assume the READ and WRITE access by default may > be better especially in the legacy catalog mode. This relieves the Impala > administrator of having to manually tweak the HDFS access control entries of > the target HDFS paths that are not writable to the Impala service every time > when an end user encounters such a problem. > > I have also collected the related tests that need to be revised if we decide > to adopt the 2nd approach and make Impala assume the READ and WRITE access by > default. > - > https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_permissions.py#L56 > (TestHdfsPermissions.test_insert_into_read_only_table()). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L563 > (TestInsertBehaviour.test_multiple_group_acls). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L331 > (TestInsertBehaviour.test_readonly_table_dir). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L362 > (TestInsertBehaviour.test_insert_acl_permissions). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L439 > (TestInsertBehaviour.test_load_permissions). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L252 > (TestInsertBehaviour.test_mixed_partition_permissions). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L202 > (TestInsertBehaviour.test_insert_file_permissions). > - > https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4008 > (AnalyzeStmtsTest.TestLoadData). > - > https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4042-L4047 > (AnalyzeStmtsTest.TestInsert). > - > https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java#L110-L119 > (CatalogObjectToFromThriftTest.TestPartitionedTable). The first option sounds good to me: - Set the access level of a loaded table to both READ and WRITE as long as Ranger is used as the authorization provider. The second option of introducing a startup flag would create confusion because it would only apply to the legacy catalog mode, not the local catalog mode. Further, for the cloud object stores, we are in any case doing the first approach. The downside of the first approach is that it will defer the full access level checks to execution time and suppose some partitions allow the write operation while other partitions don't, we can end up with partial writes into the table. However, this issue already exists with the local catalog mode (which is the default mode). -- To view, visit http://gerrit.cloudera.org:8080/20221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Gerrit-Change-Number: 20221 Gerrit-PatchSet:
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries .. Patch Set 17: (14 comments) http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java@1 PS17, Line 1: /* In some files, the License header is enclosed in the multi statement syntax /*. .. */ whereas in others it is the single line comment. Pls make it consistent in all the files and use whatever is being used in existing Impala files. Impala uses the single line comment style. '// ' http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@30 PS17, Line 30: queryCtx; Pls add underscore http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@31 PS17, Line 31: stmtTableCache Add underscore http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@94 PS16, Line 94: : nit: add space or newline after : so that the query string is separated out. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148 PS16, Line 148: if (e == null) { > I wanted the ability to print the stack trace hree. But I also throw an In So this should be e != null then. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@166 PS16, Line 166: if (line.trim().startsWith("--") || line.trim().equals("")) { Comments can be multi-statement also e.g. /* ignore this */ SELECT a, b, c. There could be other patterns not handled here. Can we not use the parser methods directly ? If this is supposed to be a temporary function or if this is expected to go through revision later, pls add the relevant comments. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@209 PS16, Line 209: LOG.info("Using Impala planner"); Can you make this consistent with the corresponding message for Calcite planner on line 94. Something like: LOG.info("Using Impala Planner for the following query: " + queryCtx.getStmt()); http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@245 PS16, Line 245: Frontend Timeline (Calcite Planner)") This QueryContext is used in both cases right ? i.e if Calcite planner fails, fall back to Impala planner. So this message should be modified if the fall back occurs. http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@116 PS17, Line 116: logical plan nit: to be precise, can we say 'initial logical plan' http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@121 PS17, Line 121: optimized plan nit: to be precise, can we say 'optimized logical plan' . http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@234 PS17, Line 234: // hack to match the FrontEnd code Can you add some more context to this .. which part of the FE code is this referring to ? http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries .. Patch Set 16: (7 comments) A few more comments based on a second pass. http://gerrit.cloudera.org:8080/#/c/21109/16/bin/set-classpath.sh File bin/set-classpath.sh: http://gerrit.cloudera.org:8080/#/c/21109/16/bin/set-classpath.sh@62 PS16, Line 62: FE nit: use expanded form, not acronym http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py@182 PS16, Line 182: U nit: lowercase 'u'. http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py@183 PS16, Line 183: "instead of JniFrontend.") JniFrontend is an internal class, shouldn't be mentioned in a user message. How about 'If true, use the Calcite planner for query optimization instead of Impala planner' http://gerrit.cloudera.org:8080/#/c/21109/16/fe/src/main/java/org/apache/impala/planner/PlannerContext.java File fe/src/main/java/org/apache/impala/planner/PlannerContext.java: http://gerrit.cloudera.org:8080/#/c/21109/16/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@97 PS16, Line 97: // Constructor useful for an external planner module nit: this comment can be removed since the purpose of this patch is to make Calcite planner an internal module. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java: http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@95 PS16, Line 95: List scanOutputExprs = new ArrayList<>(Collections.nCopies(totalCols, null)); For wide tables where we are only needing a few columns projected, we will end up with a long list with mostly Nulls. A LinkedHashMap (preserves Insertion order) where the key is position and value is the SlotRef would be better suited despite the cpu cost of hashing. In general, in a query planner, memory is the most precious commodity since the plan search space can be large, so anything we can do to reduce memory footprint would be preferred. That said, I would be ok if this is done in a subsequent patch, just keep track through a Jira. http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@52 PS15, Line 52: an experimental nit: remove experimental http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148 PS16, Line 148: if (e == null) { Why is this checking for null here ? e is already being referenced above. -- To view, visit http://gerrit.cloudera.org:8080/21109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Gerrit-Change-Number: 21109 Gerrit-PatchSet: 16 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Sat, 30 Mar 2024 01:56:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12872: Use Calcite for ...
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for ... .. Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/21109/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21109/5//COMMIT_MSG@7 PS5, Line 7: IMPALA-12872: Use Calcite for ... nit: combine the two lines. it seems short enough. How about 'Use Calcite for optimization - part 1: simple queries' http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/pom.xml File java/experimental-planner/pom.xml: http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/pom.xml@29 PS5, Line 29: experimental-planner you could call it calcite-planner since there's a clear association. I would think that if there was a configuration option introduced (at a later time) to enable this planner, it would likely be something like 'use_new_planner' or 'use_calcite_planner' rather than use_experimental_planner. http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java File java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java: http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java@33 PS5, Line 33: analyzer nit:use underscore to be consistent http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java@36 PS5, Line 36: ctx nit: use underscore to be consistent with other variable names. http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java@39 PS5, Line 39: public final Class parentClass_; We should use the actual base class name (RelNode/ImpalaPlanRel) instead of the generic Class. http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java File java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java: http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java@31 PS5, Line 31: tableMap nit: use underscore http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java File java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java: http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java@78 PS5, Line 78: private List foreignKeys; This field is not used in this implementation. Is it for future use ? http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java File java/experimental-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java: http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@87 PS5, Line 87: LOG.debug(getDebugString(resultObject)); nit: should this logging be in an else block ? http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java File java/experimental-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java: http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java@33 PS5, Line 33: nit: missing 'are' . Here and subsequent messages. -- To view, visit http://gerrit.cloudera.org:8080/21109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Gerrit-Change-Number: 21109 Gerrit-PatchSet: 5 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Mon, 18 Mar 2024 22:58:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12847: Expose computeScanRangeLocations and computeStats
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21077 ) Change subject: IMPALA-12847: Expose computeScanRangeLocations and computeStats .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21077 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia621309c67455bb599f71bec9efc1f67fc085022 Gerrit-Change-Number: 21077 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 28 Feb 2024 07:06:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12847: Expose computeScanRangeLocations and computeStats
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21077 ) Change subject: IMPALA-12847: Expose computeScanRangeLocations and computeStats .. Patch Set 2: Code-Review+1 (4 comments) Thanks for the patch. A few small comments but changes overall lgtm. http://gerrit.cloudera.org:8080/#/c/21077/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21077/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@218 PS2, Line 218: is nit: 'if' instead of 'is' http://gerrit.cloudera.org:8080/#/c/21077/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@221 PS2, Line 221: private Can we make this protected to be consistent with partitions_ member veriable. http://gerrit.cloudera.org:8080/#/c/21077/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@431 PS2, Line 431: getSampledPartitions It would be preferable to name this method such that it reflects it can return either sampled or raw partitions. The current naming gives a first impression of returning only sampled partitions. How about getSampledOrRawPartitions(). http://gerrit.cloudera.org:8080/#/c/21077/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1161 PS2, Line 1161: If 'sampleParams_' is not null Since this method is no longer checking sampledParams_ itself but relying on other method, you could modify the comment accordingly. -- To view, visit http://gerrit.cloudera.org:8080/21077 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia621309c67455bb599f71bec9efc1f67fc085022 Gerrit-Change-Number: 21077 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 28 Feb 2024 01:59:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12462: Update only changed partitions after COMPUTE STATS
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20505 ) Change subject: IMPALA-12462: Update only changed partitions after COMPUTE STATS .. Patch Set 5: Code-Review+1 (1 comment) > Patch Set 4: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/20505/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20505/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1891 PS4, Line 1891: !existingPartStats.equals(partitionStats) > Added a test for incremental stats. Sounds good. Thx for adding the test. A related point is that we advise not running a mix of compute stats and compute incremental stats on the same table .. so it is one or the other. -- To view, visit http://gerrit.cloudera.org:8080/20505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2703e0790d5c25db98ed26f26f6d96281c366a3 Gerrit-Change-Number: 20505 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 25 Sep 2023 14:21:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12462: Update only changed partitiones after COMPUTE STATS
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20505 ) Change subject: IMPALA-12462: Update only changed partitiones after COMPUTE STATS .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/20505/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20505/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1891 PS4, Line 1891: !existingPartStats.equals(partitionStats) There's some hidden cost here for the equals method as it may be doing an expensive comparison if incremental stats are present and this would add up for thousands of partitions. Any thoughts on that ?That said, the trade off could be acceptable considering the advantage of not performing the partition stats update. -- To view, visit http://gerrit.cloudera.org:8080/20505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2703e0790d5c25db98ed26f26f6d96281c366a3 Gerrit-Change-Number: 20505 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sun, 24 Sep 2023 22:38:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12357: Skip scheduling bloom filter from full-build scan
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20366 ) Change subject: IMPALA-12357: Skip scheduling bloom filter from full-build scan .. Patch Set 6: Code-Review+1 (1 comment) Revised patch set LGTM. http://gerrit.cloudera.org:8080/#/c/20366/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20366/4//COMMIT_MSG@29 PS4, Line 29: thus reducing the bloom filter building, : aggregation, and evalua > This patch does not differentiate between the locality of the filter target Done -- To view, visit http://gerrit.cloudera.org:8080/20366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I494533bc06da84e606cbd1ae161908089a5e Gerrit-Change-Number: 20366 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 29 Aug 2023 06:42:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12357: Skip scheduling bloom filter from full-build scan
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20366 ) Change subject: IMPALA-12357: Skip scheduling bloom filter from full-build scan .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/20366/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20366/4//COMMIT_MSG@22 PS4, Line 22: e any predicate filter > You mean planner add null checking predicate on join key column? I don't th Right, currently Impala does not add the IS NOT NULL predicate for the join columns. Hive does add this for inner joins (for outer joins it won't be applicable). http://gerrit.cloudera.org:8080/#/c/20366/4//COMMIT_MSG@29 PS4, Line 29: thus reducing the bloom filter aggregation : overhead in coordinator This applies to the remote bloom filters right ? For local filters that are not sent to the coordinator, this patch could still potentially help for the high fpp case. http://gerrit.cloudera.org:8080/#/c/20366/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20366/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@252 PS4, Line 252: private int rank_ = 1; Is 'rank' the right term here ? Normally, rank implies some type of comparison function. Perhaps 'level' is more appropriate to indicate level in a subtree. http://gerrit.cloudera.org:8080/#/c/20366/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@844 PS4, Line 844: Math.max(0.9 For testing purposes or for a backdoor in a real deployment, would it be useful to tune this ? Maybe in some cases we might want higher threshold or if there is a bug in the fpp estimation for any reason. I could also be convinced to not add yet another tuning option - so open to a counter argument. -- To view, visit http://gerrit.cloudera.org:8080/20366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I494533bc06da84e606cbd1ae161908089a5e Gerrit-Change-Number: 20366 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 26 Aug 2023 00:10:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12282: Refine correlation factor in AggregationNode
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20194 ) Change subject: IMPALA-12282: Refine correlation factor in AggregationNode .. Patch Set 2: Code-Review+2 LGTM. -- To view, visit http://gerrit.cloudera.org:8080/20194 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ffa9e82b83e7c0042bd918ac132668a47505688 Gerrit-Change-Number: 20194 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 14 Jul 2023 00:05:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12282: Refine correlation factor in AggregationNode
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20194 ) Change subject: IMPALA-12282: Refine correlation factor in AggregationNode .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20194/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20194/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@687 PS1, Line 687: Math.pow(queryOptions.getAgg_mem_correlation_factor(), nonLiteralExprCount); I missed one detail when reviewing the previous patch (IMPALA-11842) .. the exponent should be N-1 rather than N since the correlation factor is representing correlation between 2 columns. If N = 2 and correlation_factor = 0.2, we should multiply once by 0.2. If N.= 1, there shouldn't be a need to multiply by the correlation factor. What do you think ? -- To view, visit http://gerrit.cloudera.org:8080/20194 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ffa9e82b83e7c0042bd918ac132668a47505688 Gerrit-Change-Number: 20194 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jul 2023 23:10:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11842: Improve memory estimation for Aggregate
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20104 ) Change subject: IMPALA-11842: Improve memory estimation for Aggregate .. Patch Set 14: Code-Review+2 Carry forward +1's from Quanlong, Kurt, Abhishek, Aman and bumping to +2. -- To view, visit http://gerrit.cloudera.org:8080/20104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421 Gerrit-Change-Number: 20104 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jul 2023 01:14:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11842: Improve memory estimation for Aggregate
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20104 ) Change subject: IMPALA-11842: Improve memory estimation for Aggregate .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/20104/12/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20104/12/common/thrift/ImpalaService.thrift@819 PS12, Line 819: new > No, the LARGE_AGG_MEM_THRESHOLD will serve as a lower bound of new memory e Ack -- To view, visit http://gerrit.cloudera.org:8080/20104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421 Gerrit-Change-Number: 20104 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jul 2023 01:08:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11842: Improve memory estimation for Aggregate
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20104 ) Change subject: IMPALA-11842: Improve memory estimation for Aggregate .. Patch Set 12: Code-Review+1 (4 comments) Just a few comments below. Overall, looks good ! http://gerrit.cloudera.org:8080/#/c/20104/12/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20104/12/common/thrift/ImpalaService.thrift@816 PS12, Line 816: node nit: to be precise can we say 'aggregation node's memory estimate' is deemed .. http://gerrit.cloudera.org:8080/#/c/20104/12/common/thrift/ImpalaService.thrift@817 PS12, Line 817: node nit: same as above http://gerrit.cloudera.org:8080/#/c/20104/12/common/thrift/ImpalaService.thrift@819 PS12, Line 819: lower shouldn't this be 'higher' ? i.e new memory estimate will not be higher than the LARGE_AGG_MEM_THRESHOLD .. http://gerrit.cloudera.org:8080/#/c/20104/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20104/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@682 PS12, Line 682: aggInfo.getGroupingExprs().size() In some queries, we have seen several constant literals also included in the grouping columns. E.g GROUP BY col1, col2, 'abc', 'def' or for the ROLLUP case, there could be NULL: e.g GROUP BY col1, col2, NULL, NULL In such cases, the correlation adjustment should ideally only consider the actual columns, not the constants or NULLs. If you agree, it is ok to do this in a follow up patch since the tests are already passing and I don't think this should hold it up. -- To view, visit http://gerrit.cloudera.org:8080/20104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421 Gerrit-Change-Number: 20104 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 12 Jul 2023 23:20:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12183: Fix cardinality clamping across aggregation phases
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20009 ) Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases .. Patch Set 5: Code-Review+2 (1 comment) LGTM. Since existing comments have been addressed, bumping to +2. Thanks Riza for the patch. http://gerrit.cloudera.org:8080/#/c/20009/5/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test: http://gerrit.cloudera.org:8080/#/c/20009/5/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test@512 PS5, Line 512: mem > This is because prior to this patch, each agg class is capped with inputCar Yes, each agg class in the merge phase of the agg is consuming corresponding per-class rows from the first phase. Additional discussion here: https://gerrit.cloudera.org/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test#322 -- To view, visit http://gerrit.cloudera.org:8080/20009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95 Gerrit-Change-Number: 20009 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 11 Jun 2023 23:33:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12183: Fix cardinality clamping across aggregation phases
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20009 ) Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases .. Patch Set 5: Code-Review+1 (1 comment) LGTM. Will bump the vote once the tests are completed and also give other reviewers some time to review. http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@322 PS4, Line 322: 3373800 > I put some debugging logs and these are the num group estimates and the cap Sounds good. Just expanding a bit on the overall processing cost of this operator so we have a common understanding of this: - The UNION below the AGGREGATE #127 produces 276K rows. - AGGREGATE #127 has 5 agg classes - each class will internally process all 276K rows but due to the group-by each class produces <= 276K rows and the total output cardinality of AGGREGATE #127 is 562K rows. This corresponds to the processing cost of 8308890. i.e processing 276K * 5 = 1380K rows costs 8308890 - AGGREGATE #215 is consuming the already grouped output per-class. Class 0 consumes Class 0 coming from #127 etc. so applying the per class cardinality you get total cost 3373800 which is 40% of the cost of #127. -- To view, visit http://gerrit.cloudera.org:8080/20009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95 Gerrit-Change-Number: 20009 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 07 Jun 2023 18:03:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12183: Fix cardinality clamping across aggregation phases
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20009 ) Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@98 PS4, Line 98: the either of nit: either the input .. http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@104 PS4, Line 104: content nit: 'contain' http://gerrit.cloudera.org:8080/#/c/20009/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@253 PS4, Line 253: unknownEstimate = true; If we get an unknown estimate for any one aggregate class, even though the final cardinality will be returned as -1, we are continuing the iteration. Is it to initialize the num groups for each class ? Otherwise we could break out of the loop. http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/20009/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@322 PS4, Line 322: 3373800 Could you clarify how this cost changed to 1/5th of the original cost ? There are 5 aggregation classes so effectively the aggregation is working proportionally more. Similar changes are there in a few other tests. -- To view, visit http://gerrit.cloudera.org:8080/20009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95 Gerrit-Change-Number: 20009 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 07 Jun 2023 01:48:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12097: WITH CLAUSE should be skipped when optimizing COUNT(*) query on Iceberg table
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19811 ) Change subject: IMPALA-12097: WITH CLAUSE should be skipped when optimizing COUNT(*) query on Iceberg table .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b21cbea79be77f2ea8490bd7f7b2f62063eb0e4 Gerrit-Change-Number: 19811 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Apr 2023 21:14:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12097: WITH CLAUSE should be skipped when optimizing COUNT(*) query on Iceberg table
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19811 ) Change subject: IMPALA-12097: WITH CLAUSE should be skipped when optimizing COUNT(*) query on Iceberg table .. Patch Set 1: Code-Review+1 (3 comments) A few minor comments below. http://gerrit.cloudera.org:8080/#/c/19811/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19811/1//COMMIT_MSG@9 PS1, Line 9: start nit: 'star' instead of 'start' http://gerrit.cloudera.org:8080/#/c/19811/1//COMMIT_MSG@15 PS1, Line 15: * Add e2e tests Were you also planning to add a planner test for the WITH clause ? It would be useful to have 1 such test. http://gerrit.cloudera.org:8080/#/c/19811/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/19811/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1437 PS1, Line 1437: optimizePlainCountStarQuery nit: Since this function is specifically meant for Iceberg tables, can we rename this to optimizePlainCountStarQueryForIceberg() ? It is slightly verbose but makes the intent clearer to avoid confusion. -- To view, visit http://gerrit.cloudera.org:8080/19811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b21cbea79be77f2ea8490bd7f7b2f62063eb0e4 Gerrit-Change-Number: 19811 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 27 Apr 2023 19:01:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new query option which is a floating point number between 0 to 1 inclusive: join_selectivity_correlation_factor It defaults to 0.0 which preserves the existing behavior of using the Minimum selectivity of the conjuncts. Given multiple join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to a value higher than 0 first computes the product of the selectivities: sel(C1) * sel(C2) .. sel(Cn) and then scales it by dividing by the join_selectivity_correlation_factor. Note that this setting applies to all the joins in the query so it has it's trade offs. Another approach is to use per join hints but there are limitations of the hints approach too. Testing: - Added planner tests with a combination of outer and inner join with different values of join_selectivity_correlation_factor - Ran PlannerTest and TpcdsPlannerTest Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Reviewed-on: http://gerrit.cloudera.org:8080/19682 Tested-by: Impala Public Jenkins Reviewed-by: Kurt Deschler Reviewed-by: Quanlong Huang --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 297 insertions(+), 17 deletions(-) Approvals: Impala Public Jenkins: Verified Kurt Deschler: Looks good to me, but someone else must approve Quanlong Huang: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 9 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java@496 PS6, Line 496: if (corrfactor > 0) cumulativeSel *= (((double) joinCard/lhsCard)/rhsCard); > This would be more readable as (double) joinCard/(lhsCard*rhsCard); Not doing (lhsCard * rhsCard) was intentional here and and other places. This was to avoid multiplication overflow for large values since both are long datatype. In a prior version of the patch I was doing a checkedMultiply() which checks for overflow but changed it to handle the lhsCard and rhsCard separately based on Csaba's comment. Regarding applying the correlation factor, I had a slightly different interpretation of this. If you have 4 equijoin conjuncts c1, c2, c3, c4 and each one's selectivity is s1, s2, s3, s4, the correlation factor is not necessarily representing a pair-wise correlation. If that was the case, we would get (s1 * s2)/CF for the first two then considering s3, we would get. (((s1 * s2)/CF) * s3)/CF and so on.. but c1 and c2 may not be correlated to each other at all. Only one of them might be correlated to c3. Having a single denominator allows us to set it to whatever factor we want to represent whether 2 or 3 or all 4 are correlated. It also makes it easier to articulate in the documentation. -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 10 Apr 2023 23:37:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java@432 PS6, Line 432: cumulativeSel > nit: "cumulativeSel" Done http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java@487 PS6, Line 487: cumulativeSel > nit: "cumulativeSel" Done -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 10 Apr 2023 14:48:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Hello Quanlong Huang, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19682 to look at the new patch set (#7). Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new query option which is a floating point number between 0 to 1 inclusive: join_selectivity_correlation_factor It defaults to 0.0 which preserves the existing behavior of using the Minimum selectivity of the conjuncts. Given multiple join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to a value higher than 0 first computes the product of the selectivities: sel(C1) * sel(C2) .. sel(Cn) and then scales it by dividing by the join_selectivity_correlation_factor. Note that this setting applies to all the joins in the query so it has it's trade offs. Another approach is to use per join hints but there are limitations of the hints approach too. Testing: - Added planner tests with a combination of outer and inner join with different values of join_selectivity_correlation_factor - Ran PlannerTest and TpcdsPlannerTest Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 297 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/7 -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java@496 PS6, Line 496: if (corrfactor > 0) cumulative_sel *= (((double) joinCard/lhsCard)/rhsCard); > Should we consider corrfactor here as well so that multiple conditions in t On line 500 we are dividing the cumulative selectivity by the corrfactor. The idea is to multiply all the individual selectivities and divide only once. Let me know if you had some other intent. -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 10 Apr 2023 14:26:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/19682/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/19682/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java@421 PS1, Line 421: slots.lhsNumRows(), slots.rhsNumRows(), lhsCard, rhsCard)); > Correct, the result will be somewhere between the multiplied selectivity an Done http://gerrit.cloudera.org:8080/#/c/19682/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/19682/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@414 PS3, Line 414: kState(!eqJoinC > Hitting the limit seems possible here with big RHS+LHS and could lead to we Ack. Changed it to divide by the lhsCard, rhsCard. http://gerrit.cloudera.org:8080/#/c/19682/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@434 PS3, Line 434: > This seems to be the same logic as in the loop above. I think that it would Refactored this. http://gerrit.cloudera.org:8080/#/c/19682/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@443 PS3, Line 443: if (corrfactor > 0) { > This could be moved to line 430. I think we still need to check the final result is >=0 since result is getting modified at a couple of places. http://gerrit.cloudera.org:8080/#/c/19682/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@487 PS3, Line 487: double cumulative_sel = 1.0; > Should we consider multiply the selectivity here too? (if the query option Yes, thanks for pointing this. Added the same logic here. -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 09 Apr 2023 06:25:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Hello Quanlong Huang, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19682 to look at the new patch set (#6). Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new query option which is a floating point number between 0 to 1 inclusive: join_selectivity_correlation_factor It defaults to 0.0 which preserves the existing behavior of using the Minimum selectivity of the conjuncts. Given multiple join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to a value higher than 0 first computes the product of the selectivities: sel(C1) * sel(C2) .. sel(Cn) and then scales it by dividing by the join_selectivity_correlation_factor. Note that this setting applies to all the joins in the query so it has it's trade offs. Another approach is to use per join hints but there are limitations of the hints approach too. Testing: - Added planner tests with a combination of outer and inner join with different values of join_selectivity_correlation_factor - Ran PlannerTest and TpcdsPlannerTest Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 297 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/6 -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Hello Quanlong Huang, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19682 to look at the new patch set (#5). Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new query option which is a floating point number between 0 to 1 inclusive: join_selectivity_correlation_factor It defaults to 0.0 which preserves the existing behavior of using the Minimum selectivity of the conjuncts. Given multiple join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to a value higher than 0 first computes the product of the selectivities: sel(C1) * sel(C2) .. sel(Cn) and then scales it by dividing by the join_selectivity_correlation_factor. Note that this setting applies to all the joins in the query so it has it's trade offs. Another approach is to use per join hints but there are limitations of the hints approach too. Testing: - Added planner tests with a combination of outer and inner join with different values of join_selectivity_correlation_factor - Ran PlannerTest and TpcdsPlannerTest Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 298 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/5 -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Hello Quanlong Huang, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19682 to look at the new patch set (#4). Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new query option which is a floating point number between 0 to 1 inclusive: join_selectivity_correlation_factor It defaults to 0.0 which preserves the existing behavior of using the Minimum selectivity of the conjuncts. Given multiple join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to a value higher than 0 first computes the product of the selectivities: sel(C1) * sel(C2) .. sel(Cn) and then scales it by dividing by the join_selectivity_correlation_factor. Note that this setting applies to all the joins in the query so it has it's trade offs. Another approach is to use per join hints but there are limitations of the hints approach too. Testing: - Added planner test with a combination of outer and inner join with different values of join_selectivity_correlation_factor - Ran PlannerTest and TpcdsPlannerTest Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 294 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/4 -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/19682/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19682/3//COMMIT_MSG@30 PS3, Line 30: consider per join hints but in a complex query (or a view that is > It might also be hard to tune the query option when the query has hundreds Yes, there are tradeoffs in both the query option approach and the hints approach. Join cardinality estimation is a hard problem and we do need to look at better ways of doing this in the future using learning models or other techniques. Of course, better stats will always help. In terms of hints, couple of things to point out: 1. In a lot of cases, the queries may come from a BI tool and are generated queries instead of hand written and users may not have control over setting the hints. 2. For views: CREATE VIEW v1 AS …. t1 inner join t2 on a1 = a2 AND b1 = b2 q1: select * from v1 WHERE q2: select * from v1 WHERE The predicates p1, p2 may be pushed down below the join/one side of the join or both sides .. also the behavior of pushdown depends on inner vs outer joins. The static hints at the join level level will not be aware of such transformations. We could use both the query option and hints for more flexibility. -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 07 Apr 2023 03:42:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19682 ) Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/19682/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/19682/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java@330 PS1, Line 330: return getGenericJoinCardinality(eqJoinConjunctSlots, otherEqJoinStats, lhsCard, > line too long (95 > 90) Done http://gerrit.cloudera.org:8080/#/c/19682/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java@421 PS1, Line 421: if (analyzer.getQueryOptions().isUse_multiplied_selectivity_for_join()) { > +1 for making this more flexible Thanks for the suggestion. So conceptually a coefficient which is less than 1.0 (and > 0) would mean that the combined selectivity: (sel1 x sel2 x 1/coefficient) is *higher* than what the current formula that does the multiplication of selectivity would compute. I doubt we will need to go in the other direction, i.e lower than the product of the selectivities. Let me modify this query option. -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 06 Apr 2023 00:58:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Hello Quanlong Huang, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19682 to look at the new patch set (#2). Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new boolean query option: use_multiplied_selectivity_for_join It defaults to false. Given multiple independent join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to TRUE allows the individual selectivities (fraction between 0 and 1) of the conjuncts to be multiplied instead of taking the minimum. Note that this applies to all the joins in the query. In the future we can consider per join hints but in a complex query (or a view that is shared among users), it is cumbersome to identify exactly where to set hints. Testing: - Added planner test with a combination of outer and inner join (with use_multiplied_selectivity_for_join enabled and disabled) - Ran PlannerTest and TpcdsPlannerTest TODO: run additional tests Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 139 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/2 -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts
Aman Sinha has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19682 Change subject: IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts .. IMPALA-12006: Improve cardinality estimation for joins involving multiple conjuncts When an inner or outer join involves conjuncts such as the following: SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2 the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present but not for the above case, thus leading to over-estimation. The main change in this patch is an improvement to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts. A second change is the addition of a new boolean query option: use_multiplied_selectivity_for_join It defaults to false. Given multiple independent join conjuncts C1,C2..Cn having comparable selectivity, we sometimes see over-estimation by several orders of magnitude. Setting the above query option to TRUE allows the individual selectivities (fraction between 0 and 1) of the conjuncts to be multiplied instead of taking the minimum. Note that this applies to all the joins in the query. In the future we can consider per join hints but in a complex query (or a view that is shared among users), it is cumbersome to identify exactly where to set hints. Testing: - Added planner test with a combination of outer and inner join (with use_multiplied_selectivity_for_join enabled and disabled) - Ran PlannerTest and TpcdsPlannerTest TODO: run additional tests Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test 6 files changed, 137 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/1 -- To view, visit http://gerrit.cloudera.org:8080/19682 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35 Gerrit-Change-Number: 19682 Gerrit-PatchSet: 1 Gerrit-Owner: Aman Sinha
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19537 ) Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. IMPALA-11946: Add Thrift HTTP support for external frontend - Add enable_external_fe_http flag that defaults to false - When true the external frontend service (external_fe_port) will expect clients to use http transport. - When false the external frontend service will expect binary transport. - Add tests for basic external frontend functionality - Add test to ensure the non-external frontend services do not expose the ExecutePlannedStatement interface. Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Reviewed-on: http://gerrit.cloudera.org:8080/19537 Tested-by: Impala Public Jenkins Reviewed-by: Aman Sinha --- M be/src/service/impala-server.cc M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java A fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java 3 files changed, 220 insertions(+), 7 deletions(-) Approvals: Impala Public Jenkins: Verified Aman Sinha: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 6 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19537 ) Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 5 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Comment-Date: Thu, 09 Mar 2023 02:02:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Aman Sinha has removed a vote on this change. Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19537 ) Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. Patch Set 4: Verified+1 The 2 test failures in the first run are known flaky tests. The re-triggered run https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7105/. was successful. -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Comment-Date: Wed, 08 Mar 2023 20:04:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19537 ) Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java File fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java: http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java@22 PS2, Line 22: import org.apache.hive.service.rpc.thrift.TCloseSessionReq; > Done I thought that such imports would fail checkstyle but perhaps checkstyle is not enforced for the tests. The FE source code does not have wildcard imports. In any case you have made the change so all good ! -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Comment-Date: Mon, 06 Mar 2023 22:17:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19537 ) Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. Patch Set 2: Code-Review+1 (2 comments) It would be a useful enhancement. Code changes LGTM (couple of nits). Any thoughts on the potential performance latency comparison HTTP is used vs binary ? I assume there are existing thrift-over-http vs thrift-over-binary published benchmarks. http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java File fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java: http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java@22 PS2, Line 22: import org.apache.hive.service.rpc.thrift.*; nit: I think the wildcard is not allowed for imports. http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java@46 PS2, Line 46: hs2Port nit: maybe call this hs2BinaryPort to easily distinguish with the other one. -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 06 Mar 2023 20:37:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11960: Fix constant propagation from TIMESTAMP to DATE
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19572 ) Change subject: IMPALA-11960: Fix constant propagation from TIMESTAMP to DATE .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1938bf5e91057b220daf8a1892940f674aac3d68 Gerrit-Change-Number: 19572 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 03 Mar 2023 15:41:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11960: Fix constant propagation from TIMESTAMP to DATE
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19572 ) Change subject: IMPALA-11960: Fix constant propagation from TIMESTAMP to DATE .. Patch Set 2: Code-Review+1 (2 comments) Thanks Csaba for fixing this. The code change LGTM. One suggestion on adding a test case. http://gerrit.cloudera.org:8080/#/c/19572/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: http://gerrit.cloudera.org:8080/#/c/19572/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test@439 PS2, Line 439: with dp_view as This test is good. Could you also add one variant of this which has a mix of >= and < in the WHERE clause: e.g where timestamp_col >= '2009-01-01 01:00:00' AND timestamp_col < '2009-02-01 01:00:00' After the rewrite the >= is expected to be preserved and the < is going to change to <=. This is just a sanity check. http://gerrit.cloudera.org:8080/#/c/19572/2/testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test File testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test: http://gerrit.cloudera.org:8080/#/c/19572/2/testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test@14 PS2, Line 14: select count(*), sum(int_col) from alltypes_date_partition Same suggestion as above for the end-to-end test. -- To view, visit http://gerrit.cloudera.org:8080/19572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1938bf5e91057b220daf8a1892940f674aac3d68 Gerrit-Change-Number: 19572 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 02 Mar 2023 22:43:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 13: > Patch Set 13: > > (1 comment) > > > Patch Set 13: > > > > (1 comment) The problem is that it would not be practical to check the block locations for for potential relocations when doing the query planning. Given N blocks in a bucket for one table and M blocks for the second table, it would be O(N+M) time to decide which distribution method to use. This would add up depending on the number of joins in the query. We really want to 'pin' the location but AFAIK HDFS does not allow us to do that. Other systems such as MemSQL that do bucket join don't have to worry about this since the data is memory resident. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 22 Feb 2023 08:21:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11925: Added a new api add write notification log in batch in the MetastoreServiceHandler class
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19508 ) Change subject: IMPALA-11925: Added a new api add_write_notification_log_in_batch in the MetastoreServiceHandler class .. Patch Set 1: Code-Review+1 Changes lgtm. Could you address the various formatting comments ? -- To view, visit http://gerrit.cloudera.org:8080/19508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9a35557c2ed79ed0276c4a418b5292fc6dd6194 Gerrit-Change-Number: 19508 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 17 Feb 2023 20:49:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11895: Need accessor methods for third party extension
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19469 ) Change subject: IMPALA-11895: Need accessor methods for third party extension .. IMPALA-11895: Need accessor methods for third party extension Changed the visibility of some methods needed for a third party extension. The methods included in this are for the statements 'compute stats', 'create function', and 'refresh/invalidate'. Change-Id: Iaea15ef8a96c9509b9cd79df595868fd1db47e83 Reviewed-on: http://gerrit.cloudera.org:8080/19469 Reviewed-by: Aman Sinha Tested-by: Aman Sinha --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java 4 files changed, 24 insertions(+), 3 deletions(-) Approvals: Aman Sinha: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/19469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iaea15ef8a96c9509b9cd79df595868fd1db47e83 Gerrit-Change-Number: 19469 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Steve Carlin
[Impala-ASF-CR] IMPALA-11895: Need accessor methods for third party extension
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19469 ) Change subject: IMPALA-11895: Need accessor methods for third party extension .. Patch Set 2: Verified+1 Code-Review+2 Carry forward Kurt and mine +1 and bumping to +2. Also carry forward the Verified +1 since there hasn't been code change since the last run. Will wait for a few hours before merging though in case other reviewers have further comment. -- To view, visit http://gerrit.cloudera.org:8080/19469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaea15ef8a96c9509b9cd79df595868fd1db47e83 Gerrit-Change-Number: 19469 Gerrit-PatchSet: 2 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Mon, 13 Feb 2023 20:05:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11895: Need accessor methods for third party extension
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19469 ) Change subject: IMPALA-11895: Need accessor methods for third party extension .. Patch Set 2: > Patch Set 1: > > One extra note which I should have included the first time. > > Another potential solution would be to copy the PartitionSpec in the get > method. In this case, it might work...unless some other object is a member > of PartitionSpec in which case it would have to be a deepCopy (and > PartitionSpec then has the responsibility to make sure it does the deep > copy). I just don't like the "mixed message" architecture approach since > this isn't done elsewhere and it is better to make PartitionSpec an immutable > object. The first time analyze() is called, the partitionSpec_ is marked Immutable according to: https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java#L146 -- To view, visit http://gerrit.cloudera.org:8080/19469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaea15ef8a96c9509b9cd79df595868fd1db47e83 Gerrit-Change-Number: 19469 Gerrit-PatchSet: 2 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Mon, 13 Feb 2023 19:59:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11895: Need accessor methods for third party extension
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19469 ) Change subject: IMPALA-11895: Need accessor methods for third party extension .. Patch Set 1: Code-Review+1 (2 comments) > Patch Set 1: > > (1 comment) > > Can we be sure that we don't run into problems if we expose these > methods/variables? Won't users be able to change the state in ways the code > doesn't expect? Probably @scarlin can respond to this but I only see accessors have been added or changed so none of them modify state. Carry forward Kurt's +1. Can bump to +2 once comments are addressed. http://gerrit.cloudera.org:8080/#/c/19469/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19469/1//COMMIT_MSG@9 PS1, Line 9: viewability Should this be 'visibility' instead of 'viewability' ? http://gerrit.cloudera.org:8080/#/c/19469/1//COMMIT_MSG@9 PS1, Line 9: Changed the viewability of some methods needed for a third > It would be more informative to list the methods that have been changed. Instead of individual method names, you could mention that the accessor methods are related to COMPUTE STATS, CREATE FUNCTION / TABLE, REFRESH commands. -- To view, visit http://gerrit.cloudera.org:8080/19469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaea15ef8a96c9509b9cd79df595868fd1db47e83 Gerrit-Change-Number: 19469 Gerrit-PatchSet: 1 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Sat, 11 Feb 2023 00:32:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG@25 PS13, Line 25: based on hdfs storage are supported. Thanks for the detailed patch. I have a high level question about the physical co-location for bucketed tables. HDFS supports re-balancing of data which moves data across nodes in a cluster for more uniform distribution [1]. If a table is bucketed and the re-balancing occurred after the table was created, the bucket shuffle hash join will produce incorrect results. In such cases, we would want to not pick this join method and fall back to the regular distributed join. Is there a consideration for such scenario in this patch ? [1] https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/scaling-namespaces/topics/hdfs-balancing-data-across-hdfs-cluster.html -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 08 Feb 2023 02:51:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11843: Fix IndexOutOfBoundsException in analytic limit pushdown
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19422 ) Change subject: IMPALA-11843: Fix IndexOutOfBoundsException in analytic limit pushdown .. Patch Set 1: Code-Review+2 LGTM. Thanks for the patch. I am surprised we did not already have such a test case with mixed conjuncts. -- To view, visit http://gerrit.cloudera.org:8080/19422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d6bd58be58d09a29b8b81972e76665f41cf103 Gerrit-Change-Number: 19422 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 16 Jan 2023 19:27:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11808: Add support for reload event in catalogD
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19378 ) Change subject: IMPALA-11808: Add support for reload event in catalogD .. IMPALA-11808: Add support for reload event in catalogD This patch supports a new event called reload event in catalogD. This event is used to update table/file metadata for a refresh or invalidate command in other replicas of catalogDs when one of the replicas issued refresh/invalidate command. HIVE-26838 is the Hive jira that adds support for this reload event. This feature is disabled by default using a config enable_reload_events. To use this feature set this config to true and impala will be able to fire reload events. The processing of reload events in the event processor is always enabled. There is an end-to-end test added for this feature which currently checks firing/creation of the reload event and self-event check in the event processor. TODO: end-to-end test should also test this reload event in the event processor. There is also a follow up jira IMPALA-11822 to track the optimization patch for this feature. Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Reviewed-on: http://gerrit.cloudera.org:8080/19378 Tested-by: Aman Sinha Reviewed-by: Andrew Sherman Reviewed-by: Aman Sinha --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 9 files changed, 337 insertions(+), 2 deletions(-) Approvals: Aman Sinha: Looks good to me, approved; Verified Andrew Sherman: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/19378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Gerrit-Change-Number: 19378 Gerrit-PatchSet: 17 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-11808: Add support for reload event in catalogD
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19378 ) Change subject: IMPALA-11808: Add support for reload event in catalogD .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Gerrit-Change-Number: 19378 Gerrit-PatchSet: 16 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 06 Jan 2023 00:38:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11808: Add support for reload event in catalogD
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19378 ) Change subject: IMPALA-11808: Add support for reload event in catalogD .. Patch Set 16: Verified+1 Code-Review+1 Latest patchset LGTM. Carry forward previous +1 from Quanlong and Csaba and adding mine. Will bump to +2 after Andrew has another look. Also carry forward previous Verified +1 since the latest changes are not affecting the code functionality. -- To view, visit http://gerrit.cloudera.org:8080/19378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Gerrit-Change-Number: 19378 Gerrit-PatchSet: 16 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 05 Jan 2023 23:26:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11808: Add support for reload event in catalogD
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19378 ) Change subject: IMPALA-11808: Add support for reload event in catalogD .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/19378/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19378/13//COMMIT_MSG@10 PS13, Line 10: update nit: to update .. http://gerrit.cloudera.org:8080/#/c/19378/13//COMMIT_MSG@11 PS13, Line 11: one the nit: one of the .. -- To view, visit http://gerrit.cloudera.org:8080/19378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Gerrit-Change-Number: 19378 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 05 Jan 2023 21:41:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11808: Add support for reload event in catalogD
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19378 ) Change subject: IMPALA-11808: Add support for reload event in catalogD .. Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/19378/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19378/12//COMMIT_MSG@13 PS12, Line 13: addes nit: adds http://gerrit.cloudera.org:8080/#/c/19378/12//COMMIT_MSG@14 PS12, Line 14: disable nit: disabled http://gerrit.cloudera.org:8080/#/c/19378/12//COMMIT_MSG@17 PS12, Line 17: reload nit: 'of' reload events.. -- To view, visit http://gerrit.cloudera.org:8080/19378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Gerrit-Change-Number: 19378 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 05 Jan 2023 21:16:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11626, IMPALA-11808: Bump GBN to 36109364 for new HMS events
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19399 ) Change subject: IMPALA-11626, IMPALA-11808: Bump GBN to 36109364 for new HMS events .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I71dd1bf7b6d744f23008d8cefeeff7adcc8afa41 Gerrit-Change-Number: 19399 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 05 Jan 2023 00:49:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11812: Deduplicate column schema in hmsPartitions
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 ) Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py File tests/custom_cluster/test_wide_table_operations.py: http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py@36 PS6, Line 36: if cls.exploration_strategy() != 'exhaustive': > Yeah, the progress is helpful so I added some progress logs for catalogd. Ack -- To view, visit http://gerrit.cloudera.org:8080/19391 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d Gerrit-Change-Number: 19391 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 30 Dec 2022 15:48:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11812: Deduplicate column schema in hmsPartitions
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 ) Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1862 PS6, Line 1862: fetchPartitionsByName > nit. This method probably should be renamed to fetchPartitionsByTable() due I think the name is ok since the 'ByName' refers to the names of the partitions rather than the table. There's a corresponding underlying HMS api called get_partitions_by_name() so this is a higher level wrapper for that. http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@208 PS6, Line 208: LOG.info("Fetched {}/{} partitions for table {}.{}", numDone, partNames.size(), : msTbl.getDbName(), msTbl.getTableName()); > This probably will log number of progressive message Actually, the number of logged messages would be partNames.size()/maxPartitionsPerRpc_ . If there are 5000 partitions to fetch and the batch size per RPC is 1000 partitions, then there will be 5 such messages. If the batch size is too small, yes it will generate more messages but in general it is not a bad idea to track the progress. I am ok either way but will let Quanlong respond. http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py File tests/custom_cluster/test_wide_table_operations.py: http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py@36 PS6, Line 36: if cls.exploration_strategy() != 'exhaustive': This is a good test case and given the high runtime it makes sense to restrict this to exhaustive run only. I am wondering if there is any intermediate message that should be displayed to the console ..e.g after each of the recover_stmt, invalidate_stmt, show_parts_stmt execute, a success message could be shown. Also, how long does table creation itself take ? Developers may want to know if things are stuck or progressing. However, this is a nice to have, not must do. -- To view, visit http://gerrit.cloudera.org:8080/19391 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d Gerrit-Change-Number: 19391 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 29 Dec 2022 20:39:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11812: Deduplicate column schema in hmsPartitions
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 ) Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions .. Patch Set 4: Code-Review+1 Thanks for making the code changes. I will bump up to +2 once the tests are added. -- To view, visit http://gerrit.cloudera.org:8080/19391 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d Gerrit-Change-Number: 19391 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 28 Dec 2022 19:18:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11812: Deduplicate column list in hmsPartitions
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 ) Change subject: IMPALA-11812: Deduplicate column list in hmsPartitions .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/19391/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19391/3//COMMIT_MSG@15 PS3, Line 15: interned. nit: comma instead of period http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1842 PS3, Line 1842: msTbl.getPartitionKeys().size() On line 1828 this value was stored as numClusteringCols_ so we could use that. http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1877 PS3, Line 1877: dereference nit: did you mean deduplicate instead of dereference ? http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@210 PS3, Line 210: cols = deduplicateColumnList(partitions, cols); I am wondering why we would pass in a null cols list to begin with. Could we use the table level columns similar to what was done for addHmsPartitions() (assuming the table is not an incomplete table) ? Table tbl = catalog_.getTable(dbName, tblName); cols = tbl.getMetaStoreTable().getSd().getCols() http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@224 PS3, Line 224: public static List deduplicateColumnList(List partitions, Since this method is actually force-setting the columns list for the partition, I think 'deduplicate' is not the best name. It gives the impression that a duplicate elimination is being done. Also, same comment as earlier .. what are the situations where we expect a null cols list ? Are there cases where we don't have the table level schema and instead have to use the first partition's schema ? http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@229 PS3, Line 229: p.getSd().getCols() Could this also return null ? If so, given a set of partitions, we would have first N partitions set to null and once the first non-null is found then the rest of the partitions would be set to that. If we want all of them to be set to non-null, we could do 2 passes. -- To view, visit http://gerrit.cloudera.org:8080/19391 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d Gerrit-Change-Number: 19391 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Dec 2022 16:18:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11494: Don't always produce Ranger audit log for authorized query
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18850 ) Change subject: IMPALA-11494: Don't always produce Ranger audit log for authorized query .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/18850/1/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java: http://gerrit.cloudera.org:8080/#/c/18850/1/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@200 PS1, Line 200: if (authzOk && !analysisOk) auditHandler.getAuthzEvents().clear(); If this condition is met, is it necessary to call flush() below ? -- To view, visit http://gerrit.cloudera.org:8080/18850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I701652e457d3118f43249e83be933713b17ce48f Gerrit-Change-Number: 18850 Gerrit-PatchSet: 1 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 16 Aug 2022 00:16:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11494: Don't always produce Ranger audit log for authorized query
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18850 ) Change subject: IMPALA-11494: Don't always produce Ranger audit log for authorized query .. Patch Set 3: Code-Review+1 +1 from my side. Will wait for Csaba's response before bumping to +2 -- To view, visit http://gerrit.cloudera.org:8080/18850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I701652e457d3118f43249e83be933713b17ce48f Gerrit-Change-Number: 18850 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 16 Aug 2022 21:56:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11443: Fix partitoned top-n with -1 input cardinality
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18744 ) Change subject: IMPALA-11443: Fix partitoned top-n with -1 input cardinality .. Patch Set 1: Code-Review+1 (1 comment) The code change lgtm. One comment about the test. http://gerrit.cloudera.org:8080/#/c/18744/1/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test File testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test: http://gerrit.cloudera.org:8080/#/c/18744/1/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test@858 PS1, Line 858: # Relies on functional.alltypessmall not having stats and running this test with Can we always rely on functional.alltypessmall not having stats ? FWIW, PlannerTest.testKudu() creates a test-only external table without stats for a particular test. Something similar could be an option. I am ok with the alltypessmall if we have used it elsewhere (or similar tables) with the no stats assumption. -- To view, visit http://gerrit.cloudera.org:8080/18744 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e2dc7df397dc4684e6510a200f381a74c8dd984 Gerrit-Change-Number: 18744 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 18 Jul 2022 20:42:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9670: Fix unloaded views are shown as tables for GET TABLES requests
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18626 ) Change subject: IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests .. Patch Set 4: Code-Review+2 LGTM. Thanks for fixing this. -- To view, visit http://gerrit.cloudera.org:8080/18626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I528bb20272ebdd66a0118c30efc2b0566f2b0e2f Gerrit-Change-Number: 18626 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 22 Jun 2022 22:44:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11284: Enable expression rewrites for || and Between predicate in values clause
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18581 ) Change subject: IMPALA-11284: Enable expression rewrites for || and Between predicate in values clause .. Patch Set 3: Code-Review+2 Carry both +1s and bumping to +2. -- To view, visit http://gerrit.cloudera.org:8080/18581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I99b8b33bf6468d12b9e26f0a6e744feb7072619c Gerrit-Change-Number: 18581 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 07 Jun 2022 05:34:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11284: Enable expression rewrites for || and Between predicate in values clause
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18581 ) Change subject: IMPALA-11284: Enable expression rewrites for || and Between predicate in values clause .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I99b8b33bf6468d12b9e26f0a6e744feb7072619c Gerrit-Change-Number: 18581 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 07 Jun 2022 05:34:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11323: Don't evaluate constants-only inferred predicates
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18579 ) Change subject: IMPALA-11323: Don't evaluate constants-only inferred predicates .. IMPALA-11323: Don't evaluate constants-only inferred predicates IMPALA-10182 fixed the problem of creating inferred predicates when both sides of an equality predicate came from the same slot. Inferred predicates also should not be created when both sides of an equality predicate are constant values which do not have scan slots. Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd Reviewed-on: http://gerrit.cloudera.org:8080/18579 Tested-by: Impala Public Jenkins Reviewed-by: Aman Sinha --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M testdata/workloads/functional-query/queries/QueryTest/inline-view.test 3 files changed, 62 insertions(+), 14 deletions(-) Approvals: Impala Public Jenkins: Verified Aman Sinha: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/18579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd Gerrit-Change-Number: 18579 Gerrit-PatchSet: 5 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin
[Impala-ASF-CR] IMPALA-11323: Don't evaluate constants-only inferred predicates
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18579 ) Change subject: IMPALA-11323: Don't evaluate constants-only inferred predicates .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd Gerrit-Change-Number: 18579 Gerrit-PatchSet: 4 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Sat, 04 Jun 2022 03:27:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11323: Don't evaluate constants-only inferred predicates
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18579 ) Change subject: IMPALA-11323: Don't evaluate constants-only inferred predicates .. Patch Set 3: Code-Review+1 (1 comment) Thanks for making the changes. Code changes LGTM. One other thought was about the planner test (similar to what was added in https://gerrit.cloudera.org/c/16917/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test). However, considering that you have the e2e test coverage, I am good with it. http://gerrit.cloudera.org:8080/#/c/18579/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18579/3//COMMIT_MSG@9 PS3, Line 9: IMPALA-101822 fixed the problem of creating inferred predicates when Still not the right jira id :) -- To view, visit http://gerrit.cloudera.org:8080/18579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd Gerrit-Change-Number: 18579 Gerrit-PatchSet: 3 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Fri, 03 Jun 2022 19:23:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11323: Do not infer predicates on equivalent values
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18579 ) Change subject: IMPALA-11323: Do not infer predicates on equivalent values .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/18579/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18579/2//COMMIT_MSG@7 PS2, Line 7: Do not infer predicates on equivalent values nit: We are still creating the inferred predicate (which is why isInferred() for the binary predicate returns true) but we are not going to evaluate it as part of the SELECT node. You could say 'Don't evaluate constants-only inferred predicates' or something similar. http://gerrit.cloudera.org:8080/#/c/18579/2//COMMIT_MSG@9 PS2, Line 9: IMPALA-10812 This is not the right jira id http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1645 PS2, Line 1645:* Returns the source expression for this expression. nit: could you fix the alignment at the end of these 2 lines. http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1660 PS2, Line 1660: if (slotDesc.isScanSlot()) { nit: Impala code style put trivial single statement blocks in the same line i.e if (slotDesc.isScanSlot()) return slotRef; http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1677 PS2, Line 1677: public SlotDescriptor findSrcScanSlot() { Is findSrcScanSlot() getting used anywhere else after this change ? http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@588 PS2, Line 588: SlotRef lhsSlotRef = lhsSrcExpr.unwrapSlotRef(false); nit: it seems that unwrap is getting called multiple times ..once inside findSrcExpr and then here. Could you add comments here about the reason. http://gerrit.cloudera.org:8080/#/c/18579/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@592 PS2, Line 592: if (lhsSlotRef.getDesc() != null && A slot ref would always have non-null slot descriptor. http://gerrit.cloudera.org:8080/#/c/18579/2/testdata/workloads/functional-query/queries/QueryTest/inline-view.test File testdata/workloads/functional-query/queries/QueryTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/18579/2/testdata/workloads/functional-query/queries/QueryTest/inline-view.test@585 PS2, Line 585: with t as (select 1 a), v as nit: remove whitespace at the end of this and lines below -- To view, visit http://gerrit.cloudera.org:8080/18579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd Gerrit-Change-Number: 18579 Gerrit-PatchSet: 2 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 03 Jun 2022 08:25:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11274: CNF Rewrite causes a regress in join node performance
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18458 ) Change subject: IMPALA-11274: CNF Rewrite causes a regress in join node performance .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/18458/13/testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test File testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test: http://gerrit.cloudera.org:8080/#/c/18458/13/testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test@439 PS13, Line 439: # IMPALA-11274: Simple arithmetic expressions should still be eligible for CNF This predicate is already in CNF form so the rule would be a no-op. Was there a specific reason to add this test ? If the intent is to test. CNF for predicates with arithmetic exprs, you can add OR predicates along the patterns of preceding examples and substitute one or more exprs with arithmetic exprs. -- To view, visit http://gerrit.cloudera.org:8080/18458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326406c6b004fe31ec0e2a2f390a3845b8925aa9 Gerrit-Change-Number: 18458 Gerrit-PatchSet: 13 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 20 May 2022 23:18:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11301: Fix extreme != selectivity for NDV=1
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18543 ) Change subject: IMPALA-11301: Fix extreme != selectivity for NDV=1 .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6 Gerrit-Change-Number: 18543 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 May 2022 17:57:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11301: Fix extreme != selectivity for NDV=1
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18543 ) Change subject: IMPALA-11301: Fix extreme != selectivity for NDV=1 .. Patch Set 3: Code-Review+1 Latest changes lgtm. Only other thought was about the unit test itself. It relies on the current behavior of row_number ndv estimate. Were you thinking of adding any other test against a table whose column had ndv=1 ? -- To view, visit http://gerrit.cloudera.org:8080/18543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6 Gerrit-Change-Number: 18543 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 May 2022 16:31:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11301: Fix extreme = and != selectivity for NDV=1
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18543 ) Change subject: IMPALA-11301: Fix extreme = and != selectivity for NDV=1 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@267 PS1, Line 267: if (distinctValues == 1) distinctValues = 2; > Yeah the NOT col = 1 is a good use case to justify this. I am good with i Hmm..not sure what I meant by the last sentence in my comment above. Pls ignore that part. Looking at the Jenkins test output, I see several diffs in the hash join cardinality estimates for TPC-DS queries. I was worried about plan changes but wasn't expecting it for TPC-DS since I doubt there are columns with ndv=1. However, what might be happening is that the ndv gets adjusted based on filtering that was applied earlier in the plan. A join's cardinality estimate formula leverages the ndv coming from the child which may have been adjusted (reduced). We could look at all the plan diffs but it would need some time to validate functionally and with a performance run. Alternatively, we could limit the scope of this fix to the != predicates. -- To view, visit http://gerrit.cloudera.org:8080/18543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6 Gerrit-Change-Number: 18543 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 May 2022 02:26:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11301: Fix extreme = and != selectivity for NDV=1
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18543 ) Change subject: IMPALA-11301: Fix extreme = and != selectivity for NDV=1 .. Patch Set 2: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@267 PS1, Line 267: if (distinctValues == 1) distinctValues = 2; > Yes, and I do think that 0.5 is "more right" than 1. Yeah the NOT col = 1 is a good use case to justify this. I am good with it. In any case the cardinality will be rounded to the next integer value. http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@273 PS1, Line 273: if (op_ == Operator.DISTINCT_FROM && rChildIsNull) { > I have changed the patch to only affect ndv=1. Thanks for making the change to do an explicit check. The presence or absence of a value is not possible to be handled through NDV alone. If we had Histogram or Most Common Values, those keep track of actual values which would help. Marking this resolved. -- To view, visit http://gerrit.cloudera.org:8080/18543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6 Gerrit-Change-Number: 18543 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 18 May 2022 22:37:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: IMPALA-11301: Fix = and != selectivity for very low NDVs
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18543 ) Change subject: WIP: IMPALA-11301: Fix = and != selectivity for very low NDVs .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@267 PS1, Line 267: selectivity_ = 1.0 / (distinctValues + 1); If NDV == 1, this would now produce 0.5 instead of 1 which is not right. I suspect this change will change a lot of plans especially for predicates on boolean columns (ndv = 2). IMHO It would be better to restrict the fix to the non-equality predicates. http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@273 PS1, Line 273: selectivity_ = 1.0 - 1.0 / (distinctValues + 1); Since the distinctValues == 1 is a special case which causes 0 to be produced with the old formula, could we just explicitly check for distinctValues == 1 and only in that case use the new formula ? For the other values, old formula should still be ok right ? Just want to see if a minimal change can be done since formula changes have to go through much more testing. -- To view, visit http://gerrit.cloudera.org:8080/18543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6 Gerrit-Change-Number: 18543 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 18 May 2022 17:21:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11274: CNF Rewrite causes a regress in join node performance
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18458 ) Change subject: IMPALA-11274: CNF Rewrite causes a regress in join node performance .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/18458/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/18458/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@605 PS7, Line 605: public void TestFeasibleToConvertToCNF() { > TestFeasibleToConvertToCNF passes. I'd appreciate thoughts on examples for I added 2 tests to convert-to-cnf.test. The first one which uses UPPER function is expected to not be converted to CNF and it does not .. so this is the expected behavior. However, the second query uses CAST function which I would expect to be converted to CNF. But it does not. # IMPALA-11274: Test with string functions in the disjunctive predicate. # In this case the predicate is not converted to CNF select count(*) from lineitem, orders where l_orderkey = o_orderkey and ((upper(l_returnflag) = 'Y' and upper(o_orderpriority) = 'HIGH') or (upper(l_returnflag) = 'N' and upper(o_orderpriority) = 'LOW')) and l_partkey > 0; # IMPALA-11274: Functions like CAST should still be eligible for CNF select count(*) from lineitem, orders where l_orderkey = o_orderkey and ((cast(l_returnflag as varchar(2)) = 'Y' and cast(o_orderpriority as varchar(5)) = 'HIGH') or (cast(l_returnflag as varchar(2)) = 'N' and cast(o_orderpriority as varchar(5)) = 'LOW')) and l_partkey > 0; -- To view, visit http://gerrit.cloudera.org:8080/18458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326406c6b004fe31ec0e2a2f390a3845b8925aa9 Gerrit-Change-Number: 18458 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 14 May 2022 02:13:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11274: CNF Rewrite causes a regress in join node performance
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18458 ) Change subject: IMPALA-11274: CNF Rewrite causes a regress in join node performance .. Patch Set 7: (3 comments) Changes look good overall. A few comments below. http://gerrit.cloudera.org:8080/#/c/18458/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18458/7//COMMIT_MSG@26 PS7, Line 26: norm nit: normal http://gerrit.cloudera.org:8080/#/c/18458/7/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java File fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java: http://gerrit.cloudera.org:8080/#/c/18458/7/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java@102 PS7, Line 102: LOG.info("It is not feasible to rewrite predidcate " + pred.toSql() + " to CNF."); nit: at info level this may create lots of log entries since the rules are applied iteratively whenever any other transformation occurs. Would be better to log at debug level. http://gerrit.cloudera.org:8080/#/c/18458/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/18458/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@605 PS7, Line 605: public void TestFeasibleToConvertToCNF() { These tests don't have OR predicates. One option is to add here but we should also add an actual PlannerTest with an Explain plan that has a string function that previously would have gotten converted through CNF and now is prohibited. I can help create one test case if you don't get around to it. It could be added in testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test (it used tpc-h schema). -- To view, visit http://gerrit.cloudera.org:8080/18458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326406c6b004fe31ec0e2a2f390a3845b8925aa9 Gerrit-Change-Number: 18458 Gerrit-PatchSet: 7 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 29 Apr 2022 19:26:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11247: Test script changes for materialized views
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18421 ) Change subject: IMPALA-11247: Test script changes for materialized views .. IMPALA-11247: Test script changes for materialized views IMPALA-10723 added support for treating materialized views as tables. In certain test configurations, the rebuild of the materialized views (which is done via Hive) was not populating the data in the MV. In this patch, I have changed the source tables of materialized views to be full-acid instead of insert-only transactional tables. This enables the tests to succeed. Insert-only source tables are also meant to work for the MV rebuild but that is a Hive issue that will be investigated separately. Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Reviewed-on: http://gerrit.cloudera.org:8080/18421 Reviewed-by: Aman Sinha Tested-by: Aman Sinha --- M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv 4 files changed, 12 insertions(+), 58 deletions(-) Approvals: Aman Sinha: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/18421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Gerrit-Change-Number: 18421 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11247: Test script changes for materialized views
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18421 ) Change subject: IMPALA-11247: Test script changes for materialized views .. Patch Set 7: Verified+1 Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Gerrit-Change-Number: 18421 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sun, 17 Apr 2022 22:36:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11247: Test script changes for materialized views
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18421 ) Change subject: IMPALA-11247: Test script changes for materialized views .. Patch Set 6: Verified+1 Code-Review+2 Carry forward Quanlong's +2. Marking Verified since the one test failure is a known (flaky ?) issue IMPALA-11160. -- To view, visit http://gerrit.cloudera.org:8080/18421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Gerrit-Change-Number: 18421 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sun, 17 Apr 2022 21:56:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11247: Test script changes for materialized views
Hello Quanlong Huang, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18421 to look at the new patch set (#6). Change subject: IMPALA-11247: Test script changes for materialized views .. IMPALA-11247: Test script changes for materialized views IMPALA-10723 added support for treating materialized views as tables. In certain test configurations, the rebuild of the materialized views (which is done via Hive) was not populating the data in the MV. In this patch, I have changed the source tables of materialized views to be full-acid instead of insert-only transactional tables. This enables the tests to succeed. Insert-only source tables are also meant to work for the MV rebuild but that is a Hive issue that will be investigated separately. Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 --- M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv 4 files changed, 12 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/18421/6 -- To view, visit http://gerrit.cloudera.org:8080/18421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Gerrit-Change-Number: 18421 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11247: Test script changes for materialized views
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18421 ) Change subject: IMPALA-11247: Test script changes for materialized views .. Patch Set 5: > Patch Set 5: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8052/ 1 test failure in query_test.test_acid.TestAcid.test_acid_compute_stats which is a flaky test previously reported in https://issues.apache.org/jira/browse/IMPALA-11160. One small change still needs to be done: the compute-table-stats.sh still has alltypes_transactional and jointbl_transactional which were removed in the last patch set. I will update the compute-table-stats.sh -- To view, visit http://gerrit.cloudera.org:8080/18421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Gerrit-Change-Number: 18421 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sun, 17 Apr 2022 21:09:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11247: Test script changes for materialized views
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/18421 ) Change subject: IMPALA-11247: Test script changes for materialized views .. Patch Set 4: > Patch Set 3: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8051/ 1 test failed in this because I had missed updating the AuthorizationStmtTest.java after removing the alltypes_transactional source table. This is fixed in patchset 4. -- To view, visit http://gerrit.cloudera.org:8080/18421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349faa0ad36ec8ca6f574f7f92d9a32fb7d0d344 Gerrit-Change-Number: 18421 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sun, 17 Apr 2022 14:51:56 + Gerrit-HasComments: No