zabetak commented on code in PR #4442:
URL: https://github.com/apache/hive/pull/4442#discussion_r1288646268
########## ql/src/test/queries/clientpositive/lateral_view.q: ##########
@@ -10,6 +10,11 @@ EXPLAIN SELECT myTable.* FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS
 EXPLAIN SELECT myTable.myCol, myTable2.myCol2 FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS myCol LATERAL VIEW explode(array('a', 'b', 'c')) myTable2 AS myCol2 LIMIT 9;
 EXPLAIN SELECT myTable2.* FROM src LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol LATERAL VIEW explode(myTable.myCol) myTable2 AS myCol2 LIMIT 3;
+EXPLAIN CBO SELECT * FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS myCol SORT BY key ASC, myCol ASC LIMIT 1;

Review Comment:
It is a good idea to verify the `EXPLAIN CBO` plan in the presence of lateral views. I am wondering if it would be better to identify a few representative queries and put the `EXPLAIN CBO` statements in a new .q file (e.g., `cbo_lateral_views.q`), going from the simpler to the more complicated queries. This would make the testing more explicit, and it would also reduce the changes in existing .q.out files, leading to fewer merge conflicts when backporting. WDYT?

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java: ##########
@@ -124,9 +125,12 @@ public void onMatch(RelOptRuleCall call) {
     RexNode newInlineCall = cluster.getRexBuilder().makeCall(tfs.getRowType(),
         inlineCall.op, newArrayCall);

+    // Use an empty list for columnMappings. The return row type of the RelNode now
+    // comprises all the fields within the UDTF, so there is no longer a mapping from
+    // the output fields directly to the input fields.
     final RelNode newTableFunctionScanNode = tfs.copy(tfs.getTraitSet(), tfs.getInputs(),
         newInlineCall, tfs.getElementType(), tfs.getRowType(),
-        tfs.getColumnMappings());
+        new HashSet<>());

Review Comment:
nit: `Collections.emptySet()` is a more efficient choice unless we want to modify the set.
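For context on the nit, here is a minimal, hypothetical sketch (the class and method names are made up for illustration) of why `Collections.emptySet()` is cheaper than `new HashSet<>()` when the set is never mutated: it hands back one shared, immutable singleton instead of allocating a fresh object on every call.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class EmptySetNit {
    // new HashSet<>() allocates a fresh, mutable set on every call.
    static Set<String> allocated() {
        return new HashSet<>();
    }

    // Collections.emptySet() returns a shared, immutable singleton: no allocation.
    static Set<String> shared() {
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        System.out.println(shared() == shared());       // same shared instance
        System.out.println(allocated() == allocated()); // two distinct objects
    }
}
```

The trade-off is that `Collections.emptySet()` throws `UnsupportedOperationException` on mutation, which is exactly the property wanted here since the mappings are never modified after construction.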
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ##########
@@ -577,6 +588,77 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
       ast = ASTBuilder.subQuery(left, sqAlias);
       s = new Schema((Union) r, sqAlias);
     }
+    } else if (r instanceof HiveTableFunctionScan &&
+        !canOptimizeOutLateralView((HiveTableFunctionScan) r)) {

Review Comment:
Is this optimization something that we could defer to a follow-up ticket, so that we converge and merge this PR faster? That would also make the diff and git history easier to follow. If we want to push it here, then let me share what I was thinking with a concrete SQL example grabbed from existing tests.

```sql
SELECT myCol from tmp_pyang_lv LATERAL VIEW explode(array(1,2,3)) myTab as myCol limit 3
```

The current plan is the following:

```
CBO PLAN:
HiveSortLimit(fetch=[3])
  HiveProject(mycol=[$5])
    HiveTableFunctionScan(invocation=[explode(ARRAY(1, 2, 3))], rowType=[RecordType(VARCHAR(2147483647) inputs, BIGINT BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN ROW__IS__DELETED, INTEGER mytab.mycol)])
      HiveTableScan(table=[[default, tmp_pyang_lv]], table:alias=[tmp_pyang_lv])
```

I was thinking that we could have a CBO optimization that turns the plan above into something like the following:

```
CBO PLAN:
HiveSortLimit(fetch=[3])
  HiveTableFunctionScan(invocation=[explode(ARRAY(1, 2, 3))], rowType=[RecordType(INTEGER mytab.mycol)])
```

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ##########
@@ -621,6 +644,109 @@ private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
     }
   }

+  private ASTNode createASTLateralView(TableFunctionScan tfs, Schema s,
+      QueryBlockInfo tableFunctionSource, String sqAlias) {
+    // In the case where the RelNode is a HiveTableFunctionScan, first we check
+    // to see if we can't optimize out the lateral view operator. We can optimize the
+    // operator out if only the udtf fields are grabbed out of the RelNode. If any
+    // of the base table fields need to be grabbed out, then a 'join' needs to be done
+    // and we need the lateral view.
+    //

Review Comment:
The comment seems misplaced here. Maybe it is also a bit redundant, given the javadoc in the `canOptimizeOutLateralView` method.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java: ##########
@@ -114,7 +115,7 @@ public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode input
     this.lateralViewRel = HiveTableFunctionScan.create(cluster,
         TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
-        null, retType, null);
+        null, retType, createColumnMappings(inputRel));

Review Comment:
I was thinking that there is also a third category:

3. Columns from the `inputRel` that are transformed based on the UDTF and appear in the result (something that would enable the `derived` attribute in `RelColumnMapping`).

Anyway, I don't fully understand how `RelColumnMapping` is supposed to work, so I suppose we can live with this mapping for the moment.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java: ##########
@@ -382,6 +387,11 @@ private static boolean validExchangeChild(HiveSortExchange sortNode) {
     return sortNode.getInput() instanceof Project;
   }

+  private static boolean validTableFunctionScanChild(HiveTableFunctionScan htfsNode) {
+    return htfsNode.getInputs().size() == 1 &&
+        (htfsNode.getInput(0) instanceof Project || htfsNode.getInput(0) instanceof HiveTableScan);

Review Comment:
Thanks for the explanation. I don't fully understand it either, but now I see the pattern, so the changes look reasonable.
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterTableFunctionTransposeRule.java: ##########
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.metadata.RelColumnMapping;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Rule to transpose Filter and TableFunctionScan RelNodes.
+ *
+ * This differs from the FilterTableFunctionTransposeRule for a number of reasons:
+ * - We check if the filter condition is deterministic under Hive rules. We cannot
+ *   push filters that aren't deterministic.
+ * - We check if the filter only references columns that are not part of the udtf.
+ *   If the input ref in the filter references a column that is generated by the
+ *   udtf, the filter cannot be pushed through.

Review Comment:
I get the impression that `FilterTableFunctionTransposeRule` also does the same check using column mappings. I would also suggest isolating this transformation in a separate patch, if that is possible. The reasoning is the same: less code to review, faster to merge.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
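The two conditions described in the rule's javadoc above can be sketched as a small, self-contained decision function. This is a hypothetical illustration only (the names `canPushThroughUdtf`, `filterRefs`, and `udtfOutputCols` are made up): the real rule operates on Calcite `RexNode` trees and `RelColumnMapping`s, not on raw column-index sets.

```java
import java.util.List;
import java.util.Set;

public class FilterPushdownCheck {
    // A filter may transpose below the TableFunctionScan only if it is
    // deterministic AND every column it references comes from the input
    // relation rather than from the UDTF output.
    static boolean canPushThroughUdtf(boolean deterministic,
                                      List<Integer> filterRefs,
                                      Set<Integer> udtfOutputCols) {
        if (!deterministic) {
            return false; // non-deterministic predicates must not be reordered
        }
        for (int ref : filterRefs) {
            if (udtfOutputCols.contains(ref)) {
                return false; // predicate depends on a UDTF-generated column
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Suppose field index 5 is the exploded column (e.g. mytab.mycol).
        Set<Integer> udtfCols = Set.of(5);
        // Filter on a base-table column only: pushable.
        System.out.println(canPushThroughUdtf(true, List.of(0), udtfCols));
        // Filter referencing the UDTF output: not pushable.
        System.out.println(canPushThroughUdtf(true, List.of(5), udtfCols));
    }
}
```

In SQL terms: a predicate like `WHERE key > '10'` on the base table can move below the `explode`, while `WHERE myCol > 1` on the exploded column cannot.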