[ https://issues.apache.org/jira/browse/HIVE-26006?focusedWorklogId=769034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769034 ]
ASF GitHub Bot logged work on HIVE-26006:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/May/22 12:20
Start Date: 11/May/22 12:20
Worklog Time Spent: 10m

Work Description: zabetak commented on code in PR #3082:
URL: https://github.com/apache/hive/pull/3082#discussion_r870221178


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,18 +223,41 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
         reduceSinkDesc.getColumnExprMap(),
         reduceSinkDesc.getOrder(),
         reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+      CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+      throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }
+
+  private <TDesc extends AbstractOperatorDesc, TParentDesc extends AbstractOperatorDesc> void pushDownThrough(
+      CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+      Operator<TDesc> operator, Operator<TParentDesc> parentOperator)

Review Comment:
   Are there any advantages in using type parameters in this method (i.e., `TDesc`, `TParentDesc`) instead of wildcards?
   It seems that the conf of the operator is not used anywhere in this method, so we could possibly simplify the signature to: `private void pushDownThrough(CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<?> operator, Operator<?> parentOperator)`


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,18 +223,41 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
         reduceSinkDesc.getColumnExprMap(),
         reduceSinkDesc.getOrder(),
         reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+      CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+      throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }

Review Comment:
   Wildcard instead of `T extends AbstractOperatorDesc`? See previous comments.


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -34,6 +34,7 @@
 import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
 import org.apache.hadoop.hive.ql.lib.SemanticNodeProcessor;
 import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.AbstractOperatorDesc;

Review Comment:
   Possibly can be removed.
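   To illustrate the wildcard suggestion above: when each type parameter appears in only one position of the signature and the body never reads values of the parameterized type, a `?` wildcard accepts the same call sites. The sketch below uses minimal stand-in classes (`Desc`, `Op`), not the real Hive `Operator`/`AbstractOperatorDesc` hierarchy.

```java
// Minimal stand-ins for Hive's Operator<T extends OperatorDesc> hierarchy;
// the class names here are illustrative, not the real Hive types.
class Desc {}

class Op<T extends Desc> {
  final String name;
  Op(String name) { this.name = name; }
}

public class WildcardSketch {
  // Type-parameter version: TD and TP each appear in exactly one parameter
  // position and are never used in the body, so they add no expressiveness.
  static <TD extends Desc, TP extends Desc> String pushDownTyped(Op<TD> op, Op<TP> parent) {
    return op.name + "->" + parent.name;
  }

  // Wildcard version, as suggested in the review: the same call sites
  // compile, and the signature is shorter.
  static String pushDownWildcard(Op<?> op, Op<?> parent) {
    return op.name + "->" + parent.name;
  }

  public static void main(String[] args) {
    Op<Desc> join = new Op<>("JOIN");
    Op<Desc> rs = new Op<>("RS");
    System.out.println(pushDownTyped(join, rs));    // JOIN->RS
    System.out.println(pushDownWildcard(join, rs)); // JOIN->RS
  }
}
```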
##########
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java:
##########
@@ -252,7 +252,8 @@ public TopNKeyDescExplainVectorization getTopNKeyVectorization() {
   public TopNKeyDesc combine(CommonKeyPrefix commonKeyPrefix) {
     return new TopNKeyDesc(topN, commonKeyPrefix.getMappedOrder(),
         commonKeyPrefix.getMappedNullOrder(), commonKeyPrefix.getMappedColumns(),
-        commonKeyPrefix.getMappedColumns().subList(0, partitionKeyColumns.size()),
+        commonKeyPrefix.getMappedColumns()
+            .subList(0, Math.min(partitionKeyColumns.size(), commonKeyPrefix.getMappedColumns().size())),

Review Comment:
   This new change guarantees that the condition `keys >= partitionKeys` always holds, no matter the input (since the partition keys are a sublist of the keys). Maybe I am missing something, but I cannot see how we can have `keys < partitionKeys` at this stage. If that is correct, then the inequality in the new bail-out condition is misleading.
   
   More generally, does it make sense to create a new TNK operator when `partitionKeyColumns.size() >= commonKeyPrefix.size()`? If not, then maybe we should add an appropriate assertion instead of using `Math.min`.
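   For context, a minimal model of the failure and the fix under discussion: `subList(0, toIndex)` throws `IndexOutOfBoundsException` when `toIndex` exceeds the list size, which matches the `toIndex = 2` in the stack trace below; clamping with `Math.min` prevents it. This is a simplified sketch with `String` columns, not the real `TopNKeyDesc.combine`.

```java
import java.util.Arrays;
import java.util.List;

public class SubListClamp {
  // Simplified model of the prefix extraction inside TopNKeyDesc.combine:
  // take the partition-key prefix of the mapped common-prefix columns.
  static List<String> prefixUnclamped(List<String> mappedColumns, int partitionKeyCount) {
    // Throws IndexOutOfBoundsException when partitionKeyCount exceeds
    // mappedColumns.size() -- the HIVE-26006 failure mode.
    return mappedColumns.subList(0, partitionKeyCount);
  }

  static List<String> prefixClamped(List<String> mappedColumns, int partitionKeyCount) {
    // The patch clamps toIndex with Math.min so subList can never overflow;
    // the review asks whether an assertion would be more appropriate.
    return mappedColumns.subList(0, Math.min(partitionKeyCount, mappedColumns.size()));
  }

  public static void main(String[] args) {
    // Common key prefix of 1 column, but 2 partition key columns.
    List<String> mapped = Arrays.asList("key0");
    try {
      prefixUnclamped(mapped, 2);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unclamped throws: " + e.getClass().getSimpleName());
    }
    System.out.println("clamped result: " + prefixClamped(mapped, 2)); // [key0]
  }
}
```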
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
         reduceSinkDesc.getColumnExprMap(),
         reduceSinkDesc.getOrder(),
         reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+      CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+      throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }
+
+  private <TDesc extends AbstractOperatorDesc, TParentDesc extends AbstractOperatorDesc> void pushDownThrough(
+      CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+      Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)
+      throws SemanticException {
+
+    final TopNKeyDesc topNKeyDesc = topNKey.getConf();
     if (commonKeyPrefix.isEmpty() || commonKeyPrefix.size() == topNKeyDesc.getPartitionKeyColumns().size()) {
       return;
     }
+    final TopNKeyDesc newTopNKeyDesc = topNKeyDesc.combine(commonKeyPrefix);
+    if (newTopNKeyDesc.getKeyColumns().size() > 0 &&
+        newTopNKeyDesc.getKeyColumns().size() <= newTopNKeyDesc.getPartitionKeyColumns().size()) {

Review Comment:
   I was thinking that we could exploit `commonKeyPrefix` and `topNKeyDesc` to bail out before calling `combine` and creating the new TNK. Possibly the logic could be merged with the existing bail-out condition just above. Moreover, if I got it right, it only makes sense to introduce a new TNK operator when `keys > partitionKeys`.
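   The merged bail-out suggested above could be decided from the sizes alone, before `combine` is called. The sketch below is a hedged illustration of that idea: `commonPrefixSize` and `partitionKeyCount` stand in for `commonKeyPrefix.size()` and `topNKeyDesc.getPartitionKeyColumns().size()`; the method name `shouldPushDown` is hypothetical, not from the patch.

```java
public class BailOutSketch {
  // Sketch of the proposed merged bail-out: push down only when the common
  // key prefix is non-empty and strictly longer than the partition keys,
  // i.e. only when keys > partitionKeys would hold in the new TNK operator.
  static boolean shouldPushDown(int commonPrefixSize, int partitionKeyCount) {
    // A prefix no longer than the partition keys would yield a TopNKey over
    // only partition keys, which filters nothing, so creating it is pointless.
    return commonPrefixSize > 0 && commonPrefixSize > partitionKeyCount;
  }

  public static void main(String[] args) {
    System.out.println(shouldPushDown(0, 0)); // false: empty prefix
    System.out.println(shouldPushDown(2, 2)); // false: keys == partitionKeys
    System.out.println(shouldPushDown(3, 2)); // true:  keys > partitionKeys
  }
}
```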
Issue Time Tracking
-------------------

Worklog Id: (was: 769034)
Time Spent: 1h 10m  (was: 1h)

> TopNKey and PTF with more than one column is failing with IOBE
> --------------------------------------------------------------
>
>                 Key: HIVE-26006
>                 URL: https://issues.apache.org/jira/browse/HIVE-26006
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Krisztian Kasa
>          Priority: Major
>            Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> 	at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> 	at java.util.ArrayList.subList(ArrayList.java:1006)
> 	at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> 	at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> 	at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> 	at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
> 	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> 	at org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
> 	at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
> 	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
> 	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
> 	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)