Re: [PR] HIVE-29322: Avoid TopNKeyOperator When ReduceSink TopNkey Filtering Provides Better Pruning for ORDER BY LIMIT Queries [hive]

via GitHub Mon, 05 Jan 2026 00:48:50 -0800


zabetak commented on code in PR #6202:
URL: https://github.com/apache/hive/pull/6202#discussion_r2660711052



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java:
##########
@@ -111,6 +123,50 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
     return null;
   }
 
+  /**
+   * Returns true if the ReduceSink is only under an ORDER BY + LIMIT plan
+   * and has no GroupBy or Join operators in its upstream ancestry.
+   * This is used to disable TopNKey for pure ORDER BY LIMIT queries where
+   * LIMIT pushdown must take precedence.
+   */
+  public static boolean isOrderByLimitPath(ReduceSinkOperator rs) {

Review Comment:
   I have a feeling that the traversal logic here could be skipped by properly setting up the `RuleRegExp` in `org.apache.hadoop.hive.ql.parse.TezCompiler#runTopNKeyOptimization`.
   
   Something like:
   ```java
   new RuleRegExp("Top n key optimization", "(GBY%|JOIN%).*RS%")
   ```
   The expression above is not tested, but I have the impression that with some tuning we can get the desired matching scope.
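
To make the suggested matching scope concrete, here is a minimal standalone sketch that runs the same regular expression against a few operator-path strings using only `java.util.regex`. It does not call Hive. The class name, the helper `matchesPath`, and the sample paths are hypothetical, and it assumes that an operator path is rendered as operator names each followed by `%`, and that the pattern may match anywhere in that string. How `RuleRegExp` actually applies the pattern would still need to be checked against the Hive source.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TopNKeyRulePatternSketch {
  // Pattern copied from the review comment; it is untested against Hive itself.
  static final Pattern SUGGESTED = Pattern.compile("(GBY%|JOIN%).*RS%");

  // Hypothetical helper: true if the pattern occurs anywhere in the path string.
  // Using find() rather than matches() is an assumption made for this sketch.
  static boolean matchesPath(String operatorPath) {
    return SUGGESTED.matcher(operatorPath).find();
  }

  public static void main(String[] args) {
    // Hypothetical operator paths, written as operator names each followed by '%'.
    List<String> paths = List.of(
        "TS%SEL%RS%",        // plain ORDER BY + LIMIT shape: no GBY or JOIN upstream
        "TS%GBY%RS%",        // GroupBy feeding the ReduceSink
        "TS%JOIN%SEL%RS%");  // Join somewhere above the ReduceSink
    for (String path : paths) {
      System.out.println(path + " -> " + matchesPath(path));
    }
  }
}
```

Under these assumptions the first path is rejected and the other two are accepted, which is the scope that `isOrderByLimitPath` currently derives by walking the operator tree. Note that with `find()` the alternative `JOIN%` also matches inside longer names that end in `JOIN`, which is part of the tuning the pattern would need.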



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

