zml1206 commented on code in PR #44145:
URL: https://github.com/apache/spark/pull/44145#discussion_r1421267218


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -68,10 +72,55 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     case _ => false
   }
 
+  /**
+   * Whether support inferring WindowGroupLimit from Limit outside of Window. Check if:
+   * 1. The window orderSpec exists unfoldable one or all window expressions should use the same
+   *  expanding window.
+   * 2. All window expressions should not have SizeBasedWindowFunction.
+   * 3. The Limit could not be pushed down through Window.
+   */
+  private def limitSupport(limit: Int, window: Window): Boolean =
+    limit <= conf.windowGroupLimitThreshold && window.child.maxRows.forall(_ > limit) &&
+      !window.child.isInstanceOf[WindowGroupLimit] &&
+      (window.orderSpec.exists(!_.child.foldable) ||
+        window.windowExpressions.forall(isExpandingWindow)) &&
+      window.windowExpressions.forall {
+        case Alias(WindowExpression(windowFunction, WindowSpecDefinition(_, _,
+        SpecifiedWindowFrame(_, UnboundedPreceding, CurrentRow))), _)
+          if !windowFunction.isInstanceOf[SizeBasedWindowFunction] &&

Review Comment:
   A Limit outside of a Window can be pruned early by `WindowGroupLimit` only if all three of the following conditions are met:
   1. The window orderSpec contains at least one unfoldable expression, or all window expressions use a `RowFrame`. When the orderSpec is foldable and a window expression uses a `RangeFrame`, the aggregation needs every row in the window group.
   2. No window expression contains a `SizeBasedWindowFunction`, because evaluating a `SizeBasedWindowFunction` likewise needs every row in the window group.
   3. The Limit cannot be pushed down through the Window, because `LimitPushDownThroughWindow` performs better than `WindowGroupLimit`.
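
To illustrate conditions 1 and 2, here is a minimal sketch in plain Scala. It is not Spark code: `FrameDemo`, `rowsFrameSum`, `rangeFrameSum` and `cumeDistLike` are hypothetical helpers that only mimic a running sum over a `RowFrame`, a running sum over a `RangeFrame`, and a size-dependent function in the style of `cume_dist`. It assumes a single window group whose rows are already sorted, and compares "compute on all rows, then take the first `limit` rows" against "keep only the first `limit` rows, then compute".

```scala
object FrameDemo {
  // Running sum over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
  // each output depends only on the rows up to and including the current one.
  def rowsFrameSum(values: Seq[Int]): Seq[Int] =
    values.scanLeft(0)(_ + _).tail

  // Running sum over RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
  // each output covers every row whose ORDER BY key is <= the current key,
  // so rows with equal keys (peers) are always included together.
  def rangeFrameSum(keys: Seq[Int], values: Seq[Int]): Seq[Int] =
    keys.map { k => keys.zip(values).collect { case (kk, v) if kk <= k => v }.sum }

  // Simplified stand-in for a size-based function: position / group size.
  def cumeDistLike(values: Seq[Int]): Seq[Double] =
    values.indices.map(i => (i + 1).toDouble / values.size)

  def main(args: Array[String]): Unit = {
    val values = Seq(10, 20, 30, 40)
    val limit = 2
    // A foldable ORDER BY expression gives every row the same key.
    val constantKeys = Seq.fill(values.size)(1)

    // Row frame: pruning the group first gives the same first `limit` results.
    assert(rowsFrameSum(values).take(limit) == rowsFrameSum(values.take(limit)))

    // Range frame with a constant key: all rows are peers, so pruning changes the sums.
    assert(rangeFrameSum(constantKeys, values).take(limit) !=
      rangeFrameSum(constantKeys.take(limit), values.take(limit)))

    // Size-based function: pruning changes the group size and therefore the results.
    assert(cumeDistLike(values).take(limit) != cumeDistLike(values.take(limit)))

    println("row frame is safe to prune; range frame and size-based function are not")
  }
}
```

In this sketch only the row-frame case survives early pruning, which is the behaviour the first two conditions are meant to guard.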



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org