zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1421267218
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -68,10 +72,55 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     case _ => false
   }
 
+  /**
+   * Whether support inferring WindowGroupLimit from Limit outside of Window. Check if:
+   * 1. The window orderSpec exists unfoldable one or all window expressions should use the same
+   * expanding window.
+   * 2. All window expressions should not have SizeBasedWindowFunction.
+   * 3. The Limit could not be pushed down through Window.
+   */
+  private def limitSupport(limit: Int, window: Window): Boolean =
+    limit <= conf.windowGroupLimitThreshold && window.child.maxRows.forall(_ > limit) &&
+      !window.child.isInstanceOf[WindowGroupLimit] &&
+      (window.orderSpec.exists(!_.child.foldable) ||
+        window.windowExpressions.forall(isExpandingWindow)) &&
+      window.windowExpressions.forall {
+        case Alias(WindowExpression(windowFunction, WindowSpecDefinition(_, _,
+        SpecifiedWindowFrame(_, UnboundedPreceding, CurrentRow))), _)
+          if !windowFunction.isInstanceOf[SizeBasedWindowFunction] &&

Review Comment:
   A Limit outside of a Window can be pruned early by `WindowGroupLimit` only if the following three conditions are met:
   1. The window orderSpec contains at least one unfoldable expression, or all window expressions use a `RowFrame`. When the orderSpec is foldable and a window expression uses a `RangeFrame`, every row in the window group is a peer of the current row, so the aggregation needs all rows in the window group.
   2. No window expression contains a `SizeBasedWindowFunction`, because computing a `SizeBasedWindowFunction` likewise needs all rows in the window group.
   3. The Limit cannot be pushed down through the Window, because `LimitPushDownThroughWindow` performs better than `WindowGroupLimit`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
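Conditions 1 and 2 in the review comment above can be illustrated with a minimal sketch in plain Scala that does not use Spark. The object `FrameDemo` and its helper functions are hypothetical names that model the window semantics on an in-memory `Seq`; they are not part of the PR or of Spark's API. The sketch assumes a single window group, a frame of UNBOUNDED PRECEDING to CURRENT ROW, and a `sum` aggregate.

```scala
object FrameDemo {
  // ROWS frame (UNBOUNDED PRECEDING to CURRENT ROW): the running sum for
  // row i depends only on rows 0..i of the window group.
  def rowsFrameSum(values: Seq[Int]): Seq[Int] =
    values.scanLeft(0)(_ + _).tail

  // RANGE frame with a constant (foldable) order key: all rows are peers,
  // so the frame of every row is the whole window group.
  def rangeFrameSumConstantOrder(values: Seq[Int]): Seq[Int] =
    values.map(_ => values.sum)

  // A size-based function modelled on cume_dist, assuming the rows are
  // already sorted and have distinct order keys: (position) / (group size).
  def cumeDistDistinctKeys(rowCount: Int): Seq[Double] =
    (1 to rowCount).map(_.toDouble / rowCount)

  def main(args: Array[String]): Unit = {
    val group = Seq(1, 2, 3, 4) // one window group, already ordered
    val limit = 2

    // ROWS frame: limiting the group first gives the same first rows.
    println(rowsFrameSum(group).take(limit) == rowsFrameSum(group.take(limit)))

    // RANGE frame with a constant order key: limiting first changes the result.
    println(rangeFrameSumConstantOrder(group).take(limit) ==
      rangeFrameSumConstantOrder(group.take(limit)))

    // Size-based function: limiting first changes the group size it divides by.
    println(cumeDistDistinctKeys(group.size).take(limit) ==
      cumeDistDistinctKeys(limit))
  }
}
```

In this toy model only the ROWS-frame case is unaffected by pruning the window group before the window function runs, which matches the reasoning given for the first two conditions.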
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org