[ https://issues.apache.org/jira/browse/SPARK-36770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-36770: ------------------------------------ Assignee: Apache Spark > Improve run-time performance for window function first and last against > UnboundedFollowing window ROWS frame > ------------------------------------------------------------------------------------------------------------ > > Key: SPARK-36770 > URL: https://issues.apache.org/jira/browse/SPARK-36770 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.4.0, 3.1.2 > Reporter: Guibin Zhang > Assignee: Apache Spark > Priority: Major > > *Context* > The *UnboundedFollowingWindowFunctionFrame* has the time complexity of > *O(N^2)*, N is the number of rows in the current partition, more specific the > complexity is O(N* (N - 1)/2). > What happens internally in UnboundedFollowingWindowFunctionFrame: > In the window frame, while processing each incoming row, it will go through > current row till the end of partition to do re-calculation. This process will > be repeated on each incoming row, which causes the high run-time complexity. > But UnboundedPrecedingWindowFunctionFrame has much better time complexity > O(N), N is the number of rows in the current partition. > > *What is the idea of the improvement?* > Give the big time complexity difference between > UnboundedFollowingWindowFunctionFrame and > UnboundedPrecedingWindowFunctionFrame, we can do following conversions to > improve the time complexity of first() and last() from O(N^2) to O(N) > {code:java} > case 1: > first() OVER(PARTITION BY colA ORDER BY colB ASC ROWS UNBOUNDED FOLLOWING) > converts to > last() OVER(PARTITION BY colA ORDER BY colB DEAC ROWS UNBOUNDED PRECEDING) > case 2: > last() OVER(PARTITION BY colA ORDER BY colB ASC ROWS UNBOUNDED FOLLOWING) > converts to > first() OVER(PARTITION BY colA ORDER BY colB DESC ROWS UNBOUNDED PRECEDING) > {code} > > *Summary* > Replace "UNBOUNDED FOLLOWING" with "UNBOUNDED PRECEDING", and flip the ORDER > BY for the window functions first() and last() for ROWS. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org