[ https://issues.apache.org/jira/browse/SPARK-22806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293716#comment-16293716 ]
Marco Gaido commented on SPARK-22806:
-------------------------------------

This is the expected behavior, and Postgres works the same way. If you specify an ORDER BY clause without an explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each aggregate covers the rows from the start of the partition up to the current row (and its peers in the ordering), not the whole partition.

> Window Aggregate functions: unexpected result at ordered partition
> ------------------------------------------------------------------
>
>                 Key: SPARK-22806
>                 URL: https://issues.apache.org/jira/browse/SPARK-22806
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Attila Zsolt Piros
>         Attachments: WindowFunctionsWithGroupByError.scala
>
>
> I got different results for aggregate functions (even for sum and count) when
> the partition is ordered "Window.partitionBy(column).orderBy(column)" and
> when it is not ordered "Window.partitionBy(column)".
> Example:
> {code:java}
> test("count, sum, stddev_pop functions over window") {
>   val df = Seq(
>     ("a", 1, 100.0),
>     ("b", 1, 200.0)).toDF("key", "partition", "value")
>   df.createOrReplaceTempView("window_table")
>   checkAnswer(
>     df.select(
>       $"key",
>       count("value").over(Window.partitionBy("partition")),
>       sum("value").over(Window.partitionBy("partition")),
>       stddev_pop("value").over(Window.partitionBy("partition"))
>     ),
>     Seq(
>       Row("a", 2, 300.0, 50.0),
>       Row("b", 2, 300.0, 50.0)))
> }
> test("count, sum, stddev_pop functions over ordered by window") {
>   val df = Seq(
>     ("a", 1, 100.0),
>     ("b", 1, 200.0)).toDF("key", "partition", "value")
>   df.createOrReplaceTempView("window_table")
>   checkAnswer(
>     df.select(
>       $"key",
>       count("value").over(Window.partitionBy("partition").orderBy("key")),
>       sum("value").over(Window.partitionBy("partition").orderBy("key")),
>       stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
>     ),
>     Seq(
>       Row("a", 2, 300.0, 50.0),
>       Row("b", 2, 300.0, 50.0)))
> }
> {code}
> The "count, sum, stddev_pop functions over ordered by window" test fails with
> the error:
> {noformat}
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> !struct<>                   struct<key:string,count(value) OVER (PARTITION BY
> partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value)
> OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST
> unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition
> ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
> ![a,2,300.0,50.0]           [a,1,100.0,0.0]
>  [b,2,300.0,50.0]           [b,2,300.0,50.0]
> {noformat}
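
For readers who want whole-partition aggregates while keeping the ORDER BY, one option is to state the frame explicitly instead of relying on the default. The following is a minimal sketch, not taken from the issue: the object name `WindowFrameSketch`, the column aliases, and the `local[1]` master are assumptions chosen for illustration. It reuses the two-row data set from the report and compares the default frame with an explicit `rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)` frame.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, stddev_pop, sum}

// Hypothetical object name, used only for this illustration.
object WindowFrameSketch {

  // Returns one row per key: (key, running_count, running_sum,
  // total_count, total_sum, total_stddev_pop), sorted by key.
  def compute(spark: SparkSession): Array[Row] = {
    import spark.implicits._

    // Same data as in the reported test case.
    val df = Seq(("a", 1, 100.0), ("b", 1, 200.0)).toDF("key", "partition", "value")

    // Ordered window with no explicit frame: the frame defaults to
    // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (a running aggregate).
    val running = Window.partitionBy("partition").orderBy("key")

    // Same partitioning and ordering, with the frame widened explicitly
    // to cover the whole partition.
    val whole = running.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    df.select(
      $"key",
      count("value").over(running).as("running_count"),
      sum("value").over(running).as("running_sum"),
      count("value").over(whole).as("total_count"),
      sum("value").over(whole).as("total_sum"),
      stddev_pop("value").over(whole).as("total_stddev_pop")
    ).orderBy("key").collect()
  }

  def main(args: Array[String]): Unit = {
    // Local single-threaded session; an assumption made for the sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("window-frame-sketch")
      .getOrCreate()
    try compute(spark).foreach(println)
    finally spark.stop()
  }
}
```

With the default frame, the row for key "a" only sees itself, which is consistent with the `[a,1,100.0,0.0]` row in the Spark answer above. With the explicit frame, both rows aggregate over the full partition, which is what the reporter's expected answer assumes.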