[ https://issues.apache.org/jira/browse/SPARK-22806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Zsolt Piros updated SPARK-22806:
---------------------------------------
    Description: 
I got different results for aggregate functions (even for sum and count) when the partition is ordered, "Window.partitionBy(column).orderBy(column)", and when it is not ordered, "Window.partitionBy(column)".

Example:
{code:scala}
// Assumes a QueryTest-based suite that provides checkAnswer and testImplicits._ (for $"...").
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

test("count, sum, stddev_pop functions over window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition")),
      sum("value").over(Window.partitionBy("partition")),
      stddev_pop("value").over(Window.partitionBy("partition"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}

test("count, sum, stddev_pop functions over ordered by window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition").orderBy("key")),
      sum("value").over(Window.partitionBy("partition").orderBy("key")),
      stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}
{code}

The "count, sum, stddev_pop functions over ordered by window" test fails with:

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<>                   struct<key:string,count(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
![a,2,300.0,50.0]           [a,1,100.0,0.0]
 [b,2,300.0,50.0]           [b,2,300.0,50.0]

Only row "a" differs: with the ordered window it gets count 1, sum 100.0 and stddev_pop 0.0 instead of 2, 300.0 and 50.0, while row "b" matches.

> Window Aggregate functions: unexpected result at ordered partition
> ------------------------------------------------------------------
>
>                 Key: SPARK-22806
>                 URL: https://issues.apache.org/jira/browse/SPARK-22806
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Attila Zsolt Piros
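To see where the Spark answer for the ordered window comes from, the two frames can be modeled in plain Scala. This is a minimal sketch, not Spark code: it assumes Spark applies its documented default frames, i.e. the whole partition when no ordering is given, and RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (a growing frame) when orderBy is given. The object WindowFrameModel and its members (Rec, aggregates, unordered, orderedByKey) are hypothetical names chosen for this sketch.

```scala
// Plain-Scala model (no Spark) of the two window frames in the report.
// Assumption: Spark's documented defaults apply, i.e. without ORDER BY the frame
// is the whole partition, and with ORDER BY it is
// RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
// All names here are hypothetical, chosen for this sketch.
object WindowFrameModel {
  final case class Rec(key: String, partition: Int, value: Double)

  type Result = (String, Long, Double, Double)

  // count, sum and population standard deviation over one frame
  def aggregates(frame: Seq[Double]): (Long, Double, Double) = {
    val n = frame.size
    val total = frame.sum
    val mean = total / n
    val stddevPop = math.sqrt(frame.map(v => (v - mean) * (v - mean)).sum / n)
    (n.toLong, total, stddevPop)
  }

  // Window.partitionBy("partition"): each row sees its whole partition.
  def unordered(rows: Seq[Rec]): Seq[Result] = {
    val byPartition = rows.groupBy(_.partition)
    rows.map { r =>
      val (c, s, sd) = aggregates(byPartition(r.partition).map(_.value))
      (r.key, c, s, sd)
    }
  }

  // Window.partitionBy("partition").orderBy("key") with the default RANGE frame:
  // each row sees the rows of its partition whose key is <= its own key
  // (rows with an equal key are peers and share the same frame).
  def orderedByKey(rows: Seq[Rec]): Seq[Result] = {
    val byPartition = rows.groupBy(_.partition)
    rows.map { r =>
      val frame = byPartition(r.partition)
        .filter(_.key.compareTo(r.key) <= 0)
        .map(_.value)
      val (c, s, sd) = aggregates(frame)
      (r.key, c, s, sd)
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Rec("a", 1, 100.0), Rec("b", 1, 200.0))
    unordered(rows).foreach(println)    // (a,2,300.0,50.0) then (b,2,300.0,50.0)
    orderedByKey(rows).foreach(println) // (a,1,100.0,0.0) then (b,2,300.0,50.0)
  }
}
```

Under that assumption the model reproduces both columns of the failing comparison: the whole-partition frame gives the "Correct Answer" rows, and the growing frame gives [a,1,100.0,0.0] and [b,2,300.0,50.0], the "Spark Answer" rows. If whole-partition aggregates are wanted together with an ordering, the frame can be stated explicitly, e.g. Window.partitionBy("partition").orderBy("key").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing).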