Saif Addin Ellafi created SPARK-12880:
-----------------------------------------

             Summary: Different results on groupBy after window function
                 Key: SPARK-12880
                 URL: https://issues.apache.org/jira/browse/SPARK-12880
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: Saif Addin Ellafi
            Priority: Critical


scala> val overVint = Window.partitionBy("product", "bnd", "age").orderBy(asc("yyyymm"))

scala> val df_data2 = df_data.withColumn("result", lag("baleom", 1).over(overVint))

scala> df_data2.filter("product = 'MAIN' and bnd = 'High' and yyyymm = 200509").groupBy("yyyymm", "closed", "ever_closed").agg(sum("result").as("result")).show
+------+------+-----------+--------------------+
|yyyymm|closed|ever_closed|              result|
+------+------+-----------+--------------------+
|200509|     1|          1|1.2672666129980398E7|
|200509|     0|          0|2.7104834668856387E9|
|200509|     0|          1| 1.151339011298214E8|
+------+------+-----------+--------------------+

scala> df_data2.filter("product = 'MAIN' and bnd = 'High' and yyyymm = 200509").groupBy("yyyymm", "closed", "ever_closed").agg(sum("result").as("result")).show
+------+------+-----------+--------------------+
|yyyymm|closed|ever_closed|              result|
+------+------+-----------+--------------------+
|200509|     1|          1|1.2357681589980595E7|
|200509|     0|          0| 2.709930867575646E9|
|200509|     0|          1|1.1595048973981345E8|
+------+------+-----------+--------------------+

Does NOT happen when aggregating columns that do not come from the window function.
Happens in both cluster mode and local mode.
Before the groupBy operation, the data looks good and is consistent.
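
The report does not say whether yyyymm is unique within each (product, bnd, age) partition. If it is not (an assumption, not something the report confirms), rows that tie on yyyymm have no defined relative order, so lag("baleom", 1) may pick a different predecessor on each run and the per-group sums would shift between groups. The sketch below models that effect in plain Scala, without Spark. The names Rec, lagByMonth and sumLagByClosed are hypothetical and exist only for this illustration.

```scala
object LagTies {
  // Hypothetical stand-in for one row of a single window partition.
  final case class Rec(yyyymm: Int, closed: Int, baleom: Double)

  // Model of lag(baleom, 1) over a window ordered by yyyymm only:
  // a stable sort on the key, then each row is paired with the
  // previous row's baleom (None for the first row).
  def lagByMonth(rows: Seq[Rec]): Seq[(Rec, Option[Double])] = {
    val sorted = rows.sortBy(_.yyyymm)
    val lagged: Seq[Option[Double]] =
      None +: sorted.dropRight(1).map(r => Some(r.baleom))
    sorted.zip(lagged)
  }

  // Sum the lagged value per `closed` flag for one month,
  // skipping missing (None) lags.
  def sumLagByClosed(rows: Seq[Rec], month: Int): Map[Int, Double] =
    lagByMonth(rows)
      .filter(_._1.yyyymm == month)
      .groupBy(_._1.closed)
      .map { case (k, v) => k -> v.flatMap(_._2.toList).sum }

  val a = Rec(200508, 0, 100.0)
  val b = Rec(200509, 0, 10.0) // ties with c on yyyymm
  val c = Rec(200509, 1, 20.0)

  // Same rows, different arrival order of the two tied rows.
  val run1 = sumLagByClosed(Seq(a, b, c), 200509)
  val run2 = sumLagByClosed(Seq(a, c, b), 200509)
}
```

In this model run1 and run2 differ even though the input rows are identical, which mirrors the symptom above. If ties turn out to be the cause, adding a tie-breaking column to the orderBy of the window would make the ordering total; which column is suitable depends on the data and is not known from the report.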