Jose Antonio created SPARK-11481:
------------------------------------

             Summary: orderBy with multiple columns in WindowSpec does not work properly
                 Key: SPARK-11481
                 URL: https://issues.apache.org/jira/browse/SPARK-11481
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.5.1
         Environment: All
            Reporter: Jose Antonio

When the orderBy of a WindowSpec is given multiple columns, the ordering seems to be applied only to the first column. A possible workaround is to sort the DataFrame first and then apply the window spec over the sorted DataFrame, e.g.

THIS DOES NOT WORK:

    # imports used by both snippets
    import sys
    from pyspark.sql import functions as func
    from pyspark.sql.window import Window

    window_sum = Window.partitionBy('user_unique_id').orderBy('creation_date', 'mib_id', 'day').rowsBetween(-sys.maxsize, 0)
    df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))

THIS WORKS WELL:

    df = df.sort('user_unique_id', 'creation_date', 'mib_id', 'day')
    window_sum = Window.partitionBy('user_unique_id').orderBy('creation_date', 'mib_id', 'day').rowsBetween(-sys.maxsize, 0)
    df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))

Also, can anybody confirm that this is a true workaround?
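
For comparison, the result this window is expected to produce can be sketched in plain Python, without Spark. This is only a reference model: the function name `running_sum` and the sample rows are hypothetical and not taken from the report. It sorts by the partition column and then by all three ordering columns, and keeps a running total of `group_counter` per partition, which is the behaviour of a frame running from unbounded preceding to the current row.

```python
from itertools import groupby
from operator import itemgetter


def running_sum(rows, partition_key, order_keys, value_key, out_key):
    """Cumulative sum of value_key per partition, ordered by all order_keys."""
    # Sort by the partition column first, then by every ordering column.
    sort_key = itemgetter(partition_key, *order_keys)
    result = []
    for _, group in groupby(sorted(rows, key=sort_key), key=itemgetter(partition_key)):
        total = 0  # the running total restarts for each partition
        for row in group:
            total += row[value_key]
            result.append({**row, out_key: total})
    return result


# Hypothetical sample data: three rows share the same creation_date,
# so the order inside "u1" is decided by mib_id and then day.
rows = [
    {"user_unique_id": "u1", "creation_date": "2015-11-01", "mib_id": 2, "day": 1, "group_counter": 1},
    {"user_unique_id": "u1", "creation_date": "2015-11-01", "mib_id": 1, "day": 2, "group_counter": 0},
    {"user_unique_id": "u1", "creation_date": "2015-11-01", "mib_id": 1, "day": 1, "group_counter": 1},
    {"user_unique_id": "u2", "creation_date": "2015-10-31", "mib_id": 5, "day": 3, "group_counter": 1},
]

result = running_sum(
    rows,
    partition_key="user_unique_id",
    order_keys=["creation_date", "mib_id", "day"],
    value_key="group_counter",
    out_key="user_version",
)
for row in result:
    print(row["user_unique_id"], row["mib_id"], row["day"], row["user_version"])
```

Comparing the `user_version` column from Spark against a model like this on a small sample, chosen so that the first ordering column has ties, would show whether the second and third ordering columns are being honoured.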