[ https://issues.apache.org/jira/browse/SPARK-11481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998361#comment-14998361 ]
Jose Antonio commented on SPARK-11481:
--------------------------------------

Hi Davies, thanks. Yes, your example and the other examples I am running work fine. I suspect the issue may only appear when the job runs on a parallelized DataFrame over a cluster; I have tested that case a lot and the results were conclusive there. In local mode, however, it works well, at least for your example and mine. I will keep investigating so I can provide a reproducible example (a sketch of one follows the quoted issue below).

-Jose

> orderBy with multiple columns in WindowSpec does not work properly
> ------------------------------------------------------------------
>
>                 Key: SPARK-11481
>                 URL: https://issues.apache.org/jira/browse/SPARK-11481
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.1
>         Environment: All
>            Reporter: Jose Antonio
>              Labels: DataFrame, sparkSQL
>
> When using multiple columns in the orderBy of a WindowSpec, the ordering seems to be applied only to the first column.
> A possible workaround is to sort the DataFrame first and then apply the window spec over the sorted DataFrame.
> e.g.
>
> THIS DOES NOT WORK:
> window_sum = Window.partitionBy('user_unique_id').orderBy('creation_date', 'mib_id', 'day').rowsBetween(-sys.maxsize, 0)
> df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))
>
> THIS WORKS WELL:
> df = df.sort('user_unique_id', 'creation_date', 'mib_id', 'day')
> window_sum = Window.partitionBy('user_unique_id').orderBy('creation_date', 'mib_id', 'day').rowsBetween(-sys.maxsize, 0)
> df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))
>
> Also, can anybody confirm that this is a true workaround?
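
For reference, here is a minimal sketch of the kind of reproducible example mentioned above. The sample data and column names (user_unique_id, creation_date, mib_id, day, group_counter) are hypothetical, modeled on the snippets in the report. It builds a small repartitioned DataFrame, computes the running sum with and without pre-sorting, and prints both results so they can be compared in local mode and on a cluster. The use of HiveContext is an assumption based on window-function support in Spark 1.5.x.

# Minimal repro sketch for SPARK-11481 (hypothetical data, not from the report).
import sys

from pyspark import SparkContext
from pyspark.sql import HiveContext, Window
from pyspark.sql import functions as func

sc = SparkContext(appName="SPARK-11481-repro")
# Window functions in Spark 1.5.x need a HiveContext rather than a plain SQLContext.
sqlContext = HiveContext(sc)

# Small sample data; repartition so the data is spread across partitions and any
# difference between local and cluster execution has a chance to show up.
rows = [
    (1, "2015-10-01", 2, 3, 1),
    (1, "2015-10-01", 1, 1, 2),
    (1, "2015-09-30", 5, 2, 3),
    (2, "2015-10-02", 1, 1, 4),
    (2, "2015-10-01", 9, 4, 5),
]
df = sqlContext.createDataFrame(
    rows, ["user_unique_id", "creation_date", "mib_id", "day", "group_counter"]
).repartition(4)

# Window ordered by multiple columns, exactly as in the report.
window_sum = (
    Window.partitionBy("user_unique_id")
    .orderBy("creation_date", "mib_id", "day")
    .rowsBetween(-sys.maxsize, 0)
)

# Running sum without pre-sorting the DataFrame.
unsorted_result = df.withColumn(
    "user_version", func.sum(df.group_counter).over(window_sum)
)

# Workaround from the report: sort first, then apply the same window spec.
sorted_df = df.sort("user_unique_id", "creation_date", "mib_id", "day")
sorted_result = sorted_df.withColumn(
    "user_version", func.sum(sorted_df.group_counter).over(window_sum)
)

# If the bug is present, the user_version values in these two outputs will differ.
unsorted_result.orderBy("user_unique_id", "creation_date", "mib_id", "day").show()
sorted_result.orderBy("user_unique_id", "creation_date", "mib_id", "day").show()

sc.stop()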