[ https://issues.apache.org/jira/browse/SPARK-11481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998361#comment-14998361 ]

Jose Antonio commented on SPARK-11481:
--------------------------------------

Hi Davies, thanks.
Yes, your example and the other examples I am running are working fine.
I am afraid the issue may only show up when the window runs over a DataFrame 
parallelized across a cluster.

I have tested this a lot and the results were conclusive.

However, it works well, at least in your example and in mine, in local mode.

I will continue doing some research on this to find a way to provide a 
reproducible example.
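
In the meantime, here is a minimal sketch of the kind of reproducible example I 
have in mind. The toy rows, their values and the repartition() call are my own 
assumptions, not taken from the real job, and it assumes sqlContext is a 
HiveContext, which window functions need in 1.5:

import sys
from pyspark.sql import Window
from pyspark.sql import functions as func

# Toy data: one user with rows deliberately out of order on the secondary keys.
# Column names follow the report; the values are made up.
rows = [('u1', '2015-01-02', 2, 1, 1),
        ('u1', '2015-01-01', 1, 2, 1),
        ('u1', '2015-01-01', 1, 1, 1),
        ('u2', '2015-01-01', 3, 1, 1)]
df = sqlContext.createDataFrame(
    rows, ['user_unique_id', 'creation_date', 'mib_id', 'day', 'group_counter'])

# Spread the rows over several partitions to mimic a parallelized DataFrame.
df = df.repartition(4)

# Running sum ordered by several columns, exactly as in the report.
window_sum = (Window.partitionBy('user_unique_id')
              .orderBy('creation_date', 'mib_id', 'day')
              .rowsBetween(-sys.maxsize, 0))
df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))

df.orderBy('user_unique_id', 'creation_date', 'mib_id', 'day').show()

If the ordering is handled correctly, user_version should increase one by one 
within each user along (creation_date, mib_id, day).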

-Jose

> orderBy with multiple columns in WindowSpec does not work properly
> ------------------------------------------------------------------
>
>                 Key: SPARK-11481
>                 URL: https://issues.apache.org/jira/browse/SPARK-11481
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.1
>         Environment: All
>            Reporter: Jose Antonio
>              Labels: DataFrame, sparkSQL
>
> When using multiple columns in the orderBy of a WindowSpec, the ordering seems 
> to be applied only to the first column.
> A possible workaround is to sort the DataFrame first and then apply the 
> window spec over the sorted DataFrame.
> e.g. 
> THIS DOES NOT WORK:
> window_sum = Window.partitionBy('user_unique_id').orderBy('creation_date', 'mib_id', 'day').rowsBetween(-sys.maxsize, 0)
> df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))
> THIS WORKS WELL:
> df = df.sort('user_unique_id', 'creation_date', 'mib_id', 'day')
> window_sum = Window.partitionBy('user_unique_id').orderBy('creation_date', 'mib_id', 'day').rowsBetween(-sys.maxsize, 0)
> df = df.withColumn('user_version', func.sum(df.group_counter).over(window_sum))
> Also, can anybody confirm that this is a true workaround?



