[ https://issues.apache.org/jira/browse/SPARK-55792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071740#comment-18071740 ]

Le Xuan Tril commented on SPARK-55792:
--------------------------------------

I've investigated this issue and would like to work on it. Could you please
assign it to me?

> Optimize DataFrame.diff axis=0 to avoid unpartitioned Window
> ------------------------------------------------------------
>
>                 Key: SPARK-55792
>                 URL: https://issues.apache.org/jira/browse/SPARK-55792
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 4.1.1
>            Reporter: Devin Petersohn
>            Priority: Major
>
> DataFrame.diff(axis=0) currently uses Spark's Window without a partition
> specification, which moves all rows into a single partition and therefore
> does not scale to large datasets. We should try to optimize away the
> unpartitioned window (e.g., by using a partitioned window, as other
> projects in the space do).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)