[
https://issues.apache.org/jira/browse/SPARK-55792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071740#comment-18071740
]
Le Xuan Tril commented on SPARK-55792:
--------------------------------------
I've investigated this issue and would like to work on it. Could you
please assign it to me?
> Optimize DataFrame.diff axis=0 to avoid unpartitioned Window
> ------------------------------------------------------------
>
> Key: SPARK-55792
> URL: https://issues.apache.org/jira/browse/SPARK-55792
> Project: Spark
> Issue Type: Bug
> Components: Pandas API on Spark
> Affects Versions: 4.1.1
> Reporter: Devin Petersohn
> Priority: Major
>
> DataFrame.diff(axis=0) currently uses a Spark Window without a partition
> specification, which moves all rows into a single partition and does not
> scale to large datasets. We should try to optimize away the unpartitioned
> window (e.g., by using a partitioned window similar to other projects in
> the space).
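
As a rough illustration of the idea in the description, the sketch below
computes a row-wise diff chunk by chunk in plain Python, carrying only the
last `periods` values of the preceding rows across each chunk boundary. It is
not Spark code and not the pandas-on-Spark implementation: `chunked_diff` and
the list-of-lists chunk layout are hypothetical names and structures chosen
for this example, and only `periods >= 1` is handled.

```python
from typing import List, Optional


def chunked_diff(
    chunks: List[List[float]], periods: int = 1
) -> List[List[Optional[float]]]:
    """Row-wise diff over ordered chunks (hypothetical helper, periods >= 1).

    Each chunk stands in for one partition. Only the last `periods` values
    seen so far are carried forward, so no chunk needs the full dataset.
    """
    if periods < 1:
        raise ValueError("this sketch only handles periods >= 1")

    result: List[List[Optional[float]]] = []
    tail: List[float] = []  # boundary values carried from earlier chunks

    for chunk in chunks:
        # Prepend the carried boundary values so lookups can cross the edge.
        extended = tail + chunk
        offset = len(tail)

        out: List[Optional[float]] = []
        for i in range(len(chunk)):
            j = offset + i - periods  # position of the row `periods` earlier
            # No earlier row exists for the first `periods` rows overall.
            out.append(extended[offset + i] - extended[j] if j >= 0 else None)
        result.append(out)

        # Keep only what the next chunk can possibly need.
        tail = extended[-periods:]

    return result


if __name__ == "__main__":
    print(chunked_diff([[1, 3], [6], [10, 15]]))
```

The loop above visits chunks sequentially for simplicity. In a distributed
setting the boundary values would have to be gathered per partition first,
which is the part an actual fix would need to design.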