[ 
https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732932#comment-17732932
 ] 

Haejoon Lee commented on SPARK-43291:
-------------------------------------

pandas 2.0.0, released on April 3, 2023, introduced numerous breaking 
changes. We have decided to postpone addressing these breaking changes until 
the next major release of Spark, version 4.0.0, to minimize disruption for our 
users and provide a smoother upgrade experience.

The pandas 2.0.0 release includes a significant number of updates, such as API 
removals, changes in API behavior, parameter removals, parameter behavior 
changes, and bug fixes. We have planned the following approach for each item:

- {*}API Removals{*}: APIs removed in pandas 2.0.0 will be kept but deprecated 
in Spark 3.5.0, with appropriate warnings, and will be removed in Spark 4.0.0.

- {*}API Behavior Changes{*}: APIs with changed behavior will retain their 
existing behavior in Spark 3.5.0, with appropriate warnings, and will align 
with the pandas behavior in Spark 4.0.0.

- {*}Parameter Removals{*}: Parameters removed in pandas 2.0.0 will be kept but 
deprecated in Spark 3.5.0, with appropriate warnings, and will be removed in 
Spark 4.0.0.

- {*}Parameter Behavior Changes{*}: Parameters with changed behavior will 
retain their existing behavior in Spark 3.5.0, with appropriate warnings, and 
will align with the pandas behavior in Spark 4.0.0.

- {*}Bug Fixes{*}: Bug fixes, which mainly concern correctness issues, will be 
applied in Spark 3.5.0.

*To recap, all breaking changes related to pandas 2.0.0 will be supported in 
Spark 4.0.0,* *and the affected APIs and parameters will remain deprecated, 
with appropriate warnings, in Spark 3.5.0.*
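
As a rough illustration of the "keep the old behavior, but warn" approach described above, here is a minimal Python sketch. It is not taken from the Spark code base: the `deprecated` helper, the `iteritems` and `sort_values` functions, and the `legacy_flag` parameter are hypothetical names made up for this example, and the use of `FutureWarning` is an assumption about how such warnings might be surfaced.

```python
import functools
import warnings


def deprecated(removed_in, reason):
    """Hypothetical helper: warn on every call, but keep the old behavior."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in "
                f"{removed_in}. {reason}",
                FutureWarning,
                stacklevel=2,
            )
            # The wrapped function still runs unchanged.
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical API slated for removal: still works, but warns.
@deprecated(removed_in="4.0.0", reason="Use `items` instead.")
def iteritems(mapping):
    return list(mapping.items())


# Hypothetical API with a parameter slated for removal.
def sort_values(values, ascending=True, legacy_flag=None):
    if legacy_flag is not None:
        # Warn only when the caller actually passes the deprecated parameter.
        warnings.warn(
            "The `legacy_flag` parameter is deprecated and will be removed "
            "in 4.0.0.",
            FutureWarning,
            stacklevel=2,
        )
    return sorted(values, reverse=not ascending)
```

With this pattern, existing callers keep getting the same results during the deprecation window and only see a warning, so the actual removal can be deferred to the major release.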
 
I will submit a PR that deprecates the affected APIs and adds warnings very soon.

> Match behavior for DataFrame.cov on string DataFrame
> ----------------------------------------------------
>
>                 Key: SPARK-43291
>                 URL: https://issues.apache.org/jira/browse/SPARK-43291
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Pandas API on Spark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Priority: Major
>
> Should enable test below:
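> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
>
> # Fragment of a test method; `self` is the enclosing test case.
> pdf = pd.DataFrame([("1", "2"), ("0", "3"), ("2", "0"), ("1", "1")], 
>                    columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> self.assert_eq(pdf.cov(), psdf.cov())
> {code}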


