[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26566.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 23657
[https://github.com/apache/spark/pull/23657]

> Upgrade apache/arrow to 0.12.0
> ------------------------------
>
>                 Key: SPARK-26566
>                 URL: https://issues.apache.org/jira/browse/SPARK-26566
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Version 0.12.0 includes the following selected fixes/improvements relevant to
> Spark users:
> * Safe cast fails from numpy float64 array with nans to integer, ARROW-4258
> * Java, Reduce heap usage for variable width vectors, ARROW-4147
> * Binary identity cast not implemented, ARROW-4101
> * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
> * conversion to date object no longer needed, ARROW-3910
> * Error reading IPC file with no record batches, ARROW-3894
> * Signed to unsigned integer cast yields incorrect results when type sizes
>   are the same, ARROW-3790
> * from_pandas gives incorrect results when converting floating point to bool,
>   ARROW-3428
> * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp /
>   libboost issue), ARROW-3048
> * Java update to official Flatbuffers version 1.9.0, ARROW-3175
>
> complete list
> [here|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0]
>
> PySpark requires the following fixes to work with PyArrow 0.12.0
> * Encrypted pyspark worker fails due to ChunkedStream missing closed property
> * pyarrow now converts dates as objects by default, which causes error
>   because type is assumed datetime64
> * ArrowTests fails due to difference in raised error message
> * pyarrow.open_stream deprecated
> * tests fail because groupby adds index column with duplicate name

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
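
For readers wondering what the first PySpark item above ("ChunkedStream missing closed property") amounts to, here is a minimal sketch. It is not the code from pull request 23657 and not PySpark's actual ChunkedStream: the class name ChunkedWriter, the length-prefixed framing, and the -1 end marker are all assumptions made for illustration. The only point is that a custom file-like wrapper handed to another library may need to expose a `closed` attribute, as Python's own io classes do.

```python
import io
import struct


class ChunkedWriter:
    """Hypothetical file-like wrapper (illustration only, not PySpark code).

    Buffers writes and forwards them to `wrapped` as length-prefixed chunks.
    """

    def __init__(self, wrapped, chunk_size):
        self._wrapped = wrapped
        self._chunk_size = chunk_size
        self._pending = bytearray()
        self._closed = False

    @property
    def closed(self):
        # Mirrors io.IOBase.closed so callers that check it keep working.
        return self._closed

    def write(self, data):
        if self._closed:
            raise ValueError("write to closed stream")
        self._pending.extend(data)
        # Flush every full chunk; keep the remainder buffered.
        while len(self._pending) >= self._chunk_size:
            self._emit(self._pending[:self._chunk_size])
            del self._pending[:self._chunk_size]
        return len(data)

    def _emit(self, chunk):
        # Assumed framing: 4-byte big-endian length, then the payload.
        self._wrapped.write(struct.pack("!i", len(chunk)))
        self._wrapped.write(bytes(chunk))

    def close(self):
        if self._closed:
            return
        if self._pending:
            self._emit(self._pending)
            self._pending = bytearray()
        # Assumed end-of-stream marker: a length of -1.
        self._wrapped.write(struct.pack("!i", -1))
        self._closed = True


sink = io.BytesIO()
writer = ChunkedWriter(sink, chunk_size=4)
writer.write(b"abcdef")
writer.close()
```

After close(), writer.closed is True and any further write raises ValueError, which is the behaviour a consumer probing the attribute would typically expect.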