[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler updated SPARK-26566:
---------------------------------
    Description: 

Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:
 * Safe cast fails from numpy float64 array with NaNs to integer, ARROW-4258
 * Java: reduce heap usage for variable-width vectors, ARROW-4147
 * Binary identity cast not implemented, ARROW-4101
 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
 * conversion to date object no longer needed, ARROW-3910
 * Error reading IPC file with no record batches, ARROW-3894
 * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
 * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
 * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048

The complete list is [here|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0].

PySpark requires the following fixes to work with PyArrow 0.12.0 (illustrative sketches of the open_stream and date-conversion changes follow at the end of this message):
 * Encrypted pyspark worker fails because ChunkedStream is missing the closed property
 * pyarrow now converts dates to objects by default, which causes an error because the type is assumed to be datetime64
 * ArrowTests fail due to a difference in the raised error message
 * pyarrow.open_stream is deprecated
 * tests fail because groupby adds an index column with a duplicate name

was:
_This is just a placeholder for now to collect what needs to be fixed when we upgrade next time_

Version 0.12.0 includes the following:
 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
 * conversion to date object no longer needed, ARROW-3910


> Upgrade apache/arrow to 0.12.0
> ------------------------------
>
>                 Key: SPARK-26566
>                 URL: https://issues.apache.org/jira/browse/SPARK-26566
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Bryan Cutler
>            Priority: Major
>
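A minimal compatibility sketch for the pyarrow.open_stream deprecation (ARROW-4098); this is not the actual PySpark patch, and the helper name open_arrow_stream is made up for illustration. It assumes the caller may run against either an older pyarrow release or 0.12.0+.

{code:python}
import pyarrow as pa


def open_arrow_stream(source):
    """Open an Arrow record batch stream reader from a file-like object or buffer."""
    if hasattr(pa, "ipc") and hasattr(pa.ipc, "open_stream"):
        # pyarrow >= 0.12.0: pyarrow.ipc.open_stream is the supported entry point
        return pa.ipc.open_stream(source)
    # Older releases only expose the (now deprecated) top-level function
    return pa.open_stream(source)
{code}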
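A minimal sketch of the date-conversion behavior change (ARROW-3910), not the actual Spark fix: on pyarrow 0.12.0 a date column comes back from to_pandas() as Python datetime.date objects (object dtype) instead of datetime64[ns]. The normalization step below assumes a caller that still expects datetime64[ns].

{code:python}
import datetime

import pandas as pd
import pyarrow as pa

# Build a one-column Arrow table with a date32 column named "d"
table = pa.Table.from_arrays(
    [pa.array([datetime.date(2019, 1, 1), datetime.date(2019, 1, 2)])],
    names=["d"])

pdf = table.to_pandas()  # object dtype (datetime.date) on pyarrow 0.12.0
if pdf["d"].dtype == object:
    # Normalize back to datetime64[ns] for code that assumes that dtype
    pdf["d"] = pd.to_datetime(pdf["d"])
{code}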