[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler updated SPARK-26566:
---------------------------------
    Description: 

Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:
 * Safe cast fails from numpy float64 array with NaNs to integer, ARROW-4258
 * Java: reduce heap usage for variable-width vectors, ARROW-4147
 * Binary identity cast not implemented, ARROW-4101
 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
 * conversion to date object no longer needed, ARROW-3910
 * Error reading IPC file with no record batches, ARROW-3894
 * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
 * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
 * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048

The complete list is [here|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0].

PySpark requires the following fixes to work with PyArrow 0.12.0 (illustrative sketches of the open_stream and date-conversion changes follow at the end of this message):
 * Encrypted pyspark worker fails because ChunkedStream is missing the closed property
 * pyarrow now converts dates to objects by default, which causes an error because the type is assumed to be datetime64
 * ArrowTests fail due to a difference in the raised error message
 * pyarrow.open_stream is deprecated
 * tests fail because groupby adds an index column with a duplicate name

was:
_This is just a placeholder for now to collect what needs to be fixed when we upgrade next time_

Version 0.12.0 includes the following:
 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
 * conversion to date object no longer needed, ARROW-3910


> Upgrade apache/arrow to 0.12.0
> ------------------------------
>
>                 Key: SPARK-26566
>                 URL: https://issues.apache.org/jira/browse/SPARK-26566
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Bryan Cutler
>            Priority: Major
>
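A minimal compatibility sketch for the pyarrow.open_stream deprecation (ARROW-4098); this is not the actual PySpark patch, and the helper name open_arrow_stream is made up for illustration. It assumes the caller may run against either an older pyarrow release or 0.12.0+.

{code:python}
import pyarrow as pa


def open_arrow_stream(source):
    """Open an Arrow record batch stream reader from a file-like object or buffer."""
    if hasattr(pa, "ipc") and hasattr(pa.ipc, "open_stream"):
        # pyarrow >= 0.12.0: pyarrow.ipc.open_stream is the supported entry point
        return pa.ipc.open_stream(source)
    # Older releases only expose the (now deprecated) top-level function
    return pa.open_stream(source)
{code}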
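A minimal sketch of the date-conversion behavior change (ARROW-3910), not the actual Spark fix: on pyarrow 0.12.0 a date column comes back from to_pandas() as Python datetime.date objects (object dtype) instead of datetime64[ns]. The normalization step below assumes a caller that still expects datetime64[ns].

{code:python}
import datetime

import pandas as pd
import pyarrow as pa

# Build a one-column Arrow table with a date32 column named "d"
table = pa.Table.from_arrays(
    [pa.array([datetime.date(2019, 1, 1), datetime.date(2019, 1, 2)])],
    names=["d"])

pdf = table.to_pandas()  # object dtype (datetime.date) on pyarrow 0.12.0
if pdf["d"].dtype == object:
    # Normalize back to datetime64[ns] for code that assumes that dtype
    pdf["d"] = pd.to_datetime(pdf["d"])
{code}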