[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler updated SPARK-27276:
---------------------------------
    Description:
The current minimum version is 0.8.0, which is pretty ancient since Arrow has been moving fast and a lot has changed since this version. There are currently many workarounds checking for different versions or disabling specific functionality, and the code is getting ugly and difficult to maintain. Increasing the version will allow cleanup and upgrade the testing environment.

This involves changing the pyarrow version in setup.py (currently at 0.8.0), updating Jenkins to test against the new version, code cleanup to remove workarounds from older versions. Newer versions of pyarrow have dropped support for Python 3.4, so it might be necessary to update to Python 3.5+ in Jenkins as well. Users would then need to ensure at least this version of pyarrow is installed on the cluster.

There is also a 0.12.1 release, so I will need to check what bugs that fixed to see if that will be a better version.

  was:
The current minimum version is 0.8.0, which is pretty ancient since Arrow has been moving fast and a lot has changed since this version. There are currently many workarounds checking for different versions or disabling specific functionality, and the code is getting ugly and difficult to maintain. Increasing the version will allow cleanup and upgrade the testing environment.

This involves changing the pyarrow version in setup.py (currently at 0.8.0), updating Jenkins to test against the new version, code cleanup to remove workarounds from older versions. Users would then need to ensure this version is installed on the cluster.

There is also a 0.12.1 release, so I will need to check what bugs that fixed to see if that will be a better version.

> Increase the minimum pyarrow version to 0.12.0
> ----------------------------------------------
>
>                 Key: SPARK-27276
>                 URL: https://issues.apache.org/jira/browse/SPARK-27276
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Bryan Cutler
>            Priority: Major
>
> The current minimum version is 0.8.0, which is pretty ancient since Arrow has
> been moving fast and a lot has changed since this version. There are
> currently many workarounds checking for different versions or disabling
> specific functionality, and the code is getting ugly and difficult to
> maintain. Increasing the version will allow cleanup and upgrade the testing
> environment.
> This involves changing the pyarrow version in setup.py (currently at 0.8.0),
> updating Jenkins to test against the new version, code cleanup to remove
> workarounds from older versions. Newer versions of pyarrow have dropped
> support for Python 3.4, so it might be necessary to update to Python 3.5+ in
> Jenkins as well. Users would then need to ensure at least this version of
> pyarrow is installed on the cluster.
> There is also a 0.12.1 release, so I will need to check what bugs that fixed
> to see if that will be a better version.
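
For illustration only, the kind of minimum-version guard the description alludes to could look roughly like the sketch below. The names parse_version, require_minimum_pyarrow_version and MINIMUM_PYARROW_VERSION are hypothetical and are not taken from Spark's code; the 0.12.0 value comes from the issue title, and the issue leaves open whether 0.12.1 would be chosen instead.

```python
import re

# Assumed minimum, taken from the issue title; the issue notes that
# 0.12.1 is still being considered, so treat this value as a placeholder.
MINIMUM_PYARROW_VERSION = "0.12.0"


def parse_version(version):
    """Turn a dotted version string such as '0.12.1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def require_minimum_pyarrow_version(installed, minimum=MINIMUM_PYARROW_VERSION):
    """Raise ImportError if `installed` is older than `minimum`."""
    have = parse_version(installed)
    need = parse_version(minimum)
    # Pad the shorter tuple with zeros so that '0.12' compares equal to '0.12.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have < need:
        raise ImportError(
            "pyarrow >= %s must be installed; however, your version was %s."
            % (minimum, installed)
        )
```

With pyarrow installed, a caller would pass pyarrow.__version__ as the first argument. A single guard like this, combined with the raised minimum in setup.py, is what would let the per-version workarounds mentioned above be deleted rather than extended.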