[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801281#comment-16801281 ]
Hyukjin Kwon commented on SPARK-27276:
--------------------------------------

Yes, we really need to raise the minimum version soon. There are not many maintainers for the PyArrow <> Pandas integration in particular, and we currently sometimes need to test against a matrix of multiple Arrow and Pandas versions. Adding [~ueshin] as well.

> Increase the minimum pyarrow version to 0.12.0
> ----------------------------------------------
>
>                 Key: SPARK-27276
>                 URL: https://issues.apache.org/jira/browse/SPARK-27276
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Bryan Cutler
>            Priority: Major
>
> The current minimum version is 0.8.0, which is pretty ancient since Arrow has
> been moving fast and a lot has changed since this version. There are
> currently many workarounds that check for different versions or disable
> specific functionality, and the code is getting ugly and difficult to
> maintain. Increasing the version will allow us to clean up the code and
> upgrade the testing environment.
> This involves changing the pyarrow version in setup.py (currently at 0.8.0),
> updating Jenkins to test against the new version, and cleaning up the code to
> remove workarounds for older versions. Users would then need to ensure this
> version is installed on the cluster.
> There is also a 0.12.1 release, so I will need to check what bugs it fixed
> to see if it would be a better version.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
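
For illustration only, here is a minimal sketch of the kind of single minimum-version gate that could replace scattered per-version workarounds once the floor is raised. The names MINIMUM_PYARROW_VERSION, parse_version and require_minimum_version are hypothetical and are not taken from the Spark code base; the 0.12.0 floor is the value proposed in this issue. The installed version is passed in as a string so the sketch runs without pyarrow present; in practice it would likely come from pyarrow.__version__.

```python
import re

# Assumed floor, taken from the proposal in this issue (not a confirmed decision).
MINIMUM_PYARROW_VERSION = "0.12.0"


def parse_version(version):
    """Return the leading numeric release part of a version string as a 3-tuple of ints.

    Suffixes such as ".dev0" or "rc1" are ignored, and short versions such as
    "0.12" are padded with zeros so that tuple comparison behaves as expected.
    """
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    parts = [int(part) for part in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def require_minimum_version(installed, minimum=MINIMUM_PYARROW_VERSION):
    """Raise ImportError if `installed` is older than `minimum` (hypothetical helper)."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "pyarrow >= %s must be installed; however, the installed version is %s."
            % (minimum, installed)
        )


# Example: a cluster node still on the old minimum would be rejected up front.
try:
    require_minimum_version("0.8.0")
except ImportError as error:
    print(error)
```

With one gate like this at import time, code paths below it could assume the newer Arrow API instead of branching on the installed version.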