HyukjinKwon opened a new pull request #30128:
URL: https://github.com/apache/spark/pull/30128


   ### What changes were proposed in this pull request?
   
   This PR proposes to set the upper bounds of the PyArrow and pandas versions to 
1.2.0 and 1.0.0, respectively.
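   The kind of guard this implies can be sketched as below (a minimal illustration only, not the actual patch; the helper name `is_supported` and the exact exclusive bounds are assumptions for the example):

   ```python
   # Hypothetical upper-bound guard mirroring the proposed version pins.
   PYARROW_UPPER_BOUND = "1.2.0"  # assumed exclusive upper bound for PyArrow
   PANDAS_UPPER_BOUND = "1.0.0"   # assumed exclusive upper bound for pandas


   def _parse(version):
       """Turn a dotted version string like '0.15.1' into a comparable tuple."""
       return tuple(int(part) for part in version.split("."))


   def is_supported(installed, upper_bound):
       """Return True when the installed version is below the exclusive upper bound."""
       return _parse(installed) < _parse(upper_bound)


   print(is_supported("0.15.1", PYARROW_UPPER_BOUND))  # older PyArrow: accepted
   print(is_supported("2.0.0", PYARROW_UPPER_BOUND))   # newer PyArrow: rejected
   ```

   Pinning one tested combination this way trades breadth for a build that is known to pass, which matches the resourcing constraints described below.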
   
   
https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895 
and 
https://github.com/apache/spark/commit/07a9885f2792be1353f4a923d649e90bc431cb38 
were not ported back, so the tests fail.
   
   
https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895 
contains an Arrow dependency upgrade, so it cannot be cleanly ported back.
   
   Note that I _think_ these tests were broken from the very beginning at 
https://github.com/apache/spark/commit/7c65f7680ffbe2c03e444ec60358cbf912c27d13#diff-bdcc6a2a85f645f62724fe8dafbf0581cb0c1d65f6a76cb2985a9172e31a473c.
 There was one flaky test in ML that stopped the other tests, so the SQL- and 
Arrow-related test results were not shown.
   
   ### Why are the changes needed?
   
   1. Spark 2.4.x already declares that higher versions might not work; see 
https://github.com/apache/spark/blob/branch-2.4/docs/sql-pyspark-pandas-with-arrow.md#recommended-pandas-and-pyarrow-versions.
   
   2. We're currently unable to test all combinations (due to the lack of 
resources in GitHub Actions; see SPARK-32264), so it is best to pick one 
combination to test.
   
   3. Just to clarify, Spark 2.4 works with the latest PyArrow and pandas 99% 
correctly. Most of the failures are test-only issues.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   The GitHub Actions run on this PR should test it.

