This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 8b9036f  [SPARK-33217][INFRA][PYTHON][2.4] Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4
8b9036f is described below

commit 8b9036fb684d1621452c22115345ddfcda6e07c5
Author: HyukjinKwon <gurwls...@apache.org>
AuthorDate: Thu Oct 22 18:17:36 2020 +0900

    [SPARK-33217][INFRA][PYTHON][2.4] Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4

    ### What changes were proposed in this pull request?

    This PR proposes to set the upper bounds of the PyArrow and Pandas versions to 0.12.0 and 0.24.0 respectively. https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895 and https://github.com/apache/spark/commit/07a9885f2792be1353f4a923d649e90bc431cb38 were not ported back, so the tests fail. https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895 contains an Arrow dependency upgrade, so it cannot be cleanly ported back.

    Note that I _think_ these tests were broken from the very beginning at https://github.com/apache/spark/commit/7c65f7680ffbe2c03e444ec60358cbf912c27d13#diff-bdcc6a2a85f645f62724fe8dafbf0581cb0c1d65f6a76cb2985a9172e31a473c. There was one flaky test in ML that stopped the other tests, so the SQL- and Arrow-related tests were not shown.

    ### Why are the changes needed?

    1. Spark 2.4.x already declares that higher versions might not work at https://github.com/apache/spark/blob/branch-2.4/docs/sql-pyspark-pandas-with-arrow.md#recommended-pandas-and-pyarrow-versions.
    2. We're currently unable to test all combinations (due to the lack of resources in GitHub Actions, see SPARK-32264). It is best to pick one combination to test.
    3. Just to clarify, Spark 2.4 works with the latest PyArrow and pandas 99% correctly; most of the failures are test-only issues.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    GitHub Actions in this build should test it.
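The pip specifiers in this change encode exclusive upper bounds. As an illustration only (this is not PySpark code; `version_below` is a hypothetical helper), the comparison that `'pyarrow<0.12.0'` and `'pandas<0.24.0'` ask pip to perform can be sketched as:

```python
# Illustrative sketch only: `version_below` is a hypothetical helper, not part
# of PySpark or pip. It mimics the exclusive upper bounds 'pyarrow<0.12.0' and
# 'pandas<0.24.0' that the workflow passes to pip.

def version_below(installed, upper):
    """Return True if `installed` sorts strictly below the exclusive bound `upper`.

    Versions are compared as tuples of integers, e.g. "0.11.1" -> (0, 11, 1).
    Real specifier matching (PEP 440) handles pre-releases and more; this sketch
    covers only plain dotted numeric versions.
    """
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(installed) < parse(upper)

print(version_below("0.11.1", "0.12.0"))  # True: pyarrow 0.11.1 satisfies <0.12.0
print(version_below("0.24.2", "0.24.0"))  # False: pandas 0.24.2 violates <0.24.0
```

Because the bound is exclusive, pip will install the newest release strictly below it, e.g. the latest 0.11.x PyArrow and the latest 0.23.x pandas at the time.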
    Closes #30128 from HyukjinKwon/SPARK-33217.

    Lead-authored-by: HyukjinKwon <gurwls...@apache.org>
    Co-authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: HyukjinKwon <gurwls...@apache.org>
---
 .github/workflows/build_and_test.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 8f46250..9390248 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -130,16 +130,16 @@ jobs:
       if: contains(matrix.modules, 'pyspark')
       # PyArrow is not supported in PyPy yet, see ARROW-2651.
       run: |
-        python3.6 -m pip install numpy pyarrow pandas scipy xmlrunner
+        python3.6 -m pip install numpy 'pyarrow<0.12.0' 'pandas<0.24.0' scipy xmlrunner
         python3.6 -m pip list
-        # PyPy does not have xmlrunner
-        pypy3 -m pip install numpy pandas scipy
+        # PyPy does not have xmlrunner, and pandas<0.24.0 installation fails in PyPy3, just skipping.
+        pypy3 -m pip install numpy scipy
         pypy3 -m pip list
     - name: Install Python packages (Python 2.7)
       if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       run: |
         # Some tests do not pass in PySpark with PyArrow, for example, pyspark.sql.tests.ArrowTests.
-        python2.7 -m pip install numpy pandas scipy xmlrunner
+        python2.7 -m pip install numpy 'pandas<0.24.0' scipy xmlrunner
         python2.7 -m pip list
     # SparkR
     - name: Install R 4.0

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org