This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 5dbd6ff6aa7 [SPARK-44267][PS][INFRA] Upgrade `pandas` to 2.0.3
5dbd6ff6aa7 is described below

commit 5dbd6ff6aa714f0e2e065f41dcb68b7f793caa86
Author: panbingkun <pbk1...@gmail.com>
AuthorDate: Mon Jul 10 15:43:26 2023 +0900

    [SPARK-44267][PS][INFRA] Upgrade `pandas` to 2.0.3
    
    ### What changes were proposed in this pull request?
    The pr aims to upgrade `pandas` from 2.0.2 to 2.0.3.
    
    ### Why are the changes needed?
    1. The new version brings several bug fixes, e.g.:
    - Bug in DataFrame.convert_dtypes() and Series.convert_dtypes() when trying to convert [ArrowDtype](https://pandas.pydata.org/docs/reference/api/pandas.ArrowDtype.html#pandas.ArrowDtype) with dtype_backend="numpy_nullable" ([GH53648](https://github.com/pandas-dev/pandas/issues/53648))
    
    - Bug in [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) when defining dtype with bool[pyarrow] for the "c" and "python" engines ([GH53390](https://github.com/pandas-dev/pandas/issues/53390))
    
    2. Release notes: https://pandas.pydata.org/docs/whatsnew/v2.0.3.html
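
    This upgrade also bumps the hard-coded version guard in `supported_api_gen.py` (see the diff below). A minimal, stdlib-only sketch of that guard pattern; the `parse_version` helper and the `installed` value are illustrative stand-ins (the real code compares `pd.__version__` via distutils `LooseVersion`):

    ```python
    # Hypothetical stand-in for the LooseVersion comparison in
    # supported_api_gen.py; handles plain dotted numeric versions.
    def parse_version(v: str) -> tuple:
        return tuple(int(part) for part in v.split("."))

    pandas_latest_version = "2.0.3"
    installed = "2.0.2"  # hypothetical installed pandas version

    if parse_version(installed) != parse_version(pandas_latest_version):
        print(
            "Warning: Latest version of pandas (%s) is required to generate "
            "the documentation; found %s" % (pandas_latest_version, installed)
        )
    ```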
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Pass GA.
    
    Closes #41812 from panbingkun/SPARK-44267.
    
    Authored-by: panbingkun <pbk1...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 dev/infra/Dockerfile                                  | 4 ++--
 python/pyspark/pandas/supported_api_gen.py            | 2 +-
 python/pyspark/pandas/tests/groupby/test_aggregate.py | 5 +++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 3b95467389a..af8e1a980f9 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.0.2' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.2' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status
diff --git a/python/pyspark/pandas/supported_api_gen.py b/python/pyspark/pandas/supported_api_gen.py
index d259171ecb9..06591c5b26a 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -98,7 +98,7 @@ def generate_supported_api(output_rst_file_path: str) -> None:
 
     Write supported APIs documentation.
     """
-    pandas_latest_version = "2.0.2"
+    pandas_latest_version = "2.0.3"
     if LooseVersion(pd.__version__) != LooseVersion(pandas_latest_version):
         msg = (
             "Warning: Latest version of pandas (%s) is required to generate 
the documentation; "
diff --git a/python/pyspark/pandas/tests/groupby/test_aggregate.py b/python/pyspark/pandas/tests/groupby/test_aggregate.py
index bb5b165306d..6ceae82caa8 100644
--- a/python/pyspark/pandas/tests/groupby/test_aggregate.py
+++ b/python/pyspark/pandas/tests/groupby/test_aggregate.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 import unittest
+from distutils.version import LooseVersion
 
 import pandas as pd
 
@@ -39,6 +40,10 @@ class GroupbyAggregateMixin:
     def psdf(self):
         return ps.from_pandas(self.pdf)
 
+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-44289): Enable GroupbyAggregateTests.test_aggregate for 
pandas 2.0.0.",
+    )
     def test_aggregate(self):
         pdf = pd.DataFrame(
             {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, 
-0.562]}
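
The version-gated skip added in `test_aggregate.py` above can be sketched in isolation. This is a stdlib-only illustration: a tuple-based `parse_version` stands in for the patch's distutils `LooseVersion`, and `PANDAS_VERSION` is a hard-coded hypothetical rather than the real `pd.__version__`:

```python
import unittest

def parse_version(v: str) -> tuple:
    # Stand-in for distutils LooseVersion, which the patch imports;
    # handles plain dotted numeric versions like "2.0.0".
    return tuple(int(part) for part in v.split("."))

PANDAS_VERSION = "2.0.3"  # hypothetical installed pandas version

class GroupbyAggregateSketch(unittest.TestCase):
    @unittest.skipIf(
        parse_version(PANDAS_VERSION) >= parse_version("2.0.0"),
        "TODO(SPARK-44289): Enable test_aggregate for pandas 2.0.0.",
    )
    def test_aggregate(self):
        # Never executes under pandas >= 2.0.0: skipIf suppresses the run.
        self.fail("should have been skipped")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(GroupbyAggregateSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # the one test is recorded as skipped, not failed
```

Skipping (rather than deleting) the test keeps it visible in reports until SPARK-44289 re-enables it for pandas 2.x.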

