(spark) branch master updated: [SPARK-54211][PYTHON][FOLLOW-UP] Fix doctests of mapInArrow

yangjie01 Mon, 10 Nov 2025 02:26:21 -0800

This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new fc49dbd868e0 [SPARK-54211][PYTHON][FOLLOW-UP] Fix doctests of mapInArrow
fc49dbd868e0 is described below

commit fc49dbd868e08e8607fc188b326b0d8d31294781
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Mon Nov 10 18:26:05 2025 +0800

    [SPARK-54211][PYTHON][FOLLOW-UP] Fix doctests of mapInArrow
    
    ### What changes were proposed in this pull request?
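    Fix the doctests of `mapInArrow` by skipping them when the installed PyArrow is older than 17.0.0, in both the classic and Connect `_test()` runners.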
    
    ### Why are the changes needed?
    To fix CI jobs that run against older PyArrow versions. The `mapInArrow` doctest calls:
    
    ```
    batch.filter(pa.compute.field("id") == 1)
    ```
    The expression input `pa.compute.field("id") == 1` is only supported since PyArrow 17.0.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    PR builder with
    ```
    default: '{"PYSPARK_IMAGE_TO_TEST": "python-minimum", "PYTHON_TO_TEST": "python3.10"}'
    ```
    
    See https://github.com/zhengruifeng/spark/actions/runs/19222092639/job/54941916951
    
    ### Was this patch authored or co-authored using generative AI tooling?
    NO
    
    Closes #52965 from zhengruifeng/fix_map_in_arrow_doctest.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: yangjie01 <[email protected]>
---
 python/pyspark/sql/classic/dataframe.py |  6 ++++++
 python/pyspark/sql/connect/dataframe.py | 12 ++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)
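
The version gate described in the commit message can be sketched without Spark or PyArrow installed. This is only an illustration: `parse_version` and `should_skip_map_in_arrow_doctest` are hypothetical helper names, and the patch itself compares versions with `pyspark.loose_version.LooseVersion` rather than with tuples.

```python
import re

# Minimum PyArrow version that accepts an expression in batch.filter(...),
# according to the commit message.
MIN_PYARROW = (17, 0, 0)


def parse_version(version: str) -> tuple:
    """Hypothetical helper: keep the leading numeric parts, padded to three."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as "dev5"
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def should_skip_map_in_arrow_doctest(pyarrow_version: str) -> bool:
    """Return True when the installed PyArrow predates expression filters."""
    return parse_version(pyarrow_version) < MIN_PYARROW


if __name__ == "__main__":
    for candidate in ("9.0.0", "16.1.0", "17.0.0", "18.0.0.dev5"):
        print(candidate, should_skip_map_in_arrow_doctest(candidate))
```

Comparing integer tuples instead of raw strings matters here: as strings, `"9.0.0"` sorts after `"17.0.0"`, which would wrongly treat PyArrow 9 as new enough. With PyArrow installed, the argument would be `pa.__version__`.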

diff --git a/python/pyspark/sql/classic/dataframe.py b/python/pyspark/sql/classic/dataframe.py
index 238bd1677b46..ace0ac7c94ce 100644
--- a/python/pyspark/sql/classic/dataframe.py
+++ b/python/pyspark/sql/classic/dataframe.py
@@ -1976,6 +1976,12 @@ def _test() -> None:
     if not have_pyarrow:
         del pyspark.sql.dataframe.DataFrame.toArrow.__doc__
         del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
+    else:
+        import pyarrow as pa
+        from pyspark.loose_version import LooseVersion
+
+        if LooseVersion(pa.__version__) < LooseVersion("17.0.0"):
+            del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
 
     spark = (
         SparkSession.builder.master("local[4]").appName("sql.classic.dataframe tests").getOrCreate()
diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 862974f11165..dfca13a8464b 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -2363,12 +2363,20 @@ def _test() -> None:
         del pyspark.sql.dataframe.DataFrame.rdd.__doc__
 
     if not have_pandas or not have_pyarrow:
-        del pyspark.sql.dataframe.DataFrame.toArrow.__doc__
         del pyspark.sql.dataframe.DataFrame.toPandas.__doc__
-        del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
         del pyspark.sql.dataframe.DataFrame.mapInPandas.__doc__
         del pyspark.sql.dataframe.DataFrame.pandas_api.__doc__
 
+    if not have_pyarrow:
+        del pyspark.sql.dataframe.DataFrame.toArrow.__doc__
+        del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
+    else:
+        import pyarrow as pa
+        from pyspark.loose_version import LooseVersion
+
+        if LooseVersion(pa.__version__) < LooseVersion("17.0.0"):
+            del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
+
     globs["spark"] = (
         PySparkSession.builder.appName("sql.connect.dataframe tests")
         .remote(os.environ.get("SPARK_CONNECT_TESTING_REMOTE", "local[4]"))
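
Both hunks rely on the same mechanism: once a method's `__doc__` is deleted, the doctest finder has nothing to collect for it. A minimal sketch of that behaviour follows, using only the standard library. The `Frame` class and `collected_doctests` helper are hypothetical stand-ins, not part of PySpark.

```python
import doctest


class Frame:
    # Hypothetical stand-in for DataFrame; only the docstrings matter here.
    def plain(self):
        """
        >>> 1 + 1
        2
        """

    def needs_new_dependency(self):
        """
        >>> 2 * 3
        6
        """


def collected_doctests(obj, name):
    """Return the sorted names of the doctests the finder collects from obj."""
    finder = doctest.DocTestFinder()
    # module=False tells the finder not to look up the defining module,
    # so the sketch behaves the same in a script, REPL, or notebook.
    tests = finder.find(obj, name=name, module=False, globs={})
    return sorted(test.name for test in tests if test.examples)


before = collected_doctests(Frame, "Frame")

# Same idiom as the patch: deleting __doc__ leaves the method callable
# but removes its examples from doctest collection.
del Frame.needs_new_dependency.__doc__

after = collected_doctests(Frame, "Frame")

if __name__ == "__main__":
    print(before)
    print(after)
```

The method stays usable after the deletion; only its docstring, and therefore its doctest, disappears for the lifetime of the test process.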


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
