This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.1 by this push:
     new 296e6820eddc [SPARK-54153][PYTHON][TESTS][FOLLOWUP] Skip `test_perf_profiler_data_source` if `pyarrow` is absent
296e6820eddc is described below

commit 296e6820eddcf2adc42a3ca7aa8ebcf387260f08
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Fri Nov 21 14:55:16 2025 -0800

    [SPARK-54153][PYTHON][TESTS][FOLLOWUP] Skip `test_perf_profiler_data_source` if `pyarrow` is absent
    
    ### What changes were proposed in this pull request?
    
    This PR aims to skip `test_perf_profiler_data_source` if `pyarrow` is absent.
    
    ### Why are the changes needed?
    
    To recover the failed `PyPy` CIs.
    - https://github.com/apache/spark/actions/workflows/build_python_pypy3.10.yml
      - https://github.com/apache/spark/actions/runs/19574648782
        - https://github.com/apache/spark/actions/runs/19574648782/job/56056836234
    
    ```
    ======================================================================
    ERROR: test_perf_profiler_data_source (pyspark.sql.tests.test_udf_profiler.UDFProfiler2Tests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/__w/spark/spark/python/pyspark/sql/tests/test_udf_profiler.py", line 609, in test_perf_profiler_data_source
        self.spark.read.format("TestDataSource").load().collect()
      File "/__w/spark/spark/python/pyspark/sql/classic/dataframe.py", line 469, in collect
        sock_info = self._jdf.collectToPython()
      File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1362, in __call__
        return_value = get_return_value(
      File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
        return f(*a, **kw)
      File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling o235.collectToPython.
    : org.apache.spark.SparkException:
    Error from python worker:
      Traceback (most recent call last):
        File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 199, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 86, in _run_code
          exec(code, run_globals)
        File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 37, in <module>
        File "/usr/local/pypy/pypy3.10/lib/pypy3.10/importlib/__init__.py", line 126, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
        File "<frozen importlib._bootstrap_external>", line 897, in exec_module
        File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
        File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 21, in <module>
          import pyarrow as pa
      ModuleNotFoundError: No module named 'pyarrow'
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #53162 from dongjoon-hyun/SPARK-54153.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 9b0b1ce2d628f18c5dbe85c0de9884960d50f71b)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/pyspark/sql/tests/test_udf_profiler.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/sql/tests/test_udf_profiler.py b/python/pyspark/sql/tests/test_udf_profiler.py
index 37f4a70fabd2..e6a7bf40b945 100644
--- a/python/pyspark/sql/tests/test_udf_profiler.py
+++ b/python/pyspark/sql/tests/test_udf_profiler.py
@@ -585,6 +585,7 @@ class UDFProfiler2TestsMixin:
         for id in self.profile_results:
             self.assert_udf_profile_present(udf_id=id, expected_line_count_prefix=2)
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_perf_profiler_data_source(self):
         class TestDataSourceReader(DataSourceReader):
             def __init__(self, schema):


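The one-line guard in the diff uses the standard `unittest.skipIf` decorator. Below is a minimal, self-contained sketch of the same pattern; note that `have_pyarrow` and `pyarrow_requirement_message` here are standalone stand-ins for the helpers PySpark's test utilities provide, not PySpark's actual code.

```python
import importlib.util
import unittest

# Probe for pyarrow without importing it. This mirrors the idea behind
# PySpark's have_pyarrow / pyarrow_requirement_message helpers; the names
# here are hypothetical stand-ins for illustration.
have_pyarrow = importlib.util.find_spec("pyarrow") is not None
pyarrow_requirement_message = "pyarrow is required for this test"


class DataSourceProfilerTest(unittest.TestCase):
    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_perf_profiler_data_source(self):
        # Only executed when pyarrow is importable, so this import is safe.
        import pyarrow as pa

        self.assertTrue(hasattr(pa, "Table"))


# Run the suite: the test either executes or is cleanly skipped on
# interpreters without pyarrow (e.g. the PyPy CI), never erroring out.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.TestLoader().loadTestsFromTestCase(DataSourceProfilerTest)
)
```

Because the availability check happens at decoration time and the `import pyarrow` happens only inside the guarded test body, the `ModuleNotFoundError` seen in the CI log cannot occur.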
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
