[ https://issues.apache.org/jira/browse/SPARK-46751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-46751:
-----------------------------------
    Labels: pull-request-available  (was: )

> Skip test_datasource if PyArrow is not installed
> ------------------------------------------------
>
>                 Key: SPARK-46751
>                 URL: https://issues.apache.org/jira/browse/SPARK-46751
>             Project: Spark
>          Issue Type: Test
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
> {code}
> ======================================================================
> ERROR: test_in_memory_data_source (pyspark.sql.tests.test_python_datasource.PythonDataSourceTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/sql/tests/test_python_datasource.py", line 234, in test_in_memory_data_source
>     self.assertEqual(df.rdd.getNumPartitions(), 3)
>   File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 224, in rdd
>     jrdd = self._jdf.javaToPython()
>   File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>     return_value = get_return_value(
>   File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 215, in deco
>     return f(*a, **kw)
>   File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o208.javaToPython.
> : org.apache.spark.SparkException:
> Error from python worker:
>   Traceback (most recent call last):
>     File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 61, in require_minimum_pyarrow_version
>       import pyarrow
>   ModuleNotFoundError: No module named 'pyarrow'
>
>   The above exception was the direct cause of the following exception:
>
>   Traceback (most recent call last):
>     File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 197, in _run_module_as_main
>       return _run_code(code, main_globals, None,
>     File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 87, in _run_code
>       exec(code, run_globals)
>     File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 36, in <module>
>     File "/usr/local/pypy/pypy3.8/lib/pypy3.8/importlib/__init__.py", line 127, in import_module
>       return _bootstrap._gcd_import(name[level:], package, level)
>     File "<frozen importlib._bootstrap>", line 1023, in _gcd_import
>     File "<frozen importlib._bootstrap>", line 1000, in _find_and_load
>     File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
>     File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
>     File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
>     File "<builtin>/frozen zipimport", line 259, in load_module
>     File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 33, in <module>
>       from pyspark.sql.connect.conversion import ArrowTableToRowsConversion, LocalDataToArrowConversion
>     File "<builtin>/frozen zipimport", line 259, in load_module
>     File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/conversion.py", line 20, in <module>
>       check_dependencies(__name__)
>     File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/utils.py", line 36, in check_dependencies
>       require_minimum_pyarrow_version()
>     File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 68, in require_minimum_pyarrow_version
>       raise PySparkImportError(
>   pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] PyArrow >= 4.0.0 must be installed; however, it was not found.
> PYTHONPATH was:
>   /__w/spark/spark/python/lib/pyspark.zip:/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip:/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip:/__w/spark/spark/python/:
> {code}
>
> https://github.com/apache/spark/actions/runs/7557652490/job/20577472214



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
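The fix the title describes is conventionally done by guarding the test class with a skip decorator so the suite skips (rather than errors) when the optional dependency is absent. Below is a minimal, self-contained sketch of that pattern; the names `have_pyarrow` and `pyarrow_requirement_message` mirror helpers found in PySpark's testing utilities, but this snippet is standalone and does not show the actual patch from the linked pull request.

{code}
import unittest

# Probe for PyArrow once at import time; tests that need it are
# skipped with an explanatory message instead of failing.
try:
    import pyarrow  # noqa: F401

    have_pyarrow = True
    pyarrow_requirement_message = None
except ImportError:
    have_pyarrow = False
    pyarrow_requirement_message = "PyArrow must be installed"


@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message or "")
class PythonDataSourceTests(unittest.TestCase):
    def test_in_memory_data_source(self):
        # Test body that requires PyArrow would go here.
        pass
{code}

With this guard in place, running the suite on an interpreter without PyArrow (such as the PyPy job in the linked CI run) reports the test as skipped with the requirement message instead of raising PySparkImportError from the worker.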