[ https://issues.apache.org/jira/browse/SPARK-46059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788933#comment-17788933 ]
Dongjoon Hyun commented on SPARK-46059:
---------------------------------------

It seems that I found the root cause in the Infra docker image.

{code}
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@39f78dbc0836:/# python3.12
Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
    from pandas._libs import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
    from pandas._libs.interval import Interval
  File "interval.pyx", line 1, in init pandas._libs.interval
  File "hashtable.pyx", line 1, in init pandas._libs.hashtable
  File "missing.pyx", line 1, in init pandas._libs.missing
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/tslibs/__init__.py", line 39, in <module>
    from pandas._libs.tslibs.conversion import localize_pydatetime
  File "conversion.pyx", line 1, in init pandas._libs.tslibs.conversion
  File "offsets.pyx", line 1, in init pandas._libs.tslibs.offsets
  File "timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps
  File "timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas
  File "timezones.pyx", line 24, in init pandas._libs.tslibs.timezones
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/__init__.py", line 2, in <module>
    from .tz import *
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py", line 21, in <module>
    from six.moves import _thread
ModuleNotFoundError: No module named 'six.moves'
{code}

> Investigate `pandas` import issues in Python 3.12 CI
> ----------------------------------------------------
>
>                 Key: SPARK-46059
>                 URL: https://issues.apache.org/jira/browse/SPARK-46059
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Project Infra, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> This happens in Python 3.12 CI only.
> - https://github.com/apache/spark/actions/runs/6959106836/job/18935673389
> {code}
> Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: /__w/spark/spark/python/target/73ed28d0-ae18-426e-9760-d03bea982a9b/python3.12__pyspark.streaming.tests.test_context__l4z6a7a2.log)
> Traceback (most recent call last):
>   File "<frozen runpy>", line 198, in _run_module_as_main
>   File "<frozen runpy>", line 88, in _run_code
>   File "/__w/spark/spark/python/pyspark/streaming/tests/test_context.py", line 23, in <module>
>     from pyspark.testing.streamingutils import PySparkStreamingTestCase
>   File "/__w/spark/spark/python/pyspark/testing/__init__.py", line 19, in <module>
>     from pyspark.testing.pandasutils import assertPandasOnSparkEqual
>   File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 58, in <module>
>     import pyspark.pandas as ps
>   File "/__w/spark/spark/python/pyspark/pandas/__init__.py", line 33, in <module>
>     require_minimum_pandas_version()
>   File "/__w/spark/spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
>     import pandas
>   File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
>     from pandas.core.api import (
>   File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
>     from pandas._libs import (
>   File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
>     from pandas._libs.interval import Interval
>   File "interval.pyx", line 1, in init pandas._libs.interval
>   File "hashtable.pyx", line 1, in init pandas._libs.hashtable
>   File "missing.pyx", line 42, in init pandas._libs.missing
> AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)
> {code}
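
For anyone reproducing this inside the CI image, here is a minimal sketch (not part of the original comment) that probes the import chain from the first traceback innermost-first, so the report points at the lowest-level module that fails rather than at `pandas` itself. The helper name `first_import_failure` and the module list are illustrative assumptions; only the standard-library `importlib.import_module` call is relied on.

```python
import importlib


def first_import_failure(module_names):
    """Return (name, exception) for the first module that fails to import.

    Returns None when every module in the list imports cleanly.
    """
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # e.g. ModuleNotFoundError or AttributeError
            return name, exc
    return None


if __name__ == "__main__":
    # Hypothetical probe order, taken from the traceback in the comment:
    # pandas -> dateutil.tz -> six.moves, checked from the innermost outwards.
    chain = ["six.moves", "dateutil.tz", "pandas"]
    failure = first_import_failure(chain)
    if failure is None:
        print("all modules imported cleanly")
    else:
        name, exc = failure
        print(f"{name} failed to import: {type(exc).__name__}: {exc}")
```

Running this with `python3.12` in the container should show whether `six.moves` is the first module to break, independently of the `pandas` import.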