[ https://issues.apache.org/jira/browse/SPARK-46059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788933#comment-17788933 ]

Dongjoon Hyun commented on SPARK-46059:
---------------------------------------

It seems that I found the root cause in the Infra docker image: with Python 3.12, `import pandas` fails because `dateutil` cannot import `six.moves`.
{code}
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@39f78dbc0836:/# python3.12
Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
    from pandas._libs import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
    from pandas._libs.interval import Interval
  File "interval.pyx", line 1, in init pandas._libs.interval
  File "hashtable.pyx", line 1, in init pandas._libs.hashtable
  File "missing.pyx", line 1, in init pandas._libs.missing
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/tslibs/__init__.py", line 39, in <module>
    from pandas._libs.tslibs.conversion import localize_pydatetime
  File "conversion.pyx", line 1, in init pandas._libs.tslibs.conversion
  File "offsets.pyx", line 1, in init pandas._libs.tslibs.offsets
  File "timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps
  File "timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas
  File "timezones.pyx", line 24, in init pandas._libs.tslibs.timezones
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/__init__.py", line 2, in <module>
    from .tz import *
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py", line 21, in <module>
    from six.moves import _thread
ModuleNotFoundError: No module named 'six.moves'
{code}
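
For anyone who wants to repeat this check without typing at the REPL, here is a minimal sketch that could be run with `python3.12` inside the image. The helper name `diagnose_import` is hypothetical (it is not part of Spark or of any library), and the list of modules simply follows the import chain in the traceback above. It only reports what imports and from where; it does not attempt a fix.

```python
import importlib
import sys


def diagnose_import(module_name):
    """Try to import module_name and report the outcome (hypothetical helper)."""
    report = {"module": module_name, "python": sys.version.split()[0]}
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Report any import-time failure, not only ImportError, because the
        # CI log shows an AttributeError raised while importing pandas.
        report["ok"] = False
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report
    report["ok"] = True
    # Not every module defines __version__ or __file__, so fall back to None.
    report["version"] = getattr(module, "__version__", None)
    report["file"] = getattr(module, "__file__", None)
    return report


if __name__ == "__main__":
    # Walk the chain from the traceback, innermost dependency first.
    for name in ("six", "six.moves", "dateutil.tz", "pandas"):
        print(diagnose_import(name))
```

The `file` field is the useful part here: it shows which installed copy of each package the interpreter actually picked up, which helps when several site directories are on the path.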

> Investigate `pandas` import issues in Python 3.12 CI
> ----------------------------------------------------
>
>                 Key: SPARK-46059
>                 URL: https://issues.apache.org/jira/browse/SPARK-46059
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Project Infra, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> This happens in Python 3.12 CI only.
> - https://github.com/apache/spark/actions/runs/6959106836/job/18935673389
> {code}
> Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: /__w/spark/spark/python/target/73ed28d0-ae18-426e-9760-d03bea982a9b/python3.12__pyspark.streaming.tests.test_context__l4z6a7a2.log)
> Traceback (most recent call last):
>   File "<frozen runpy>", line 198, in _run_module_as_main
>   File "<frozen runpy>", line 88, in _run_code
>   File "/__w/spark/spark/python/pyspark/streaming/tests/test_context.py", line 23, in <module>
>     from pyspark.testing.streamingutils import PySparkStreamingTestCase
>   File "/__w/spark/spark/python/pyspark/testing/__init__.py", line 19, in <module>
>     from pyspark.testing.pandasutils import assertPandasOnSparkEqual
>   File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 58, in <module>
>     import pyspark.pandas as ps
>   File "/__w/spark/spark/python/pyspark/pandas/__init__.py", line 33, in <module>
>     require_minimum_pandas_version()
>   File "/__w/spark/spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
>     import pandas
>   File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
>     from pandas.core.api import (
>   File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
>     from pandas._libs import (
>   File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
>     from pandas._libs.interval import Interval
>   File "interval.pyx", line 1, in init pandas._libs.interval
>   File "hashtable.pyx", line 1, in init pandas._libs.hashtable
>   File "missing.pyx", line 42, in init pandas._libs.missing
> AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)
> {code}



