This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 0b7736c1d12 [SPARK-45953][INFRA] Add `Python 3.10` to Infra docker image

0b7736c1d12 is described below

commit 0b7736c1d121947e418a356cf0431d9d7e969c90
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Thu Nov 16 13:37:38 2023 -0800

[SPARK-45953][INFRA] Add `Python 3.10` to Infra docker image

### What changes were proposed in this pull request?

This PR aims to add `Python 3.10` to the Infra Docker image.

### Why are the changes needed?

This is in preparation for adding a daily `Python 3.10` GitHub Actions job for Apache Spark 4.0.0 later.

Note that Python 3.10 is installed in the last step to avoid the following issue, which happens when Python 3.9 and 3.10 are installed in the same stage by the package manager.

```
#21 13.03 ERROR: Cannot uninstall 'blinker'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
#21 ERROR: process "/bin/sh -c python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'" did not complete successfully: exit code: 1
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

1. I verified that the Python CI is not affected and still uses only Python 3.9.5.

```
========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.9']
Will test the following Python modules: ['pyspark-errors']
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.5
Starting test(python3.9): pyspark.errors.tests.test_errors (temp output: /__w/spark/spark/python/target/fd967f24-3607-4aa6-8190-3f8d7de522e1/python3.9__pyspark.errors.tests.test_errors___zauwgy1.log)
Finished test(python3.9): pyspark.errors.tests.test_errors (0s)
Tests passed in 0 seconds
```

2. The `Base Image Build` step passes with the new Python 3.10.

![Screenshot 2023-11-16 at 10 53 37 AM](https://github.com/apache/spark/assets/9700541/6bbb3461-c5f0-4d60-94f6-7cd8df0594ed)

3. Since the new Python 3.10 is not used in CI, it has to be validated manually as follows.

```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 --version
Python 3.10.13
```

```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 -m pip freeze
alembic==1.12.1
annotated-types==0.6.0
blinker==1.7.0
certifi==2019.11.28
chardet==3.0.4
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
contourpy==1.2.0
coverage==7.3.2
cycler==0.12.1
databricks-cli==0.18.0
dbus-python==1.2.16
deepspeed==0.12.3
distro-info==0.23+ubuntu1.1
docker==6.1.3
entrypoints==0.4
et-xmlfile==1.1.0
filelock==3.9.0
Flask==3.0.0
fonttools==4.44.3
gitdb==4.0.11
GitPython==3.1.40
googleapis-common-protos==1.56.4
greenlet==3.0.1
grpcio==1.56.2
grpcio-status==1.48.2
gunicorn==21.2.0
hjson==3.1.0
idna==2.8
importlib-metadata==6.8.0
itsdangerous==2.1.2
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
lxml==4.9.3
Mako==1.3.0
Markdown==3.5.1
MarkupSafe==2.1.3
matplotlib==3.8.1
memory-profiler==0.60.0
mlflow==2.8.1
mpmath==1.3.0
networkx==3.0
ninja==1.11.1.1
numpy==1.26.2
oauthlib==3.2.2
openpyxl==3.1.2
packaging==23.2
pandas==2.1.3
Pillow==10.1.0
plotly==5.18.0
protobuf==3.20.3
psutil==5.9.6
py-cpuinfo==9.0.0
pyarrow==14.0.1
pydantic==2.5.1
pydantic_core==2.14.3
PyGObject==3.36.0
PyJWT==2.8.0
pynvml==11.5.0
pyparsing==3.1.1
python-apt==2.0.1+ubuntu0.20.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
querystring-parser==1.2.4
requests==2.31.0
requests-unixsocket==0.2.0
scikit-learn==1.1.3
scipy==1.11.3
six==1.14.0
smmap==5.0.1
SQLAlchemy==2.0.23
sqlparse==0.4.4
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
threadpoolctl==3.2.0
torch==2.0.1+cpu
torcheval==0.0.7
torchvision==0.15.2+cpu
tqdm==4.66.1
typing_extensions==4.8.0
tzdata==2023.3
unattended-upgrades==0.1
unittest-xml-reporting==3.2.0
urllib3==2.1.0
websocket-client==1.6.4
Werkzeug==3.0.1
zipp==3.17.0
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43840 from dongjoon-hyun/SPARK-45953.

Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 dev/infra/Dockerfile | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 8d12f00a034..0231414eec6 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -95,3 +95,15 @@ RUN python3.9 -m pip install 'torch<=2.0.1' torchvision --index-url https://down
 RUN python3.9 -m pip install torcheval
 # Add Deepspeed as a testing dependency for DeepspeedTorchDistributor
 RUN python3.9 -m pip install deepspeed
+
+# Install Python 3.10 at the last stage to avoid breaking Python 3.9
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get update && apt-get install -y \
+    python3.10 python3.10-distutils \
+    && rm -rf /var/lib/apt/lists/*
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
+RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.10 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
+RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
+RUN python3.10 -m pip install torcheval
+RUN python3.10 -m pip install deepspeed
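The `python3.10 -m pip install` lines in the Dockerfile diff use the shell form of `RUN`, which Docker executes through `/bin/sh -c` (visible in the error log quoted in the PR description). In that form the unquoted `plotly>=4.8` is split by the shell into the argument `plotly` plus a redirection of stdout to a file named `=4.8`, so the lower bound never reaches pip; the installed plotly 5.18.0 happens to satisfy it anyway. A minimal sketch of one way to avoid this class of mistake, assuming the requirement groups are kept as plain Python lists and rendered with `shlex.quote`; the `REQUIREMENT_GROUPS` list and the `run_line` helper are hypothetical and not part of the Spark build:

```python
import shlex

# Hypothetical: requirement groups mirroring the python3.10 `pip install` lines.
REQUIREMENT_GROUPS = [
    ["numpy", "pyarrow>=14.0.0", "pandas<=2.1.3", "scipy", "unittest-xml-reporting",
     "plotly>=4.8", "mlflow>=2.3.1", "coverage", "matplotlib", "openpyxl",
     "memory-profiler==0.60.0", "scikit-learn==1.1.*"],
    ["grpcio>=1.48,<1.57", "grpcio-status>=1.48,<1.57", "protobuf==3.20.3",
     "googleapis-common-protos==1.56.4"],
    ["torch<=2.0.1", "torchvision", "--index-url", "https://download.pytorch.org/whl/cpu"],
    ["torcheval"],
    ["deepspeed"],
]


def run_line(python, args):
    """Hypothetical helper: render one shell-form RUN line with every argument quoted.

    shlex.quote leaves shell-safe words (numpy, -m, URLs) untouched and wraps
    anything containing <, >, * or similar characters in single quotes, so
    /bin/sh passes each specifier to pip unchanged.
    """
    argv = [python, "-m", "pip", "install", *args]
    return "RUN " + " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    for group in REQUIREMENT_GROUPS:
        print(run_line("python3.10", group))
```

Single-package groups render exactly as in the diff (for instance `RUN python3.10 -m pip install torcheval`), while every specifier containing `<`, `>` or `*` comes out single-quoted. Keeping one list would also let the same groups be rendered for `python3.9`, assuming both interpreters are meant to share one dependency set.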
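The `Cannot uninstall 'blinker'` error quoted in the PR description is pip refusing to remove a project that it classifies as distutils-installed: its metadata is a single `.egg-info` *file* rather than a directory, so pip has no record of which files belong to it. That layout is typical of some apt-packaged modules. A rough sketch for listing such projects before a `pip install` tries to upgrade them, assuming the metadata sits directly in the scanned directory; `distutils_installed` and `demo` are hypothetical helpers, and pip's own check works on each distribution's metadata location rather than on a directory listing:

```python
import tempfile
from pathlib import Path


def distutils_installed(site_packages):
    """List projects whose metadata is a single `.egg-info` file.

    pip treats such projects as installed by plain distutils and refuses to
    uninstall them when an upgrade is needed, e.g. "Cannot uninstall 'blinker'".
    """
    names = []
    for entry in sorted(Path(site_packages).glob("*.egg-info")):
        if entry.is_file():
            # "blinker-1.4.egg-info" -> "blinker". A directory-style .egg-info
            # (setuptools) is skipped: pip does not classify it as distutils.
            names.append(entry.name.split("-", 1)[0])
    return names


def demo():
    """Build a throwaway site-packages with one file-style and one directory-style entry."""
    with tempfile.TemporaryDirectory() as site:
        # Hypothetical distutils-style metadata: a single file.
        Path(site, "blinker-1.4.egg-info").write_text("Metadata-Version: 1.1\nName: blinker\n")
        # setuptools-style metadata: a directory holding PKG-INFO.
        egg_dir = Path(site, "six-1.14.0.egg-info")
        egg_dir.mkdir()
        (egg_dir / "PKG-INFO").write_text("Metadata-Version: 1.1\nName: six\n")
        return distutils_installed(site)


if __name__ == "__main__":
    print(demo())  # ['blinker']
```

On Ubuntu, apt-installed Python modules live in `/usr/lib/python3/dist-packages`, so that is the natural directory to pass to `distutils_installed` when inspecting the image.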
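Step 3 of the testing section validates the new interpreter by reading the `pip freeze` output. A minimal sketch of automating that comparison against the version specifiers used in the `python3.10 -m pip install` lines of the Dockerfile diff, assuming plain numeric release numbers: local labels such as `+cpu` are dropped, and pre-releases and other PEP 440 forms are not handled (real tooling would use the `packaging` library instead). `FREEZE_OUTPUT` is a subset copied from the freeze output in step 3, and all helper names are hypothetical:

```python
# Subset of the `python3.10 -m pip freeze` output from step 3.
FREEZE_OUTPUT = """\
googleapis-common-protos==1.56.4
grpcio==1.56.2
grpcio-status==1.48.2
memory-profiler==0.60.0
mlflow==2.8.1
pandas==2.1.3
plotly==5.18.0
protobuf==3.20.3
pyarrow==14.0.1
scikit-learn==1.1.3
torch==2.0.1+cpu
"""

# Specifiers from the python3.10 `pip install` lines in the Dockerfile diff.
CONSTRAINTS = {
    "pyarrow": [(">=", "14.0.0")],
    "pandas": [("<=", "2.1.3")],
    "plotly": [(">=", "4.8")],
    "mlflow": [(">=", "2.3.1")],
    "memory-profiler": [("==", "0.60.0")],
    "scikit-learn": [("==", "1.1.*")],
    "grpcio": [(">=", "1.48"), ("<", "1.57")],
    "grpcio-status": [(">=", "1.48"), ("<", "1.57")],
    "protobuf": [("==", "3.20.3")],
    "googleapis-common-protos": [("==", "1.56.4")],
    "torch": [("<=", "2.0.1")],
}


def parse_freeze(text):
    """Map lower-cased project names to pinned versions from `pip freeze` lines."""
    pins = {}
    for line in text.splitlines():
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip().lower()] = version.strip()
    return pins


def release(version):
    """Numeric release tuple, ignoring a local label such as '+cpu'."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def satisfies(version, op, spec):
    """Check one specifier; only ==, ==X.Y.*, >=, <= and < are supported."""
    if op == "==" and spec.endswith(".*"):
        prefix = release(spec[:-2])
        return release(version)[: len(prefix)] == prefix
    v, s = release(version), release(spec)
    width = max(len(v), len(s))
    # Pad with zeros so that 1.48 compares equal to 1.48.0.
    v += (0,) * (width - len(v))
    s += (0,) * (width - len(s))
    if op == "==":
        return v == s
    if op == ">=":
        return v >= s
    if op == "<=":
        return v <= s
    if op == "<":
        return v < s
    raise ValueError(f"unsupported operator: {op}")


def violations(pins, constraints):
    """Return one message per missing or out-of-range package."""
    problems = []
    for name, specs in constraints.items():
        version = pins.get(name)
        if version is None:
            problems.append(f"{name}: not installed")
            continue
        for op, spec in specs:
            if not satisfies(version, op, spec):
                problems.append(f"{name}=={version} violates {op}{spec}")
    return problems


if __name__ == "__main__":
    problems = violations(parse_freeze(FREEZE_OUTPUT), CONSTRAINTS)
    print("\n".join(problems) or "all constraints satisfied")
```

With the subset above, every pin satisfies its specifier and the script prints `all constraints satisfied`. To check a live image, `FREEZE_OUTPUT` could be replaced by the output of the `pip freeze` command shown in step 3.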