This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 0b7736c1d12 [SPARK-45953][INFRA] Add `Python 3.10` to Infra docker image
0b7736c1d12 is described below

commit 0b7736c1d121947e418a356cf0431d9d7e969c90
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Thu Nov 16 13:37:38 2023 -0800

    [SPARK-45953][INFRA] Add `Python 3.10` to Infra docker image
    
    ### What changes were proposed in this pull request?
    
    This PR aims to add `Python 3.10` to Infra docker images.
    
    ### Why are the changes needed?
    
    This prepares for adding a daily `Python 3.10` GitHub Action job later for Apache Spark 4.0.0.
    
    Note that Python 3.10 is installed at the last step to avoid the following issue, which happens when we install Python 3.9 and 3.10 at the same stage with the package manager.
    ```
    #21 13.03 ERROR: Cannot uninstall 'blinker'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
    #21 ERROR: process "/bin/sh -c python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'" did not complete successfully: exit code: 1
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    1. I verified that the Python CI is not affected and still uses Python 3.9.5 only.
    ```
    ========================================================================
    Running PySpark tests
    ========================================================================
    Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
    Will test against the following Python executables: ['python3.9']
    Will test the following Python modules: ['pyspark-errors']
    python3.9 python_implementation is CPython
    python3.9 version is: Python 3.9.5
    Starting test(python3.9): pyspark.errors.tests.test_errors (temp output: /__w/spark/spark/python/target/fd967f24-3607-4aa6-8190-3f8d7de522e1/python3.9__pyspark.errors.tests.test_errors___zauwgy1.log)
    Finished test(python3.9): pyspark.errors.tests.test_errors (0s)
    Tests passed in 0 seconds
    ```
    
    2. The `Base Image Build` step passes with the new Python 3.10.
    
    ![Screenshot 2023-11-16 at 10 53 37 AM](https://github.com/apache/spark/assets/9700541/6bbb3461-c5f0-4d60-94f6-7cd8df0594ed)
    
    3. Since the new Python 3.10 is not used in CI, it needs to be validated manually, as follows.
    
    ```
    $ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 --version
    Python 3.10.13
    ```
    
    ```
    $ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 -m pip freeze
    alembic==1.12.1
    annotated-types==0.6.0
    blinker==1.7.0
    certifi==2019.11.28
    chardet==3.0.4
    charset-normalizer==3.3.2
    click==8.1.7
    cloudpickle==2.2.1
    contourpy==1.2.0
    coverage==7.3.2
    cycler==0.12.1
    databricks-cli==0.18.0
    dbus-python==1.2.16
    deepspeed==0.12.3
    distro-info==0.23+ubuntu1.1
    docker==6.1.3
    entrypoints==0.4
    et-xmlfile==1.1.0
    filelock==3.9.0
    Flask==3.0.0
    fonttools==4.44.3
    gitdb==4.0.11
    GitPython==3.1.40
    googleapis-common-protos==1.56.4
    greenlet==3.0.1
    grpcio==1.56.2
    grpcio-status==1.48.2
    gunicorn==21.2.0
    hjson==3.1.0
    idna==2.8
    importlib-metadata==6.8.0
    itsdangerous==2.1.2
    Jinja2==3.1.2
    joblib==1.3.2
    kiwisolver==1.4.5
    lxml==4.9.3
    Mako==1.3.0
    Markdown==3.5.1
    MarkupSafe==2.1.3
    matplotlib==3.8.1
    memory-profiler==0.60.0
    mlflow==2.8.1
    mpmath==1.3.0
    networkx==3.0
    ninja==1.11.1.1
    numpy==1.26.2
    oauthlib==3.2.2
    openpyxl==3.1.2
    packaging==23.2
    pandas==2.1.3
    Pillow==10.1.0
    plotly==5.18.0
    protobuf==3.20.3
    psutil==5.9.6
    py-cpuinfo==9.0.0
    pyarrow==14.0.1
    pydantic==2.5.1
    pydantic_core==2.14.3
    PyGObject==3.36.0
    PyJWT==2.8.0
    pynvml==11.5.0
    pyparsing==3.1.1
    python-apt==2.0.1+ubuntu0.20.4.1
    python-dateutil==2.8.2
    pytz==2023.3.post1
    PyYAML==6.0.1
    querystring-parser==1.2.4
    requests==2.31.0
    requests-unixsocket==0.2.0
    scikit-learn==1.1.3
    scipy==1.11.3
    six==1.14.0
    smmap==5.0.1
    SQLAlchemy==2.0.23
    sqlparse==0.4.4
    sympy==1.12
    tabulate==0.9.0
    tenacity==8.2.3
    threadpoolctl==3.2.0
    torch==2.0.1+cpu
    torcheval==0.0.7
    torchvision==0.15.2+cpu
    tqdm==4.66.1
    typing_extensions==4.8.0
    tzdata==2023.3
    unattended-upgrades==0.1
    unittest-xml-reporting==3.2.0
    urllib3==2.1.0
    websocket-client==1.6.4
    Werkzeug==3.0.1
    zipp==3.17.0
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #43840 from dongjoon-hyun/SPARK-45953.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 dev/infra/Dockerfile | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 8d12f00a034..0231414eec6 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -95,3 +95,15 @@ RUN python3.9 -m pip install 'torch<=2.0.1' torchvision --index-url https://down
 RUN python3.9 -m pip install torcheval
 # Add Deepspeed as a testing dependency for DeepspeedTorchDistributor
 RUN python3.9 -m pip install deepspeed
+
+# Install Python 3.10 at the last stage to avoid breaking Python 3.9
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get update && apt-get install -y \
+    python3.10 python3.10-distutils \
+    && rm -rf /var/lib/apt/lists/*
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
+RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.10 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
+RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
+RUN python3.10 -m pip install torcheval
+RUN python3.10 -m pip install deepspeed
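
The exact pins in the `RUN python3.10 -m pip install` lines above can be compared against the `pip freeze` output quoted in the commit message. The following is a minimal sketch and not part of the commit: `PINS`, `parse_freeze`, `matches`, and `check_pins` are hypothetical names, and only exact (`==X.Y.Z`) and wildcard (`==X.Y.*`) pins are handled; range constraints such as `'grpcio>=1.48,<1.57'` are deliberately left out.

```python
# Illustrative only: pins copied from the Dockerfile diff above.
PINS = {
    "protobuf": "3.20.3",
    "googleapis-common-protos": "1.56.4",
    "memory-profiler": "0.60.0",
    "scikit-learn": "1.1.*",
}


def parse_freeze(text):
    """Map lower-cased package name -> version for each `name==version` line."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # skip lines that are not `name==version`
            versions[name.lower()] = version
    return versions


def matches(pin, version):
    """Check an exact pin, or a trailing-wildcard pin such as `1.1.*`."""
    if pin.endswith(".*"):
        return version.startswith(pin[:-1])  # "1.1.*" -> prefix "1.1."
    return version == pin


def check_pins(freeze_text, pins=PINS):
    """Return (package, pin, installed_version) for every pin not satisfied."""
    installed = parse_freeze(freeze_text)
    return [
        (name, pin, installed.get(name))
        for name, pin in pins.items()
        if installed.get(name) is None or not matches(pin, installed[name])
    ]


# A few lines taken from the `pip freeze` output in the commit message.
sample = """\
protobuf==3.20.3
googleapis-common-protos==1.56.4
memory-profiler==0.60.0
scikit-learn==1.1.3
"""
print(check_pins(sample))
```

An empty list means every listed pin is satisfied. In practice the text passed to `check_pins` would be the full output of the `docker run ... python3.10 -m pip freeze` command shown in the commit message.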


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
