This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 62f90ec6d32f [SPARK-47452][INFRA][FOLLOWUP] Enforce to install `six` to `Python 3.10`
62f90ec6d32f is described below

commit 62f90ec6d32f708a90329bb8c741482e18a63e56
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Tue Apr 2 23:14:43 2024 -0700

    [SPARK-47452][INFRA][FOLLOWUP] Enforce to install `six` to `Python 3.10`
    
    ### What changes were proposed in this pull request?
    
    This PR forces the installation of `six` for Python 3.10 because the Python 3.10 environment in the CI image is missing `six`, which causes `pandas` detection failures in CI.
    - https://github.com/apache/spark/actions/runs/8525063765/job/23373974516
       - Note that `pandas` appears in the installed package list, but PySpark fails to detect it because `six` is missing.
    
    ```
    $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.9 -m pip freeze | grep six
    six==1.16.0
    $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.10 -m pip freeze | grep six
    $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.11 -m pip freeze | grep six
    six==1.16.0
    $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.12 -m pip freeze | grep six
    six==1.16.0
    ```
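
The same check can be scripted. The following is a minimal sketch, not part of this PR: `pinned_version` is a hypothetical helper that looks up an exact package name in `pip freeze` output, whereas `grep six` matches any line containing that substring. The sample input in the usage note is illustrative only.

```python
from typing import Optional


def pinned_version(freeze_output: str, package: str) -> Optional[str]:
    """Return the version pinned for `package` in `pip freeze` output, or None."""
    # Treat '-' and '_' as equivalent and ignore case when comparing names.
    wanted = package.lower().replace("_", "-")
    for line in freeze_output.splitlines():
        name, sep, version = line.strip().partition("==")
        # Lines without '==' (e.g. editable installs) are skipped.
        if sep and name.lower().replace("_", "-") == wanted:
            return version
    return None
```

Given the text printed by `python3.10 -m pip freeze`, `pinned_version(text, "six")` returns the pinned version string when `six` is listed and `None` when it is absent.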
    
    - CI failure message example.
      - https://github.com/apache/spark/actions/runs/8525063765/job/23373974096
    ```
    Starting test(python3.10): pyspark.ml.tests.connect.test_connect_classification (temp output: /__w/spark/spark/python/target/370eb2c4-12f2-411f-96d1-f617f5d59528/python3.10__pyspark.ml.tests.connect.test_connect_classification__v6itdsxy.log)
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/__w/spark/spark/python/pyspark/ml/tests/connect/test_connect_classification.py", line 37, in <module>
        class ClassificationTestsOnConnect(ClassificationTestsMixin, unittest.TestCase):
    NameError: name 'ClassificationTestsMixin' is not defined
    ```
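
One way a failed dependency check can surface as a `NameError` like the one above is a conditional definition followed by an unconditional use. The sketch below only illustrates that pattern and is an assumption, not PySpark's actual test code; `have_dependency`, `ExampleTestsMixin`, `ExampleTests`, and `run_snippet` are hypothetical names.

```python
# Module-level code in which the mixin exists only when a dependency check passed.
SNIPPET = """
if have_dependency:
    class ExampleTestsMixin:
        pass

class ExampleTests(ExampleTestsMixin):
    pass
"""


def run_snippet(have_dependency: bool) -> str:
    """Execute SNIPPET as module-level code; return 'ok' or the NameError message."""
    namespace = {"have_dependency": have_dependency}
    try:
        exec(SNIPPET, namespace)
    except NameError as exc:
        # The class statement references a name that was never bound.
        return str(exc)
    return "ok"
```

Calling `run_snippet(False)` reports that `ExampleTestsMixin` is not defined, while `run_snippet(True)` completes normally.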
    
    ### Why are the changes needed?
    
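    Python 3.10 is the default Python version of the Ubuntu base image, so `pip` behaves differently for it than for the other versions: it finds the distribution-provided `six` under `/usr/lib/python3/dist-packages` and treats the requirement as already satisfied instead of installing its own copy.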
    ```
    RUN python3.10 -m pip install numpy pyarrow>=15.0.0 six==1.16.0 ...
    ...
    #20 0.766 Requirement already satisfied: six==1.16.0 in /usr/lib/python3/dist-packages (1.16.0)
    ```
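
To see which copy of a module a given interpreter would pick up, the standard library's `importlib.util.find_spec` can be queried. This is a minimal diagnostic sketch, not part of this PR; `module_origin` is a hypothetical helper name.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run this with the interpreter under investigation, e.g. python3.10.
    print("six:", module_origin("six"))
```

A path under `/usr/lib/python3/dist-packages` would indicate the distribution-provided package, and `None` would mean the interpreter cannot find `six` at all.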
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Check the docker image built by this PR.
    - https://github.com/dongjoon-hyun/spark/actions/runs/8533625657/job/23376659246
    
    ```
    $ docker pull --platform amd64 ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657
    
    $ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657 python3.10 -m pip freeze | grep six
    six==1.16.0
    ```
    
    Run the tests on the new Docker image.
    ```
    $ docker run -it --rm -v $PWD:/spark ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657
    root@b7f5f56892b0:/# cd /spark
    root@b7f5f56892b0:/spark# python/run-tests --modules=pyspark-mllib,pyspark-ml,pyspark-ml-connect --parallelism=1 --python-executables=python3.10
    Running PySpark tests. Output is in /spark/python/unit-tests.log
    Will test against the following Python executables: ['python3.10']
    Will test the following Python modules: ['pyspark-mllib', 'pyspark-ml', 'pyspark-ml-connect']
    python3.10 python_implementation is CPython
    python3.10 version is: Python 3.10.12
    Starting test(python3.10): pyspark.ml.tests.connect.test_connect_classification (temp output: /spark/python/target/675eccdc-3c4b-4146-a58b-030302bdc6d7/python3.10__pyspark.ml.tests.connect.test_connect_classification__9habp0rh.log)
    Finished test(python3.10): pyspark.ml.tests.connect.test_connect_classification (159s)
    Starting test(python3.10): pyspark.ml.tests.connect.test_connect_evaluation (temp output: /spark/python/target/fbac93ba-c72d-40e4-acfe-f3ac01b4932a/python3.10__pyspark.ml.tests.connect.test_connect_evaluation__js11z0ux.log)
    Finished test(python3.10): pyspark.ml.tests.connect.test_connect_evaluation (36s)
    Starting test(python3.10): pyspark.ml.tests.connect.test_connect_feature (temp output: /spark/python/target/fdb8828e-4241-4e78-a7d6-b2a4beb3cfc1/python3.10__pyspark.ml.tests.connect.test_connect_feature__et5gr30f.log)
    Finished test(python3.10): pyspark.ml.tests.connect.test_connect_feature (30s)
    Starting test(python3.10): pyspark.ml.tests.connect.test_connect_function (temp output: /spark/python/target/e365e62f-a09b-483d-9101-fe9dfc0801f2/python3.10__pyspark.ml.tests.connect.test_connect_function__5e288azs.log)
    Finished test(python3.10): pyspark.ml.tests.connect.test_connect_function (24s)
    Starting test(python3.10): pyspark.ml.tests.connect.test_connect_pipeline (temp output: /spark/python/target/bdc167be-6d6e-4704-b840-cf5d23c4b21e/python3.10__pyspark.ml.tests.connect.test_connect_pipeline__63blw3o2.log)
    ...
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #45832 from dongjoon-hyun/SPARK-47452-2.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 dev/infra/Dockerfile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 9e88ed794c21..378264b7afa3 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -96,6 +96,7 @@ ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 goog
 # Install Python 3.10 packages
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
 RUN python3.10 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
+RUN python3.10 -m pip install --ignore-installed 'six==1.16.0'  # Avoid `python3-six` installation
 RUN python3.10 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
     python3.10 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
     python3.10 -m pip install deepspeed torcheval && \


