This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new ff691fa611f0 [SPARK-48116][INFRA][3.5] Run `pyspark-pandas*` only in PR builder and Daily Python CIs
ff691fa611f0 is described below

commit ff691fa611f0c8a7f0ff626179bced2b48ef9b7d
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed May 8 13:45:55 2024 -0700

    [SPARK-48116][INFRA][3.5] Run `pyspark-pandas*` only in PR builder and Daily Python CIs
    
    ### What changes were proposed in this pull request?
    
    This PR aims to run `pyspark-pandas*` of `branch-3.5` only in the PR builder and Daily Python CIs. In other words, only the commit builder will skip it by default. Please note that PR builders do not consume ASF resources, and they provide lots of test coverage every day.
    
    The `branch-3.5` Python Daily CI runs all Python tests, including `pyspark-pandas`, like the following.
    
    
https://github.com/apache/spark/blob/21548a8cc5c527d4416a276a852f967b4410bd4b/.github/workflows/build_branch35_python.yml#L43-L44
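
    The intended gating can be sketched in plain shell (a hypothetical standalone rendition; the actual change in the diff below inlines a `${{ github.repository }}` expression into the workflow step):

```shell
# Hypothetical standalone rendition of the gating logic in this commit.
# The real workflow evaluates a ${{ github.repository }} expression inline;
# here it is modeled as a plain shell variable for illustration.
GITHUB_REPOSITORY="apache/spark"   # the commit builder runs in this repo
pyspark=true                       # pretend ./dev/is-changed.py saw changes

if [ "$GITHUB_REPOSITORY" != "apache/spark" ]; then
  # Forks (i.e. PR builders) run pyspark-pandas whenever pyspark changed.
  pandas=$pyspark
else
  # Commit builds in apache/spark skip pyspark-pandas by default.
  pandas=false
fi
echo "pandas=$pandas"
```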
    
    ### Why are the changes needed?
    
    To reduce GitHub Action usage to meet ASF INFRA policy.
    - https://infra.apache.org/github-actions-policy.html
    
        > All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices.
    
    Although `pandas` is an **optional** package in PySpark, it is essential for PySpark users, and we have **6 test pipelines** which require lots of resources. We need to optimize the job concurrency level to `less than or equal to 20` while keeping as much test coverage as possible.
    
    
https://github.com/apache/spark/blob/a762f3175fcdb7b069faa0c2bfce93d295cb1f10/dev/requirements.txt#L4-L7
    
    - pyspark-pandas
    - pyspark-pandas-slow
    - pyspark-pandas-connect
    - pyspark-pandas-slow-connect
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Manual review.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #46482 from dongjoon-hyun/SPARK-48116-3.5.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .github/workflows/build_and_test.yml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 9c3dc95d0f66..679c51bb0941 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -82,6 +82,11 @@ jobs:
           pyspark=true; sparkr=true; tpcds=true; docker=true;
          pyspark_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
           pyspark=`./dev/is-changed.py -m $pyspark_modules`
+          if [ "${{ github.repository != 'apache/spark' }}" ]; then
+            pandas=$pyspark
+          else
+            pandas=false
+          fi
           sparkr=`./dev/is-changed.py -m sparkr`
           tpcds=`./dev/is-changed.py -m sql`
           docker=`./dev/is-changed.py -m docker-integration-tests`
@@ -90,6 +95,7 @@ jobs:
             {
               \"build\": \"$build\",
               \"pyspark\": \"$pyspark\",
+              \"pyspark-pandas\": \"$pandas\",
               \"sparkr\": \"$sparkr\",
               \"tpcds-1g\": \"$tpcds\",
               \"docker-integration-tests\": \"$docker\",
@@ -361,6 +367,14 @@ jobs:
             pyspark-pandas-connect
           - >-
             pyspark-pandas-slow-connect
+        exclude:
+          # Always run if pyspark-pandas == 'true', even infra-image is skip (such as non-master job)
+          # In practice, the build will run in individual PR, but not against the individual commit
+          # in Apache Spark repository.
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-slow' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-connect' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-slow-connect' }}
     env:
       MODULES_TO_TEST: ${{ matrix.modules }}
       HADOOP_PROFILE: ${{ inputs.hadoop }}
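
For readers unfamiliar with the trick in the last hunk above: a GitHub Actions matrix `exclude` entry whose value evaluates to `false` matches no matrix combination, so wrapping the module name in a `&&` expression conditionally removes it from the matrix. A minimal standalone sketch (a hypothetical workflow, not the actual Spark file; `SKIP_PANDAS` is an invented variable standing in for the precondition output):

```yaml
# Hypothetical minimal workflow illustrating the conditional-exclude trick.
# When SKIP_PANDAS is 'true', the && expression yields 'pyspark-pandas' and
# the exclude entry drops that module from the matrix; otherwise the
# expression yields false, which matches nothing, so the module still runs.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        modules: [pyspark-core, pyspark-pandas]
        exclude:
          - modules: ${{ vars.SKIP_PANDAS == 'true' && 'pyspark-pandas' }}
    steps:
      - run: echo "Testing ${{ matrix.modules }}"
```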

