[ 
https://issues.apache.org/jira/browse/SPARK-39515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39515:
---------------------------------
    Description: 
There are five problems to address.

*First*, the scheduled jobs are broken, as shown in the runs below:

https://github.com/apache/spark/actions/runs/2513261706
https://github.com/apache/spark/actions/runs/2512750310
https://github.com/apache/spark/actions/runs/2509238648
https://github.com/apache/spark/actions/runs/2508246903
https://github.com/apache/spark/actions/runs/2507327914
https://github.com/apache/spark/actions/runs/2506654808
https://github.com/apache/spark/actions/runs/2506143939
https://github.com/apache/spark/actions/runs/2502449498
https://github.com/apache/spark/actions/runs/2501400490
https://github.com/apache/spark/actions/runs/2500407628
https://github.com/apache/spark/actions/runs/2499722093
https://github.com/apache/spark/actions/runs/2499196539
https://github.com/apache/spark/actions/runs/2496544415
https://github.com/apache/spark/actions/runs/2495444227
https://github.com/apache/spark/actions/runs/2493402272
https://github.com/apache/spark/actions/runs/2492759618
https://github.com/apache/spark/actions/runs/2492227816

See also https://github.com/apache/spark/pull/36899 and 
https://github.com/apache/spark/pull/36890
In the master branch, it seems that at least the Hadoop 2 build is currently 
broken.

*Second*, it is currently very difficult to navigate the scheduled jobs. We 
have to open the 
https://github.com/apache/spark/actions/workflows/build_and_test.yml?query=event%3Aschedule
 link and search through the runs manually, one by one.

Since GitHub added support for reusing (calling) other workflows, we should 
leverage this feature; see also 
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test_ansi.yml.
 Once the scheduled jobs are separated this way, each one will be defined as 
its own workflow. We might also have to clean up the existing build history to 
make it easier to read (this requires committer-only permissions).
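Until the scheduled jobs are split into separate workflows, the same `event=schedule` filter used by the UI link above can also be applied through the GitHub REST API ("List workflow runs for a workflow"), which avoids clicking through runs one by one. The following is a minimal Python sketch under that assumption. The helper names (`scheduled_runs_url`, `fetch`, `broken_runs`) are hypothetical, only one page of results is read, and which conclusions count as "broken" is an illustrative choice, not an existing Spark convention.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

API_ROOT = "https://api.github.com"

# Conclusions treated as "broken" in this sketch (an illustrative choice;
# "cancelled" and "skipped" runs are deliberately ignored).
BROKEN_CONCLUSIONS = {"failure", "timed_out", "startup_failure"}


def scheduled_runs_url(owner: str, repo: str, workflow_file: str,
                       per_page: int = 50, page: int = 1) -> str:
    """URL of the 'list workflow runs for a workflow' endpoint, filtered to
    runs triggered by the `schedule` event (like ?query=event:schedule)."""
    query = urllib.parse.urlencode(
        {"event": "schedule", "per_page": per_page, "page": page})
    return (f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/"
            f"{workflow_file}/runs?{query}")


def fetch(url: str, token: Optional[str] = None) -> dict:
    """GET a GitHub REST endpoint and decode its JSON body."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # optional; unauthenticated requests have a low rate limit
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def broken_runs(payload: dict) -> List[Tuple[str, str, str]]:
    """(created_at, conclusion, html_url) for each completed run in one page
    of results whose conclusion is in BROKEN_CONCLUSIONS, in API order."""
    return [
        (run["created_at"], run["conclusion"], run["html_url"])
        for run in payload.get("workflow_runs", [])
        if run.get("status") == "completed"
        and run.get("conclusion") in BROKEN_CONCLUSIONS
    ]
```

For example, `broken_runs(fetch(scheduled_runs_url("apache", "spark", "build_and_test.yml")))` would list the broken scheduled runs on the first page of results. Paging and token handling are kept minimal for brevity.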

*Third*, we should set up scheduled jobs for branch-3.3; see the existing 
branch-3.2 job at 
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L78-L83
 for reference.

*Fourth*, we should improve the logic that skips duplicated tests. See also 
https://github.com/apache/spark/pull/36413#issuecomment-1157205469 and 
https://github.com/apache/spark/pull/36888

*Fifth*, we should probably replace the base image 
(https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302,
 https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain 
Ubuntu image combined with Docker image caching. See also 
https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md

  was:
There are four problems to address.


> Improve/recover scheduled jobs in GitHub Actions
> ------------------------------------------------
>
>                 Key: SPARK-39515
>                 URL: https://issues.apache.org/jira/browse/SPARK-39515
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Project Infra
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Blocker



--
This message was sent by Atlassian Jira
(v8.20.7#820007)