BTW, this is not intended to oppose the Apache Spark Infra 2022 that
Dongjoon mentioned in "Apache Spark Jenkins Infra 2022".
It is just to share one possible way to run the Linux arm64 scheduled job.

Also, I think we should reach a final conclusion on the Spark community's
attitude toward self-hosted actions, for future reference.

Regards,
Yikun

Yikun Jiang <yikunk...@gmail.com> wrote on Sun, Jan 9, 2022 at 11:33:

> Hi, all
>
> I tried to verify the feasibility of a *Linux arm64 scheduled job* using
> self-hosted actions. Below is the progress so far, and I would like to hear
> your suggestions on the next step (continue or stop).
>
> Related JIRA: SPARK-35607
> <https://issues.apache.org/jira/browse/SPARK-35607>
>
> *## About self-hosted GitHub Actions:*
> Currently, self-hosted runners support x64 (Linux, macOS, Windows),
> ARM64 (Linux only), and ARM32 (Linux only)
> <https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#architectures>.
>
> There is guidance on self-hosted runners from Apache Infra
> <https://cwiki.apache.org/confluence/display/INFRA/GitHub+-+self-hosted+runners>.
> The gap in enabling self-hosted runners on an Apache repo is resource
> security: specifically, the self-hosted runner must not be accessible to
> PRs from users who are not allowed. Based on information and suggestions
> from the ASF, the apache/airflow team maintains a custom runner
> <https://github.com/ashb/runner/tree/releases/pr-security-options>, which
> apache/airflow also uses in its CI. So we could just use it
> directly.
>
> TL;DR: what we need is to set up resources with the custom runner, then
> enable these resources for self-hosted actions.
>
> *## Test on self-hosted GitHub Actions with the custom runner:*
> Here are some trial runs on my own repo:
> 1. Spark Maven/SBT test:
> PR: https://github.com/apache/spark/pull/35088
> TEST: https://github.com/Yikun/spark/pull/51
> 2. PySpark test:
> PR: https://github.com/apache/spark/pull/35049
> TEST: https://github.com/Yikun/spark/pull/53
> 3. Pull request test from a disallowed user:
> TEST: https://github.com/Yikun/spark/pull/60
> The self-hosted runner prevents the PR from accessing the runner, reporting
> "Running job on worker spark-github-runner-0001 disallowed by security
> policy".
>
> *## Pros of self-hosted GitHub Actions:*
> - Satisfies the simple demands of Linux arm64 scheduled jobs.
> - Reuses the main GitHub Actions workflow.
> - All changes are visible on GitHub and easy to review.
> - Easy to migrate once official GA arm64 support is ready.
>
> *## What's the next step:*
> * If we can consider self-hosted actions as an option, I will file a
> JIRA with Apache Infra to request the token to continue, like:
> https://issues.apache.org/jira/browse/INFRA-21305
> * If we firmly think that self-hosted actions are not a wise choice, I
> will try to find another way.
>
> There is also some initial discussion, just FYI:
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/pull/6
>
> Regards,
> Yikun
>
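
For readers unfamiliar with the gating described in test 3 of the quoted
message, here is a minimal Python sketch of that kind of check. It is an
illustration only, not the custom runner's actual code: the allow-list, the
`Job` type, the `check_job` function, and the "committer-a"/"committer-b"
logins are all hypothetical, and how the real runner decides who is allowed
is not described in this thread. Only the rejection message is taken from
the message above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical allow-list of GitHub logins; an assumption for illustration.
ALLOWED_AUTHORS: FrozenSet[str] = frozenset({"committer-a", "committer-b"})


@dataclass(frozen=True)
class Job:
    worker: str     # runner name, e.g. "spark-github-runner-0001"
    pr_author: str  # GitHub login of the user who opened the PR


def check_job(job: Job, allowed: FrozenSet[str] = ALLOWED_AUTHORS) -> Tuple[bool, str]:
    """Return (accepted, message) for a job offered to a self-hosted worker."""
    if job.pr_author in allowed:
        return True, f"Running job on worker {job.worker}"
    # Reject before any of the PR's code runs on the self-hosted machine.
    return False, f"Running job on worker {job.worker} disallowed by security policy"


if __name__ == "__main__":
    print(check_job(Job("spark-github-runner-0001", "outside-user"))[1])
```

The point of the sketch is only that the decision happens on the runner side,
before the job starts, so a PR from a disallowed user never gets to execute
on the arm64 resource.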
