Personally I’d love to see us compiling and testing on Linux arm64 as well.

On Sat, Jan 8, 2022 at 7:49 PM Yikun Jiang <yikunk...@gmail.com> wrote:

> BTW, this is not intended to oppose the Apache Spark Infra 2022 plan
> that Dongjoon mentioned in "Apache Spark Jenkins Infra 2022".
> It just shares one possible way to run the Linux arm64 scheduled job.
>
> Also, I think the Spark community should reach a final conclusion on its
> position toward self-hosted actions, for future reference.
>
> Regards,
> Yikun
>
> On Sun, Jan 9, 2022 at 11:33, Yikun Jiang <yikunk...@gmail.com> wrote:
>
>> Hi, all
>>
>> I tried to verify the feasibility of a *Linux arm64 scheduled job* using
>> a self-hosted action. Below is the progress so far, and I would like to
>> hear your suggestions on the next step (continue or stop).
>>
>> Related JIRA: SPARK-35607
>> <https://issues.apache.org/jira/browse/SPARK-35607>
>>
>> *## About self-hosted GitHub Action:*
>> Currently, self-hosted runners support x64 (Linux, macOS, Windows),
>> ARM64 (Linux only), and ARM32 (Linux only)
>> <https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#architectures>.
>>
>> There is guidance on self-hosted runners from Apache Infra
>> <https://cwiki.apache.org/confluence/display/INFRA/GitHub+-+self-hosted+runners>.
>> The gap to enabling a self-hosted runner on an Apache repo is resource
>> security: specifically, the self-hosted runner must not be accessible to
>> PRs from users who are not allowed. According to information and
>> suggestions from the ASF, the apache/airflow team maintains a custom runner
>> <https://github.com/ashb/runner/tree/releases/pr-security-options>, which
>> apache/airflow also uses in its CI. So we could use it directly.
>>
>> TL;DR: what we need is to set up resources with the custom runner, then
>> enable these resources for the self-hosted action.
>>
>> *## Test on self-hosted GitHub Action with custom runner:*
>> Here are some attempts on my own repo:
>> 1. Spark Maven/SBT test:
>> PR: https://github.com/apache/spark/pull/35088
>> TEST: https://github.com/Yikun/spark/pull/51
>> 2. PySpark test:
>> PR: https://github.com/apache/spark/pull/35049
>> TEST: https://github.com/Yikun/spark/pull/53
>> 3. Pull request test from a user who is not allowed:
>> TEST: https://github.com/Yikun/spark/pull/60
>> The self-hosted runner prevents the PR from accessing the runner, with
>> the message "Running job on worker spark-github-runner-0001 disallowed by
>> security policy".
>>
>> *## Pros of self-hosted GitHub Action:*
>> - Satisfies the simple demands of Linux arm64 scheduled jobs.
>> - Reuses the main workflow of GitHub Actions.
>> - All changes are visible on GitHub and easy to review.
>> - Easy to migrate once official GA arm64 support is ready.
>>
>> *## What's the next step:*
>> * If we can also consider self-hosted action as an option, I will submit a
>> JIRA to Apache Infra to request the token to continue, like:
>> https://issues.apache.org/jira/browse/INFRA-21305
>> * If we are certain that self-hosted action is not a wise choice, I
>> will try to find another way.
>>
>> There is also some initial discussion, just FYI:
>> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/pull/6
>>
>> Regards,
>> Yikun
>>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
