After finishing 4.0.0-preview1 RC1, I have more experience with this
topic.

In fact, the main work of the release process, building packages and
documents, is already tested in GitHub Actions jobs. However, the way we
test it differs from what the release scripts do.

1. The execution environment is different.
The release scripts define the execution environment with this Dockerfile:
https://github.com/apache/spark/blob/master/dev/create-release/spark-rm/Dockerfile
However, GitHub Actions jobs use a different Dockerfile:
https://github.com/apache/spark/blob/master/dev/infra/Dockerfile
We should figure out a way to unify them. That said, the Docker image for
the release process needs to set up more things, so using a single
Dockerfile for both may not be viable.

2. The execution code is different. Take building the documentation as an
example:
The release scripts:
https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L404-L411
The GitHub Actions job:
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L883-L895
I don't know which one is more correct, but we should definitely unify them.

Ideally we would run the release scripts as GitHub Actions jobs, but I
think the unification is more important right now.

Thanks,
Wenchen


On Fri, May 10, 2024 at 12:34 AM Hussein Awala <huss...@awala.fr> wrote:

> Hello,
>
> I can answer some of your questions, since they are common to other
> Apache projects.
>
> > Who currently has permissions for Github actions? Is there a specific
> owner for that today or a different volunteer each time?
>
> The Apache organization owns the GitHub Actions setup. Committers
> (contributors with write permissions) can re-trigger or cancel a GitHub
> Actions workflow run, but the runners are managed by the Apache Infra team.
>
> > What are the current limits of GitHub Actions, who set them - and what
> is the process to change those (if possible at all, but I presume not all
> Apache projects have the same limits)?
>
> As for limits, I don't think there are any significant ones, especially
> since the Apache organization has 900 donated runners shared by its
> projects, and the Infra team has an initiative to add self-hosted runners
> running on Kubernetes (document
> <https://cwiki.apache.org/confluence/display/INFRA/ASF+Infra+provided+self-hosted+runners>).
>
> > Where should the artifacts be stored?
>
> Usually, we use Maven for JARs, Docker Hub for Docker images, and the
> GitHub cache for workflow caches. We can also use GitHub artifacts to store
> any kind of package (even Docker images, in GHCR), which Apache policies
> fully accept. Also, if the project has a cloud account (AWS, GCP, Azure,
> ...), a bucket can be used to store some of the packages.
>
> > Who should be permitted to sign a version - and what is the process for
> that?
>
> The Apache documentation is clear about this: by default, only PMC members
> can be release managers, but we can contact the Infra team to add a
> committer as a release manager (document
> <https://infra.apache.org/release-publishing.html#releasemanager>). The
> process for creating a new release is described in this document
> <https://www.apache.org/legal/release-policy.html#policy>.
>
>
> On Thu, May 9, 2024 at 10:45 AM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> Following the conversation that started with the Spark 4.0.0 release,
>> this is a thread to discuss improvements to our release processes.
>>
>> I'll start by raising some questions that we should answer to get the
>> discussion going:
>>
>>
>>    1. What is currently running in GitHub Actions?
>>    2. Who currently has permissions for Github actions? Is there a
>>    specific owner for that today or a different volunteer each time?
>>    3. What are the current limits of GitHub Actions, who set them - and
>>    what is the process to change those (if possible at all, but I presume not
>>    all Apache projects have the same limits)?
>>    4. What versions should we support as an output for the build?
>>    5. Where should the artifacts be stored?
>>    6. What should be the output? Only a tarball, or also a Docker image
>>    published somewhere?
>>    7. Do we want to have a release on fixed dates or a manual release
>>    upon request?
>>    8. Who should be permitted to sign a version - and what is the
>>    process for that?
>>
>>
>> Thanks!
>> Nimrod
>>
>
