Hi,

Thanks, Raúl, for bringing this up; it's an important topic!
I'd like to provide more context for your proposal and share my
own problems with the release process.

On Mon, May 9, 2022 at 2:33 PM Raul Cumplido <r...@voltrondata.com> wrote:
>
> Hi,
>
> I would like to propose a change in our release process.
>
> The rationale for the change is to avoid introducing new issues once a
> Release Candidate has already been cut by only merging specific commits to
> new release candidates.
>
> Currently once a new Release Candidate is required we drop the previous
> version branch and create a new Release Candidate from the master branch on
> the repository [1].
Actually, dropping the previous "release-<version>" branch is not a
requirement, but this is indeed not clearly documented in the release
guidelines.

> This has the problem that we might introduce new bugs
> to the Release creating the need of cutting further release candidates.
We introduced the release branches for this exact scenario, so we can
create releases independently from the master branch.

> As an example, for the release 7.0.0, 10 release candidates were required
The reason for the notorious 7.0.0-RC10 is different, more on that later.

> and for the release 8.0.0 there was the need to remove a specific commit that
> introduced some new issues [2]. For the release 8.0.0 we were able to find
> it early but it could have potentially been introduced and created the need
> for further RCs.
>
> I would like to propose the following workflow.
> When creating the initial RC, create both an rc1 branch and the version
> branch from master.
> release-x.0.0 and release-x.0.0.rc1
>
> If a new RC is required, drop the release-x.0.0 (as we do today) and create
> a new RC branch from the previous RC branch (instead of master), then
> cherry pick only the specific commits that have been identified to be part
> of the new release candidate. We can automate the cherrypick process via a
> script specifying the JIRA tickets or the commit hashes that we want to add
> to the new release candidate. Once the new RC branch is ready, create a new
> version branch from it and proceed as today.
This is why I manually cherry-picked 4 commits from the master branch
to the new release branch [1], excluding that specific patch.
Note that there was a single blocker [2], but I still included 3
additional patches: 2 low-risk bug fixes and a patch for the
verification.
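
For illustration, the cherry-pick step could be scripted roughly as
follows. This is only a minimal sketch, not the existing release
tooling: the helper name plan_rc_branch, the branch names and the
placeholder hashes are hypothetical, and the function only builds the
git command lines without running them.

```python
from typing import List, Sequence


def plan_rc_branch(
    base_branch: str, new_branch: str, commits: Sequence[str]
) -> List[List[str]]:
    """Build the git commands that create `new_branch` from `base_branch`
    and cherry-pick `commits` onto it, in the given order.

    Hypothetical helper for illustration only; it is not part of the
    Arrow release scripts and it does not execute anything.
    """
    if not commits:
        raise ValueError("at least one commit is required for a new RC")
    # Start the new RC branch from the previous RC branch, not from master.
    commands = [["git", "checkout", "-b", new_branch, base_branch]]
    for sha in commits:
        # -x appends "(cherry picked from commit ...)" to the commit message.
        commands.append(["git", "cherry-pick", "-x", sha])
    return commands


if __name__ == "__main__":
    # Placeholder branch names and hashes; substitute real values.
    plan = plan_rc_branch(
        "release-x.0.0.rc1", "release-x.0.0.rc2", ["<sha-1>", "<sha-2>"]
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Returning the commands instead of running them keeps the sketch easy to
review; the release manager could inspect the plan before executing it.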

> The commits to be added to the release once a release candidate has already
> been cut will usually be fixes for the release but could also be features
> if there is community consensus that a feature must be introduced to the
> release.
I'd have also included both the Python UDF [3] and GCS [4] patches
since they are really valuable features.
In the first case we noticed the broken packaging builds in the
nightly report; this is why I had to cherry-pick commits from
master rather than cutting RC3 directly from the master branch (there
is no other difference).
In the second case the PR simply didn't make it for the same reason
[5], which we managed to catch before merging the patch.

> This change will allow us to have a more granular control of what goes in
> the release once a release candidate has been cut and speed up the release
Since your proposal is already implemented, the actionable item I see
here is to properly document it in the release management guidelines.

> by focusing both the release manager's and the community's efforts and
> potentially reducing the number of RCs to be created and verified.
Regarding the notorious 7.0.0-RC10 release candidate: I developed a
habit of running the source verification tasks before calling a vote,
while waiting for the packaging builds to finish. If there is an issue,
the candidate doesn't reach the VOTE phase. I just took a look, and the
6th release candidate (7.0.0-RC5) was the first one I managed to send
out a VOTE email for. Out of the 11 release candidates I created for
the 7.0.0 release, only 4 made it to a vote.

Before that release the number of RC verification crossbow tasks kept
growing, but we had no way to run them on a nightly basis.
This meant that we were unable to tell whether the verification tasks
would pass for a given commit, and we only noticed issues after
creating a release candidate.
Right after the 7.0.0 release we refactored [6] the source
verification scripts and crossbow tasks to support verifying specific
git commits, local checkouts and actual release candidates. Since then
we have had nightly verification builds, so we get notified about
failing builds, and we didn't even try to create the first release
candidate while verification tasks were still failing. This was the
single reason why we didn't have 10+ release candidates this time.


After spending countless sleepless nights with Arrow releases I'd like
to raise awareness of three other problems bothering me:

PROBLEM 1: Rush period before the release
One or two weeks before the release we start to incrementally postpone
the issues which are unlikely to make it into the release, but there
are features we would still like to squeeze in. There are too many
simultaneously moving parts right before the release, possibly
introducing new issues. Since we release many implementations at once
and there are multiple stakeholders focusing on different features,
it's generally hard to "reach consensus" about what to exclude and
what to wait for. We're trying our best to include as much value in
each release as we can while trying to avoid significant delays in
the delivery date.

PROBLEM 2: Decoupled packaging and verification builds
Due to the on-demand nature of the crossbow tasks we often forget to
trigger crossbow builds before merging a PR, resulting in nightly
failures which we need to fix in follow-up PRs. Ideally, if we were
able to run all of our builds on all of the PRs before merging, we
could keep the master branch in an always-releasable state.
This is a tradeoff we made to spare CI resources for the apache/arrow
repository, but soon enough we will reach the capacity limits of
crossbow as well (for example, I had to manually stop and restart macOS
crossbow builds during the release process to avoid waiting 12 more
hours).

PROBLEM 3: Lack of interest in nightly builds despite their importance
We usually let nightly builds fail continuously for days or even
weeks, hiding more and more issues over time. This adds up before the
release, making the rush period even worse. I'm not sure what the
exact reason is, probably a mixture of having just a few subscribers
to the builds@ mailing list and the poor readability of nightly reports
(which keeps improving thanks to Raúl).

Thanks, Krisztian

[1]: https://github.com/apache/arrow/commits/release-8.0.0
[2]: https://github.com/apache/arrow/commit/0d30a05212b1448f53233f2ab325924311d76e54
[3]: https://github.com/apache/arrow/pull/12590
[4]: https://github.com/apache/arrow/pull/12763
[5]: https://github.com/apache/arrow/pull/12763#issuecomment-1109022291
[6]: https://github.com/apache/arrow/pull/12320
>
> Thanks,
> Raúl
>
> [1]
> https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> [2] https://github.com/apache/arrow/pull/12590#issuecomment-1116144088
