Hi,

> I can see two ways to approach this issue that are both based on extending
> our PR/commit title format with machine-readable information about the
> nature of the changes in the commit. We could take inspiration from [4] or
> something similar. This would allow us to easily update the overall version
> according to the actual semver spec[5] without requiring huge amounts of
> manual effort between releases to keep that info current. Additionally, it
> would also be possible to create an accurate semver for each component
> (e.g. pyarrow to alleviate issues brought up in [3]) as our title already
> contains component information. We already have a rather strict policy on
> commit titles, so I don't think it would impose too much additional work to
> add and verify this semver marker for the benefit it would bring us.

Thanks for bringing up this interesting idea.

Regarding a separate version for each component:

  * We need to update our release scripts
    * We update version information by
      dev/release/01-prepare.sh and
      dev/release/post-12-bump-versions.sh on each release
  * We need to reconsider the tag name
    * We currently always use apache-arrow-X.Y.Z
  * PyArrow's version depends on the C++ implementation
    * If PyArrow doesn't have any breaking change but the
      C++ implementation has a breaking change, PyArrow
      should still bump its major version.
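
The PyArrow point above could be sketched roughly like this.
It is only an illustration: the component names, the
DEPENDS_ON table and effective_bump() are hypothetical and
not part of our release scripts. It also assumes that a
component inherits the largest bump of anything it depends
on, which we would still need to agree on.

```python
# Hypothetical sketch: propagate semver bumps across components.
# Nothing here exists in dev/release/; names are for illustration only.

ORDER = {"patch": 0, "minor": 1, "major": 2}

# Assumed dependency table: PyArrow builds on the C++ implementation.
DEPENDS_ON = {
    "python": ["cpp"],
    "cpp": [],
}


def effective_bump(component, own_bumps, depends_on=DEPENDS_ON):
    """Return the bump for `component`, taking the largest of its own
    bump and the bumps of everything it depends on (transitively)."""
    best = own_bumps.get(component, "patch")
    for dep in depends_on.get(component, []):
        inherited = effective_bump(dep, own_bumps, depends_on)
        if ORDER[inherited] > ORDER[best]:
            best = inherited
    return best


# PyArrow itself only has a minor change, but C++ has a breaking one.
own = {"python": "minor", "cpp": "major"}
print(effective_bump("python", own))  # major
print(effective_bump("cpp", own))     # major
```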

How about only adding "breaking change" information to each
commit as a first step? With that information, we can then
reconsider the major release frequency.
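
As a rough sketch of that first step: the "!" marker below is
borrowed from Conventional Commits and is only an assumption,
not an agreed convention. The regular expression, the helper
names and the example titles are also made up for
illustration.

```python
import re

# Hypothetical title format: "ARROW-<id>[!]: [Component]... summary".
# The optional "!" marks a breaking change (an assumption, see above).
TITLE_RE = re.compile(
    r"^ARROW-(?P<id>\d+)(?P<breaking>!)?: "
    r"(?P<components>(?:\[[^\]]+\])+) (?P<summary>.+)$"
)


def parse_title(title):
    """Parse a commit title, or return None if it doesn't match."""
    m = TITLE_RE.match(title)
    if m is None:
        return None
    return {
        "id": int(m.group("id")),
        "breaking": m.group("breaking") is not None,
        "components": re.findall(r"\[([^\]]+)\]", m.group("components")),
        "summary": m.group("summary"),
    }


def breaking_components(titles):
    """Collect the components touched by commits marked as breaking."""
    result = set()
    for title in titles:
        parsed = parse_title(title)
        if parsed is not None and parsed["breaking"]:
            result.update(parsed["components"])
    return result


# Made-up titles, not real issues.
titles = [
    "ARROW-1!: [C++][Python] Remove a deprecated API",
    "ARROW-2: [R] Fix a typo in the docs",
]
print(sorted(breaking_components(titles)))  # ['C++', 'Python']
```

If no commit since the last release carries the marker, the
next release would not need a major bump.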


Thanks,
-- 
kou

In <canva0dj_c5bmmxyspdkd7wmndz3ecqxn6idagrzq4jrjt78...@mail.gmail.com>
  "Re: [DISC][Release] More control on Release Candidates commits" on Fri, 20 May 2022 16:36:40 +0200,
  Jacob Wujciak <ja...@voltrondata.com> wrote:

> Hello Everyone,
> 
> --- Summary
> +1 (nb) on a formalized release workflow with a "feature-freeze" on the
> release branch (master can receive PRs as usual) and early & frequent
> automated reminders.
> 
> -1 (nb) on bi-monthly releases in the short term due to limitations with
> CI. Solution: self-hosted ephemeral, auto-scaling runners but this requires
> planning, development, and long-term maintenance. (see [1][2])
> +1 (nb) on faster releases as a general idea but with changes to Arrow's
> versioning scheme -> provide machine-readable semver info in commit titles
> and use this to create an accurate semver for each release, possibly for
> each component (see [3] for user issues with major-only).
> ---
> 
> I do like the idea of earlier, more frequent (automated) reminders for the
> "feature-freeze" on the release branch (while work can continue unhindered
> on master) as proposed by Kou. As Krisztian's existing workflow was pretty
> similar to the proposal, I think it is more a case of formalizing and
> documenting the process so that it is transparent for all contributors
> (especially the "feature-freeze" aspect).
> 
> The idea to increase the release frequency so people don't feel that they
> missed out on shipping their new feature is interesting and I am not
> opposed to it in principle but I do not think it is something that we can
> implement in the short term without substantial negative impact on everyone
> involved in releases, and it requires more discussion and planning.
> 
> On the one hand, we have technical issues. We simply do not have the CI
> resources at the moment to run the number of jobs on PRs that would be
> required to ensure that master is releasable at basically any point in
> time. The Apache GitHub org can have 180 concurrent jobs spread over all
> repositories. In practice, this means we get ~5 active jobs, sometimes fewer
> or none (depending on the time of day, but not much more). This problem has
> been discussed often and as far back as the initial release of GHA [1].
> Everyone has experienced this problem only getting worse in recent times.
> 
> There are two ways to address this: reduce run times by optimizing
> workflows e.g. through caching but as pointed out in [1] this is not an
> actual solution as ASF has no way to assign runner quotas to repositories,
> so capacity we free by optimizing our workflows could very well be used by
> some other project leaving us with the issue unchanged. The real solution
> is to use self-hosted runners, which due to the nature of PR CI (running
> external code) are a security issue when attached to a public repo like
> arrow. While INFRA does allow self-hosted runners, their safe and effective
> utilization requires planning, development, continued maintenance, and
> financial support (In [2] Jarek Potiuk talks about the system they use in
> apache/airflow). We should definitely strive to implement (ideally:
> ephemeral, auto-scaling) self-hosted runners as soon as possible and I am
> already working on a proposal for such a system but this is just not
> something we can achieve in a month or two.
> 
> On the other hand, there are also ecosystem/downstream consequences that
> should be considered when discussing a bi-monthly release. There are
> already concerns about our releases only incrementing major versions (some
> discussion in [3]) which would become more severe with an even higher
> frequency of major releases. I do understand the difficulty of making sure
> there are no breaking changes in any of the components of the mono repo and
> thus being safe with major releases (which is one of the main reasons for
> major-only releases afaik?) but with bi-monthly (or monthly once the
> process is perfected?) releases we would be nearing Chrome territory pretty
> quickly version-wise.
> 
> I can see two ways to approach this issue that are both based on extending
> our PR/commit title format with machine-readable information about the
> nature of the changes in the commit. We could take inspiration from [4] or
> something similar. This would allow us to easily update the overall version
> according to the actual semver spec[5] without requiring huge amounts of
> manual effort between releases to keep that info current. Additionally, it
> would also be possible to create an accurate semver for each component
> (e.g. pyarrow to alleviate issues brought up in [3]) as our title already
> contains component information. We already have a rather strict policy on
> commit titles, so I don't think it would impose too much additional work to
> add and verify this semver marker for the benefit it would bring us.
> 
> Best,
> Jacob
> 
> [1]:
> https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status
> [2]:
> https://issues.apache.org/jira/browse/INFRA-21646?focusedCommentId=17316108&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17316108
> [3]: https://github.com/apache/arrow/issues/13185
> [4]: https://www.conventionalcommits.org/en/v1.0.0/
> [5]: https://semver.org
> 
> 
> 
> On Wed, May 11, 2022 at 12:28 PM Krisztián Szűcs <szucs.kriszt...@gmail.com>
> wrote:
> 
>> On Wed, May 11, 2022 at 6:01 AM Sutou Kouhei <k...@clear-code.com> wrote:
>> >
>> > Hi,
>> >
>> > In <CAGvDy=o3ohs5lfhqvre3r0mo1if7o908v6wd_oy0aqv5sdt...@mail.gmail.com>
>> >   "Re: [DISC][Release] More control on Release Candidates commits" on Tue, 10 May 2022 13:27:09 +0200,
>> >   Raul Cumplido <r...@voltrondata.com> wrote:
>> >
>> > > I still think there is some value in standardising the "feature freeze" on
>> > > new release candidates once a first release candidate has been created and
>> > > only add required fixes for the follow up RCs. What I would like to avoid
>> > > with that is rushing big new features at the end that might be added
>> > > between release candidates.
>> > >
>> > >>> PROBLEM 1: Rush period before the release:
>> > >>
>> > > The only proposal I can think of around this is that I will try and share
>> > > the release schedule earlier. I sent an email [2] with a ~1.5/~2 weeks
>> > > notice, maybe if all of us start being more aware that a release is coming
>> > > with a little more time (1 month) we can plan better.
>> >
>> > How about releasing more frequently? If we release a new
>> > version frequently, stakeholders will be able to wait for
>> > future releases instead of pushing a new feature to the next
>> > release.
>> > If we release a new version in the middle of even
>> > months, how about the following schedule?
>>
>> That looks like a plan!
>>
>> I had a similar idea to be stricter about the release dates + a
>> feature freeze period, but wasn't thinking of more frequent releases
>> which is a nice trade-off.
>>
>> > 1. Set release target date (e.g. 2022-08-10/20 for 9.0.0)
>> > 2. Announce the release target date on 2022-07-01
>> >    (We can automate this.)
>> We may need more notices 2 weeks and 1 week before the feature freeze.
>>
>> > 3. Create a release branch for the next version (release-9.0.0) and
>> >    use the next next version (10.0.0) as the default version
>> >    for dev/merge_arrow_pr.py at 2022-08-01
>> >    (We can automate this.)
>> > 4. Stabilize release-9.0.0 branch in 2022-08-01/10
>> >    ("feature freeze")
>> >    (We should verify this branch by our nightly CI.)
>> We should move the crossbow job triggers and reports to apache/arrow
>> so we have a tighter control over the nightlies.
>>
>> > 5. Vote and release 9.0.0 in 2022-08-10/20
>> > 6. Set release target date (e.g. 2022-10-10/20 for 10.0.0)
>> > 7. ...
>> >
>> >
>> > Thanks,
>> > --
>> > kou
>>
