Hello Everyone!

I would like to resurface the discussion of separate
versioning/releases/voting for monorepo components. We have previously
touched on this topic, mostly in the community meetings and spread across
multiple, only tangentially related threads. I think a focused discussion can
be more results-oriented, especially now that we almost regularly
deviate from the quarterly release cadence with minor releases. My hope is
that discussing this and adapting our process can reduce the amount of work
required and ease the pressure on our release managers (Thank you Raúl and
Kou!).

I think the basis of the topic is separate versioning for components, as
otherwise separate releases have only limited value. From a technical
perspective, standalone implementations like Go or JS are the easiest to
handle in that regard: they can just follow their ecosystem standards,
which users have already requested (major releases in Go require
manual editing across a code base, as dependencies are usually pinned to a
major version).

For Arrow C++ bindings like Arrow R and PyArrow, having distinct versions
would require additional work, both to enable the use of different versions
and to ensure version compatibility is monitored and updated if
needed.

For Arrow R we have already implemented these changes for different reasons
and have backwards compatibility with libarrow >= 13.0.0. From the standpoint
of a PyArrow user this is likely irrelevant, as most users get binary
wheels from PyPI. A user who regularly builds PyArrow from source is
also capable of managing potentially different libarrow version
requirements, since building the package already requires this today, just
with an exact version match.
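
To make the "minimum supported libarrow" idea concrete, here is a minimal
sketch of such a check. It is not how Arrow R or PyArrow implement it; the
names parse_version and is_supported are hypothetical, and only the
13.0.0 lower bound comes from the paragraph above.

```python
# Hypothetical sketch: a binding accepts any libarrow at or above a minimum,
# instead of requiring an exact version match.
MIN_LIBARROW = (13, 0, 0)  # lower bound mentioned above for Arrow R


def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints.

    Assumption: plain numeric versions only; suffixes such as '-SNAPSHOT'
    are not handled in this sketch.
    """
    return tuple(int(part) for part in text.split("."))


def is_supported(libarrow_version, minimum=MIN_LIBARROW):
    """Return True if the given libarrow version meets the minimum."""
    return parse_version(libarrow_version) >= minimum
```

With an exact-match policy the comparison would be == instead of >=, which
is the only difference a source builder would have to deal with.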

A more meta question is about the messaging that different versioning
schemes carry, as it might no longer be obvious at first glance which
versions are compatible or have the newest features. I would argue, though,
that this is a marginal concern at best, as there is no guarantee of feature
parity between different components with the same version. Breaking that
implicit expectation with separate versions could be seen as clearer. If a
component only receives dependency bumps or minor bug fixes, releasing this
component with a patch version aligns much better with expectations than a
major version bump. In addition, there are already several differently
versioned libraries in the apache/arrow-* ecosystem that are released
outside of the monorepo release process. A proper support policy for each
component would also be required but could just default to 'current major
release' as it is now.

From an ASF perspective there is no requirement to release the entire
repository at once as the actual release artifact is the source tarball. As
long as that is verified and voted on by the PMC it is an official release.

This brings me to the release process and voting. I think it is pretty
clear that completely decoupling all components and their release processes
isn't feasible at the moment, mainly for technical reasons (crossbow), and
that it would likely also lead to vote fatigue. We have made efforts
to make the verification required for the vote easier and will continue
these efforts. Though I can see some of the components managing their own
releases (e.g. R, as we already do with post-release tasks due to CRAN), a
continued quarterly 'batch release' seems like a more appealing solution
and would still allow us to use separate versions. I would like to hear
opinions on voting in a single thread, with each voter voting on all
components or a subset of them, and on the surrounding technicalities.

In my opinion, being stricter with release requirements for components might
lead to smaller/less active components not releasing. This seems like a
bad thing at first glance, but it might also spur the user community to get
involved when the reassuring, regular releases dry up, and it would reflect
the actual development state of the component.

I am eager to hear your thoughts!

Best
Jacob