Summarizing the proposal discussed in our GitHub issue [1] for broader discussion and review on the dev list.
The current arrow-datafusion repo contains the following high-level subprojects: datafusion, the datafusion python binding, and ballista. To be able to release ballista and the datafusion python binding with semantic versioning, I propose we decouple the subproject versions from each other. As a result, we will be able to release a breaking change in datafusion without forcing a major version bump in ballista or the python binding, as long as that breaking change is not visible to their consumers.

To reduce release overhead, we will still vote on the whole arrow-datafusion repo on every release. From the same release tarball, we can then release these subprojects to their language-specific registries (crates.io and PyPI) with their own versions.

Take the upcoming datafusion 5.0.0 release as an example. Within the same source release, we also have the code for ballista-0.5.0 and datafusion-python-0.3.0. We only need to vote on a signed apache-arrow-datafusion-5.0.0.tar.gz tarball.

A consequence of this process is that every time we need to release a new version of the python binding or ballista, we need to trigger a new datafusion release as well. However, a datafusion release won't require a new release of the other two subprojects. For example, the datafusion 5.1.0 release can just include a datafusion python 0.4.0 release without a ballista release. In that case, we will just skip the crates.io publish for ballista.
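To make the decoupling concrete, here is a minimal Python sketch of how the publish steps could be derived from one approved tarball. The function name `publish_plan` and both input mappings are hypothetical and only for illustration; this does not describe any existing script. The folders and commands are the ones named in the release steps of this proposal, and the ballista crate order is just the order listed there, which may need to follow crate dependencies in practice.

```python
from typing import Dict, List, Tuple

# Folder names as listed in the proposal (order is an assumption, see above).
BALLISTA_CRATES = ["client", "core", "executor", "scheduler"]


def publish_plan(
    in_tarball: Dict[str, str], published: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (folder, action) steps for one approved source tarball.

    in_tarball: subproject name -> version contained in the source release.
    published:  subproject name -> version already on its registry.
    Both mappings are hypothetical inputs for this sketch.
    """
    # datafusion is published with every source release.
    steps: List[Tuple[str, str]] = [("datafusion", "cargo publish")]

    # ballista is published only when its own version changed.
    ballista = in_tarball["ballista"]
    if ballista != published.get("ballista"):
        for crate in BALLISTA_CRATES:
            steps.append((f"ballista/rust/{crate}", "cargo publish"))
        steps.append((".", f"push tag ballista-{ballista} to GitHub"))

    # The python binding is published only when its own version changed.
    python = in_tarball["python"]
    if python != published.get("python"):
        steps.append(("python", "maturin publish"))
        steps.append(("python", "release python documentation"))
        steps.append((".", f"push tag python-{python} to GitHub"))

    return steps


if __name__ == "__main__":
    # The 5.1.0 example from this email: new python binding, no new ballista.
    plan = publish_plan(
        {"datafusion": "5.1.0", "ballista": "0.5.0", "python": "0.4.0"},
        {"datafusion": "5.0.0", "ballista": "0.5.0", "python": "0.3.0"},
    )
    for folder, action in plan:
        print(f"{folder}: {action}")
```

With the 5.1.0 inputs, the plan contains no ballista steps, which matches the "skip the crates.io publish for ballista" case described above.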
Here is what the release process will look like:

* Send a PR with the following changes to prepare the source tree for a new release:
  - Update versions in Cargo.toml files
  - Run automation script to generate {datafusion,python,ballista}/CHANGELOG.md
* After the PR gets merged, push git tag x.y.z to GitHub
* Run dev/release/create-tarball.sh to create and upload a signed tarball for voting on the dev list
* After the vote passes, run ./dev/release/release-tarball.sh to move the approved tarball to the release location in SVN
* Unpack the released tarball and release the subprojects to their language-specific registries:
  - run `cargo publish` in datafusion to release datafusion to crates.io
  - if there is a new ballista release
    - run `cargo publish` in ballista/rust/{client,core,executor,scheduler} folders to release ballista to crates.io
    - push `ballista-x.y.z` tag to GitHub
  - if there is a new datafusion python release
    - run `maturin publish` in python folder to release datafusion python binding to PyPI
    - release python documentation
    - push `python-x.y.z` tag to GitHub

I would like to get some feedback on this proposal since it is a little bit different from other Arrow projects. But I do think this will provide a better dependency pinning experience and changelog tracking for those subprojects' downstream consumers.

[1]: https://github.com/apache/arrow-datafusion/issues/771

On Tue, Jul 27, 2021 at 4:18 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> Thanks to you both -- this sounds great.
>
> On Tue, Jul 27, 2021 at 8:37 AM Jiayu Liu <jimex...@gmail.com> wrote:
>
> > Not sure it's necessarily bundled together but I believe a Python,
> > documentation, etc. release can also be helpful. I can volunteer to help if
> > somehow these works can be parallelized.
> >
> > On Tue, Jul 27, 2021 at 3:29 PM QP Hou <houqp....@gmail.com> wrote:
> >
> > > Following up on this, since delta-rs could really benefit from this
> > > release, I have started some initial work with
> > > https://github.com/apache/arrow-datafusion/pull/780 to move things
> > > forward. Others are welcome to join the party.
> > >
> > > On Fri, Jul 23, 2021 at 12:58 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > >
> > > > Does anyone want to make a DataFusion / Ballista official release (and then
> > > > subsequent release to crates.io)? There is now a ticket [1] to track this
> > > > work. I think it would be great to do if someone has time. There are all
> > > > sorts of great features that have gone in since 4.0.0
> > > >
> > > > I don't have much time to devote to the release management of DataFusion /
> > > > Ballista in the near term (as my project uses DataFusion master and my
> > > > release management budget is already spent on managing arrow-rs releases).
> > > >
> > > > Andrew
> > > >
> > > > [1] https://github.com/apache/arrow-datafusion/issues/771