Thanks QP. This seems reasonable to me.

On Sun, Aug 1, 2021, 3:24 PM QP Hou <houqp....@gmail.com> wrote:
> Summarizing the discussed proposal in our GitHub issue [1] for broader
> discussion and review on the dev list.
>
> The current arrow-datafusion repo contains the following high-level
> subprojects: datafusion, datafusion python binding and ballista.
>
> In order to be able to release ballista and datafusion python binding
> with semantic versioning, I propose we decouple subproject versions
> from each other. As a result, we will be able to release a breaking
> change in datafusion without forcing a major version bump in ballista
> or python binding if that breaking change is not visible to their
> consumers.
>
> To reduce release overhead, we will still vote on the whole
> arrow-datafusion repo on every release. From the same release tarball,
> we can then release these sub-projects to their language-specific
> registries (crates.io and pypi) with their own versions.
>
> Take the upcoming datafusion 5.0.0 release as an example. Within the
> same source release, we also have the code for ballista-0.5.0 and
> datafusion-python-0.3.0. We only need to vote on a signed
> apache-arrow-datafusion-5.0.0.tar.gz tarball.
>
> A consequence of this process is that every time we need to release a
> new version of the python binding or ballista, we need to trigger a new
> datafusion release as well. However, a datafusion release won't require
> a new release from the other two subprojects. For example, the
> datafusion 5.1.0 release can just include a datafusion python release
> 0.4.0 without a ballista release. In that case, we will just skip the
> crates.io publish for ballista.
>
> Here is what the release process will look like:
>
> * Send a PR with the following changes to prepare the source tree for
>   a new release:
>   - Update versions in Cargo.toml files
>   - Run automation script to generate
>     {datafusion,python,ballista}/CHANGELOG.md
> * After PR gets merged, push git tag x.y.z to GitHub
> * Run dev/release/create-tarball.sh to create and upload a signed
>   tarball for voting in the dev list
> * After vote passed, run ./dev/release/release-tarball.sh to move
>   approved tarball to the release location in SVN
> * Unpack released tarball and release subproject to language-specific
>   registries:
>   - run `cargo publish` in datafusion to release datafusion to crates.io
>   - if there is a new ballista release
>     - run `cargo publish` in
>       ballista/rust/{client,core,executor,scheduler} folders to release
>       ballista to crates.io
>     - push `ballista-x.y.z` tag to GitHub
>   - if there is a new datafusion python release
>     - run `maturin publish` in python folder to release datafusion
>       python binding to pypi
>     - release python documentation
>     - push `python-x.y.z` tag to GitHub
>
> I would like to get some feedback on this proposal since it is a
> little bit different from other Arrow projects. But I do think this
> will provide a better dependency pinning experience and changelog
> tracking for those sub-projects' downstream consumers.
>
> [1]: https://github.com/apache/arrow-datafusion/issues/771
>
>
> On Tue, Jul 27, 2021 at 4:18 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > Thanks to you both -- this sounds great.
> >
> > On Tue, Jul 27, 2021 at 8:37 AM Jiayu Liu <jimex...@gmail.com> wrote:
> >
> > > Not sure it's necessarily bundled together but I believe a Python,
> > > documentation, etc. release can also be helpful. I can volunteer to
> > > help if somehow these works can be parallelized.
> > >
> > > On Tue, Jul 27, 2021 at 3:29 PM QP Hou <houqp....@gmail.com> wrote:
> > >
> > > > Following up on this, since delta-rs could really benefit from this
> > > > release, I have started some initial work with
> > > > https://github.com/apache/arrow-datafusion/pull/780 to move things
> > > > forward. Others are welcome to join the party.
> > > >
> > > > On Fri, Jul 23, 2021 at 12:58 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > > >
> > > > > Does anyone want to make a DataFusion / Ballista official release (and
> > > > > then subsequent release to crates.io)? There is now a ticket [1] to
> > > > > track this work. I think it would be great to do if someone has time.
> > > > > There are all sorts of great features that have gone in since 4.0.0
> > > > >
> > > > > I don't have much time to devote to the release management of
> > > > > DataFusion / Ballista in the near term (as my project uses DataFusion
> > > > > master and my release management budget is already spent on managing
> > > > > arrow-rs releases).
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1] https://github.com/apache/arrow-datafusion/issues/771
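
For anyone who wants to see the proposed flow end to end, here is a rough Python sketch of how a single voted tarball maps to per-subproject publish steps. It is only an illustration under stated assumptions: `ReleasePlan` and `release_steps` are hypothetical names that do not exist in the repo, the steps are plain strings rather than real commands, and the ballista crate order just mirrors the list in the proposal (an actual publish order would have to respect dependencies between the crates).

```python
from dataclasses import dataclass
from typing import List, Optional

# Ballista crate folders as listed in the proposal. The order simply mirrors
# that list; a real publish order would have to respect crate dependencies.
BALLISTA_CRATES = ["client", "core", "executor", "scheduler"]


@dataclass
class ReleasePlan:
    """Hypothetical description of one voted release; not part of the repo."""
    datafusion: str                 # always present; names the tarball and tag
    ballista: Optional[str] = None  # None: skip the crates.io publish for ballista
    python: Optional[str] = None    # None: skip the pypi publish


def release_steps(plan: ReleasePlan) -> List[str]:
    """Return human-readable steps for a plan, in the order proposed above."""
    steps = [
        "merge PR updating Cargo.toml versions and CHANGELOG.md files",
        f"push git tag {plan.datafusion}",
        "run dev/release/create-tarball.sh",
        f"vote on apache-arrow-datafusion-{plan.datafusion}.tar.gz",
        "run dev/release/release-tarball.sh",
        "cargo publish in datafusion",
    ]
    if plan.ballista is not None:
        steps += [f"cargo publish in ballista/rust/{c}" for c in BALLISTA_CRATES]
        steps.append(f"push git tag ballista-{plan.ballista}")
    if plan.python is not None:
        steps.append("maturin publish in python")
        steps.append("release python documentation")
        steps.append(f"push git tag python-{plan.python}")
    return steps


if __name__ == "__main__":
    # The 5.1.0 example from the proposal: python 0.4.0, no ballista release.
    for step in release_steps(ReleasePlan("5.1.0", python="0.4.0")):
        print(step)
```

The point the sketch tries to capture is the asymmetry in the proposal: the datafusion version is mandatory because it names the tarball that gets voted on, while the ballista and python versions are optional and only add publish steps when they are set.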