Thanks, Andy. There are two areas of concern I think we should have some
answer for before going forward with this (I offer no opinion as to what
the "right" answers are; I'm just raising them for discussion):

1. Integration testing: what is our workflow for ensuring that our
implementations are integration tested, and what do we do when changes
(whether in apache/arrow or in apache/arrow-rs) introduce
regressions/failures? I'm assuming the idea is that the existing
integration tests will remain in apache/arrow. Will you also run the
integration test suites in the Rust repository's CI checks?
2. Versioning: one rationale for our current policy of "everyone releases
together" is that you don't have to guess as much whether (for example)
Arrow Java 3.0 and Arrow Rust 3.0 are compatible and using the same format.
The shared version number is a kind of heuristic for which library versions
were integration tested with each other. It sounds like (but maybe I
misunderstand) y'all are looking to break from that. But if Arrow C++ goes
to version 7.0 by the end of the year and arrow-rs chooses to go to 15.4, or
3.12, or whatever, does that create confusion or doubt that works against
the Arrow goal of easy interoperability?

Neal

On Fri, Apr 9, 2021 at 8:18 AM Andy Grove <andygrov...@gmail.com> wrote:

> Following on from the email thread "Rust sync meeting" I would like to
> start a new discussion about moving the Rust components out to new GitHub
> repositories and using a new process for issues and release management.
>
> I have started a Google document [1] with details and to track the work
> required for this effort but I will summarize the key points of the
> proposal here:
>
> - Move existing Rust code into two new repositories
>   - apache/arrow-rs
>     - Arrow + Parquet crates
>   - apache/datafusion
>     - DataFusion + Ballista crates (which are expected to merge to some
>       degree over time)
>     - TPC-H benchmarks
>   - Use GitHub issues for issue tracking
> - Decouple release process
>   - Crates are released individually
>   - A vote on the source release of the released crate is held over the
>     mailing list as usual.
>   - Rust does not need to release a new version when the rest of Arrow
>     releases; we bundle our latest released crates to the signed tar.
>   - Crates can depend on GitHub commit hashes between releases
>
> The Google document may be the best place to collaborate on the proposal
> but I can update the document based on any comments in this email thread as
> well.
>
> Note that I have excluded discussion about arrow2/parquet2 from this
> proposal and I believe we should discuss that separately as a follow-on
> discussion.
>
> I look forward to hearing opinions on this both from current Rust
> maintainers and contributors and also from the wider Arrow community.
>
> Thanks,
>
> Andy.
>
> [1] https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
>
