On Mon, Apr 12, 2021 at 4:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Krisztian,
>
> On Mon, Apr 12, 2021 at 8:41 AM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > Hi,
> >
> > Based on the Google document, I see one actual problem and two
> > proposed actions that don't directly address the real issue.
> >
> > # Issue: Decouple release process to enable independent releases
> >
> > This is something the whole project requires, not just the Rust
> > implementation. Eventually every implementation should have its own
> > release cycle that is not blocked on the others.
> > Even if we want to tackle this just for the Rust implementation in the
> > first iteration, we need to figure out the right versioning scheme,
> > voting process and integration testing. Note that we have already
> > decided to decouple the source release process from the release of the
> > binaries. We also need to figure out how to handle the source release
> > itself once we provide different artifacts for different
> > implementations.
>
> I don't think it's necessary to figure out all this right now. For
> now, this seems to be a Rust-specific issue.
>
> > # Action 1: Maintain the Rust implementation in additional Apache repositories
> >
> > Pros:
> > - presumably would make it easier to define the dependencies between
> > the Rust crates (though Cargo should support multiple crates in a
> > single repository)
> > Cons:
> > - inconsistent: if there is apache/arrow-rs, why isn't there apache/arrow-js?
>
> I don't think this matters, nor will anyone actually care in practice.
> If someone google searches for "Arrow Rust" or "Arrow JavaScript",
> they'll find what they need.
>
> > - decreases project visibility: having all of the implementations in a
> > single repository makes the other implementations trivial to find
>
> As long as things are documented clearly in READMEs and the READMEs
> have links to each other, I think this is okay.
>
> > - introduces a lot of complexity for the CI/CD processes, and not just
> > for the Rust folks
>
> Could you explain this? In apache/arrow, we're basically just deleting
> stuff from our CI/CD configurations. In integration testing, we would
> have an additional git checkout phase to add a pinned version of Rust
> to the integration tests.

We still need to pay an additional maintenance cost to:
- keep the version pins up to date
- periodically refresh the Rust build scripts as the project evolves
- exercise the integration tests on the Rust side as well, either by
  using archery from apache/arrow or by custom scripting

Sharing common tooling between separate repositories will be harder,
and integrating with third-party services (like a new CI) will involve
more round trips with the INFRA team.

> > - will make it harder to interface/link between different implementations
> >
> > This seems unreasonable to me.
>
> It seems we won't be able to satisfy every requirement. Creating
> interdependent projects will involve more work, but we will have to
> create tools to facilitate this if/when it becomes a concern.
>
> >
> > # Action 2: Use github for issue tracking
> >
> > Pros:
> > - easier for new contributors
> > - more flexible in certain ways
> > Cons:
> > - not the Apache way of issue tracking
>
> This isn't true — several other Apache projects use GitHub issues.
>
> > - doubtful outcome for a large number of issues
>
> This isn't a problem all of us need to solve: if the Rust projects
> become disorganized in their issues, we can bring it up on the mailing
> list and discuss remedies in the future.
>
> >
> > I don't like JIRA either, but I can live with it, though I understand
> > the frustration around it.
> > Since GH issues vs. JIRA seems like a hot topic lately, we could
> > experiment with a less radical change: enable GitHub issues for the
> > whole project and sync them to JIRA (either by using an existing
> > service or by developing a GitHub Action for it). We may end up
> > preferring GitHub issues eventually.
>
> This seems like a can of worms. I think that there is a cultural
> expectation in the Rust community for individual crates to have their
> own respective GitHub issues. So this change is allowing for that.
>
> I don't see a need to change our issue management in the rest of the
> project. The C++ project and its dependents behave increasingly like
> an "enterprise" project in their development culture, where the more
> structured Jira approach is a good fit.
>
> >
> > All in all, I find this proposal way too invasive. It sounds more like
> > starting a new project with its own governance rather than making
> > releases more accessible to users.
>
> I'm taking a laissez-faire attitude here: if the Rust developers want
> to implement this change, I'm happy for them to go ahead and do it.
> Since it is almost strictly subtractive to apache/arrow, it should not
> create extra burdens for non-Rust developers.
>
> In general, the nature of the conflict that we've been having is one
> programming language forcing conformity on another. Just as we're
> saying to let Rust adopt its cultural norms, there shouldn't be an
> expectation that other parts of Apache Arrow should be conforming to
> things from the Rust ecosystem.
>
> Regarding governance: there are no governance changes.
>
> * Committers and PMC members still have to be approved in the same way
> * The Arrow PMC will vote on releases
>
> In Parquet, for example, we have the parquet-mr and parquet-format
> repositories, which release separately but share a common
> committership and PMC.
>
> >
> > Thanks, Krisztian
> >
> >
> > On Fri, Apr 9, 2021 at 5:18 PM Andy Grove <andygrov...@gmail.com> wrote:
> > >
> > > Following on from the email thread "Rust sync meeting" I would like to
> > > start a new discussion about moving the Rust components out to new GitHub
> > > repositories and using a new process for issues and release management.
> > >
> > > I have started a Google document [1] with details and to track the work
> > > required for this effort but I will summarize the key points of the
> > > proposal here:
> > >
> > >    - Move existing Rust code into two new repositories
> > >       - apache/arrow-rs
> > >          - Arrow + Parquet crates
> > >       - apache/datafusion
> > >          - DataFusion + Ballista crates (which are expected to merge
> > >            to some degree over time)
> > >          - TPC-H benchmarks
> > >       - Use GitHub issues for issue tracking
> > >    - Decouple release process
> > >       - Crates are released individually
> > >       - A vote on the source release of the released crate is held
> > >         over the mailing list as usual.
> > >       - Rust does not need to release a new version when the rest of
> > >         Arrow releases; we bundle our latest released crates into the
> > >         signed tarball.
> > >       - Crates can depend on GitHub commit hashes between releases
> > >
> > > The Google document may be the best place to collaborate on the proposal,
> > > but I can update the document based on any comments in this email thread
> > > as well.
> > >
> > > Note that I have excluded discussion about arrow2/parquet2 from this
> > > proposal and I believe we should discuss that separately as a follow-on
> > > discussion.
> > >
> > > I look forward to hearing opinions on this both from current Rust
> > > maintainers and contributors and also from the wider Arrow community.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1]
> > > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
