On Mon, Apr 12, 2021 at 10:12 AM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> On Mon, Apr 12, 2021 at 4:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi Krisztian,
> >
> > On Mon, Apr 12, 2021 at 8:41 AM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Based on the Google document, I see one actual problem and two actions
> > > that don't explicitly solve the real issue.
> > >
> > > # Issue: Decouple release process to enable independent releases
> > >
> > > This is something the whole project requires, not just the rust
> > > implementation. Eventually every implementation should have its own
> > > release cycle not blocked on others.
> > > Even if we want to tackle this just for the rust implementation in the
> > > first iteration, we need to figure out the right versioning scheme,
> > > voting process and integration testing. Note that we have already
> > > decided to decouple the source release process from the release of the
> > > binaries. We also need to figure out how to handle the source release
> > > itself once we provide different artifacts for different
> > > implementations.
> >
> > I don't think it's necessary to figure out all this right now. For
> > now, this seems to be a Rust-specific issue.
> >
> > > # Action 1: Maintain the rust implementation in additional apache repositories
> > >
> > > Pros:
> > > - presumably would make it easier to define the dependencies between
> > > the rust crates (though cargo should support multiple crates in a
> > > single repository)
> > > Cons:
> > > - inconsistent: if there is apache/arrow-rs, why isn't there
> > > apache/arrow-js?
> >
> > I don't think this matters, nor will anyone actually care in practice.
> > If someone google searches for "Arrow Rust" or "Arrow JavaScript",
> > they'll find what they need.
> >
> > > - decreases project visibility: having all of the implementations in a
> > > single repository makes other implementations trivial to find
> >
> > As long as things are documented clearly in READMEs and the READMEs
> > have links to each other, I think this is okay.
> >
> > > - introduces a lot of complexity for the CI/CD processes, and not just
> > > for the rust folks
> >
> > Could you explain this? In apache/arrow, we're basically just deleting
> > stuff from our CI/CD configurations. In integration testing, we would
> > have an additional git checkout phase to add a pinned version of Rust
> > to the integration tests.
>
> We still need to pay additional maintenance costs to:
> - keep the version pins up to date
> - periodically refresh the Rust build scripts as the project evolves
> - exercise the integration tests on the Rust side as well, either
> by using archery from apache/arrow or by custom scripting
>
> Sharing common tooling will be harder between separate repositories,
> and integrating with third-party services (like a new CI) will involve
> more round trips with the INFRA team.

I think what the Rust folks are saying is that they are willing to pay
these costs in exchange for the alternative development structure. So
the "we" here doesn't mean that it's going to create more work for you
(Krisztian) specifically.

For those of us not principally concerned with Rust, I think we should
leave it to the Rust-centric folks to take care of these things, and
if anything falls into disrepair in such a way that it affects
non-Rust parts of the project (for example, our CI), then we can
simply disable things until it can be taken care of.

> > > - will make it harder to interface/link between different implementations
> > >
> > > This seems unreasonable to me.
> >
> > It seems we won't be able to satisfy every requirement. Creating
> > interdependent projects will involve more work, but we will have to
> > create tools to facilitate this if/when it becomes a concern.
> >
> > >
> > > # Action 2: Use github for issue tracking
> > >
> > > Pros:
> > > - easier for new contributors
> > > - more flexible in certain ways
> > > Cons:
> > > - not the apache way of issue tracking
> >
> > This isn't true — several other Apache projects use GitHub issues.
> >
> > > - doubtful outcome for a large number of issues
> >
> > This isn't a problem for each of us to solve; if the Rust projects
> > become disorganized in their issues, we can bring it up on the mailing
> > list and discuss remedies in the future.
> >
> > >
> > > I don't like JIRA either, but I can live with it, though I understand
> > > the frustration around it.
> > > Since GH issues vs. JIRA seems like a hot topic lately, we could try to
> > > experiment with a less radical change: enable GitHub issues for the
> > > whole project and sync them to JIRA (either by using an existing
> > > service or by developing a GitHub action for it). We may end up
> > > preferring GitHub issues eventually.
> >
> > This seems like a can of worms. I think that there is a cultural
> > expectation in the Rust community for individual crates to have their
> > own respective GitHub issues. So this change is allowing for that.
> >
> > I don't see a need to change our issue management in the rest of the
> > project. The C++ project and its dependents behave increasingly like
> > an "enterprise" project in their development culture, where the more
> > structured Jira approach is a good fit.
> >
> > >
> > > All in all, I find this proposal way too invasive. It sounds more like
> > > starting a new project with its own governance rather than making
> > > releases more accessible to users.
> >
> > I'm taking a laissez-faire attitude here: if the Rust developers want
> > to implement this change, I'm happy for them to go ahead and do it. Since
> > it is almost strictly subtractive to apache/arrow, it should not
> > create extra burdens for non-Rust developers.
> >
> > In general, the nature of the conflict that we've been having is one
> > programming language forcing conformity on another. Just as we're
> > saying to let Rust adopt its cultural norms, there shouldn't be an
> > expectation that other parts of Apache Arrow should be conforming to
> > things from the Rust ecosystem.
> >
> > Regarding governance: there are no governance changes.
> >
> > * Committers and PMC members still have to be approved in the same way
> > * The Arrow PMC will vote on releases
> >
> > In Parquet, for example, we have the parquet-mr and parquet-format
> > repositories, which release separately but share a common
> > committership and PMC.
> >
> > >
> > > Thanks, Krisztian
> > >
> > >
> > > On Fri, Apr 9, 2021 at 5:18 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > > Following on from the email thread "Rust sync meeting" I would like to
> > > > start a new discussion about moving the Rust components out to new
> > > > GitHub repositories and using a new process for issues and release
> > > > management.
> > > >
> > > > I have started a Google document [1] with details and to track the work
> > > > required for this effort but I will summarize the key points of the
> > > > proposal here:
> > > >
> > > >
> > > >    - Move existing Rust code into two new repositories
> > > >       - apache/arrow-rs
> > > >          - Arrow + Parquet crates
> > > >       - apache/datafusion
> > > >          - DataFusion + Ballista crates (which are expected to merge
> > > >            to some degree over time)
> > > >          - TPC-H benchmarks
> > > >       - Use GitHub issues for issue tracking
> > > >    - Decouple release process
> > > >       - Crates are released individually
> > > >       - A vote on the source release of the released crate is held
> > > >         over the mailing list as usual.
> > > >       - Rust does not need to release a new version when the rest of
> > > >         Arrow releases; we bundle our latest released crates to the
> > > >         signed tar.
> > > >       - Crates can depend on GitHub commit hashes between releases
> > > >
> > > > The Google document may be the best place to collaborate on the proposal
> > > > but I can update the document based on any comments in this email
> > > > thread as well.
> > > >
> > > > Note that I have excluded discussion about arrow2/parquet2 from this
> > > > proposal and I believe we should discuss that separately as a follow-on
> > > > discussion.
> > > >
> > > > I look forward to hearing opinions on this both from current Rust
> > > > maintainers and contributors and also from the wider Arrow community.
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
> > > >
> > > > [1]
> > > > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
