On Mon, Apr 12, 2021 at 4:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Krisztian,
>
> On Mon, Apr 12, 2021 at 8:41 AM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > Hi,
> >
> > Based on the google document I see one actual problem and two actions
> > not explicitly solving the real issue.
> >
> > # Issue: Decouple release process to enable independent releases
> >
> > This is something the whole project requires, not just the rust
> > implementation. Eventually every implementation should have its own
> > release cycle not blocked on others.
> > Even if we want to tackle this just for the rust implementation in the
> > first iteration, we need to figure out the right versioning scheme,
> > voting process and integration testing. Note that we have already
> > decided to decouple the source release process from the release of the
> > binaries. We also need to figure out how to handle the source release
> > itself once we provide different artifacts for different
> > implementations.
>
> I don't think it's necessary to figure out all this right now. For
> now, this seems to be a Rust-specific issue.
>
> > # Action 1: Maintain the rust implementation in additional apache
> > repositories
> >
> > Pros:
> > - presumably would make it easier to define the dependencies between
> >   the rust crates (though cargo should support multiple crates in a
> >   single repository)
> > Cons:
> > - inconsistent: if there is apache/arrow-rs why isn't there apache/arrow-js
>
> I don't think this matters, nor will anyone actually care in practice.
> If someone google searches for "Arrow Rust" or "Arrow JavaScript",
> they'll find what they need.
>
> > - decrease project visibility: having all of the implementations in a
> >   single repository makes other implementations trivial to find
>
> As long as things are documented clearly in READMEs and the READMEs
> have links to each other, I think this is okay.
>
> > - introduces a lot of complexity for the CI/CD processes, and not just
> >   for the rust folks
>
> Could you explain this? In apache/arrow, we're basically just deleting
> stuff from our CI/CD configurations. In integration testing, we would
> have an additional git checkout phase to add a pinned version of Rust
> to the integration tests.

We still need to pay an additional maintenance cost to:
- keep the version pins up to date
- periodically refresh the rust build scripts as the project evolves
- exercise the integration tests on the rust side as well, either by
  using archery from apache/arrow or by custom scripting
Sharing common tooling will be harder between separate repositories, and
integrating with third-party services (like a new CI) will involve more
round trips with the INFRA team.

> > - will make it harder to interface/link between different implementations
> >
> > This seems unreasonable to me.
>
> It seems we won't be able to satisfy every requirement. Creating
> interdependent projects will involve more work, but we will have to
> create tools to facilitate this if/when it becomes a concern.
>
> >
> > # Action 2: Use github for issue tracking
> >
> > Pros:
> > - easier for new contributors
> > - more flexible in certain ways
> > Cons:
> > - not the apache way of issue tracking
>
> This isn't true: several other Apache projects use GitHub issues.
>
> > - doubtful outcome for a large number of issues
>
> This isn't a problem for each of us to solve: if the Rust projects become
> disorganized in their issues, we can bring it up on the mailing list
> and discuss remedies in the future.
>
> >
> > I don't like JIRA either, but I can live with it, though I understand
> > the frustration around it.
> > Since GH issues vs. JIRA seems like a hot topic lately we could try to
> > experiment with a less radical change: enable github issues for the
> > whole project and sync them to JIRA (either by using an existing
> > service or by developing a github action for it). We may end up
> > preferring github issues eventually.
>
> This seems like a can of worms. I think that there is a cultural
> expectation in the Rust community for individual crates to have their
> own respective GitHub issues. So this change is allowing for that.
>
> I don't see a need to change our issue management in the rest of the
> project. The C++ project and its dependents behave increasingly like
> an "enterprise" project in its development culture where the more
> structured Jira approach is a good fit.
>
> >
> > All in all, I find this proposal way too invasive.
> > It sounds more like starting a new project with its own governance
> > rather than making releases more accessible to users.
>
> I'm taking a laissez-faire attitude here: if the Rust developers want
> to implement this change, I'm happy for them to go ahead and do it. Since
> it is almost strictly subtractive to apache/arrow, it should not
> create extra burdens for non-Rust developers.
>
> In general, the nature of the conflict that we've been having is one
> programming language forcing conformity on another. Just as we're
> saying to let Rust adopt its cultural norms, there shouldn't be an
> expectation that other parts of Apache Arrow should be conforming to
> things from the Rust ecosystem.
>
> Regarding governance: there are no governance changes.
>
> * Committers and PMC members still have to be approved in the same way
> * The Arrow PMC will vote on releases
>
> In Parquet, for example, we have the parquet-mr and parquet-format
> repositories, which release separately but share a common
> committership and PMC.
>
> >
> > Thanks, Krisztian
> >
> >
> > On Fri, Apr 9, 2021 at 5:18 PM Andy Grove <andygrov...@gmail.com> wrote:
> > >
> > > Following on from the email thread "Rust sync meeting" I would like to
> > > start a new discussion about moving the Rust components out to new GitHub
> > > repositories and using a new process for issues and release management.
> > >
> > > I have started a Google document [1] with details and to track the work
> > > required for this effort but I will summarize the key points of the
> > > proposal here:
> > >
> > > - Move existing Rust code into two new repositories
> > >   - apache/arrow-rs
> > >     - Arrow + Parquet crates
> > >   - apache/datafusion
> > >     - DataFusion + Ballista crates (which are expected to merge to some
> > >       degree over time)
> > >     - TPC-H benchmarks
> > > - Use GitHub issues for issue tracking
> > > - Decouple release process
> > >   - Crates are released individually
> > >   - A vote on the source release of the released crate is held over the
> > >     mailing list as usual.
> > >   - Rust does not need to release a new version when the rest of Arrow
> > >     releases; we bundle our latest released crates to the signed tar.
> > >   - Crates can depend on GitHub commit hashes between releases
> > >
> > > The Google document may be the best place to collaborate on the proposal
> > > but I can update the document based on any comments in this email thread
> > > as well.
> > >
> > > Note that I have excluded discussion about arrow2/parquet2 from this
> > > proposal and I believe we should discuss that separately as a follow-on
> > > discussion.
> > >
> > > I look forward to hearing opinions on this both from current Rust
> > > maintainers and contributors and also from the wider Arrow community.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1]
> > > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing