I can comment as the primary Apache Arrow liaison for the Arrow.jl repository and the original code donor.

I apologize for the "surprise", but I commented a few times in various places and put a snippet in the README <https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository> about the approach I wanted to take w/ the Julia implementation: keeping the JuliaData/Arrow.jl repository as a "dev branch" of sorts of the apache/arrow code and upstreaming changes periodically. There's even a script <https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl> I wrote to mostly automate this upstreaming. I realize now that I didn't consider the "Arrow PMC" position on this kind of setup or seek to affirm that it would be ok to approach things like this.

The reality is that Julia users are very accustomed to Julia packages living in a single stand-alone GitHub repo, where issues can be opened and pull requests are welcome. It was hard and still is hard to imagine "turning that off", since I believe we would lose a lot of valuable bug reports and first-time contributions. This isn't necessarily any fault of how the bug report/contribution process is handled for the Arrow project overall, though I'm also aware that there's a desire to make it easier <https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E> and it currently requires more and different effort than Julia users are used to. I think it stems more from how open, welcoming, and strong the culture in Julia is around encouraging community contributions, and from the tight integration with GitHub and its open-source project management tools.

Additionally, I was and still am concerned about the overall release process of the apache/arrow project.
I know there have been efforts there as well to make it easier for individual languages to release on their own cadence, but just anecdotally, the JuliaData/Arrow.jl repository has had/needed/wanted 10 patch and minor releases since the original code donation, whereas the apache/arrow project has had one (3.0.0).

This leads to some of the concerns I have with restricting development to just the apache/arrow repository: how exactly does the release process work for individual languages that may desire independent releases apart from the quarterly overall project releases? I think from the Rust thread I remember that you just need a group of language contributors to all agree, but what if I'm the only "active" Julia contributor?

It's also unclear what the expectations are for actual development: with the original code donation PRs, I know Neal "reviewed" the PRs, but perhaps missed the details around how I proposed development continue going forward. Is it required to have a certain number of reviews before merging? On the Julia side, I can try to encourage/push for those who have contributed to the JuliaData/Arrow.jl repository to help review PRs to apache/arrow, but I also can't guarantee we would always have someone to review. It just feels pretty awkward if I keep needing to ping non-Julia people to "review" a PR to merge it. Perhaps this is just a problem of the overall Julia implementation "smallness" in terms of contributors, but I'm not sure of the best answer here.

So in short, I'm not sure of the best path forward. I think strictly restricting development to the apache/arrow physical repository would actively hurt the progress of the Julia implementation, whereas it *has* been progressing with increasing momentum since it was first released. There are posts on the Julia Discourse forum, in the Julia Slack and Zulip communities, and quite a few issues/PRs being opened at the JuliaData/Arrow.jl repository.
There have been several calls for Arrow Flight support, with a member from Julia Computing actually close to releasing a gRPC client <https://github.com/JuliaComputing/gRPCClient.jl> specifically to help with Flight support. But in terms of actual committers, it's been primarily just myself, with a few minor contributions by others.

I guess the big question that comes to mind is what are the hard requirements to be considered an "official implementation"? Does the code *have* to live in the same physical repo? Or if it passed the series of archery integration tests, would that be enough? I apologize for my naivete/inexperience on all things "Apache", but I imagine that's a big part of it: having official development/releases through the apache/arrow community, though again I'm not exactly sure of the formal processes here.

I would like to keep Julia as an official implementation, but I'm also mostly carrying the maintainership alone at the moment and want to be realistic about the future of the project.

I'm open to discussion and ideas on the best way forward.

-Jacob

On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> I was very surprised today to learn that the Julia Arrow
> implementation has continued operating more or less like an
> independent open source project since the code donation last November:
>
> https://github.com/JuliaData/Arrow.jl/commits/main
>
> There may have been a misunderstanding about what was expected to
> occur after the code donation, but it's problematic for a bunch of
> reasons (IP lineage / governance / community development) to have work
> happening on the implementation "outside the community".
>
> In any case, what is done is done, so the Arrow PMC's position on this
> would be roughly to regard the work as a hard fork of what's in Apache
> Arrow, which given its development activity is more or less inactive
> [1]. (I had actually thought the project was simply inactive after the
> code donation)
>
> The critical question now is, is there interest from Julia developers
> in working "in the community", which is to say:
>
> * Having development discussions on ASF channels (mailing list,
> GitHub, JIRA), planning and communicating in the open
> * Doing all development in ASF GitHub repositories
>
> The answer to the question may be "no" (which is okay), but if that's
> the case, I don't think we should be giving the impression that we
> have an official Julia implementation that is developed and maintained
> by the community (and so my argument would be unfortunately to drop
> the donated code from the project).
>
> If the answer is "yes", there needs to be a hard commitment to move
> development to Apache channels and not look back. We would also need
> to figure out what to do to document and synchronize the new IP that's
> been created since the code donation.
>
> Thanks,
> Wes
>
> [1]: https://github.com/apache/arrow/commits/master/julia/Arrow