I can comment as the primary Apache Arrow liaison for the Arrow.jl
repository and the original code donor.

I apologize for the "surprise", but I commented a few times in various
places and put a snippet in the README
<https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository>
about
the approach I wanted to take w/ the Julia implementation in terms of
keeping the JuliaData/Arrow.jl repository as a "dev branch" of sorts of the
apache/arrow code, upstreaming changes periodically. There's even a script
<https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl>
I wrote to mostly automate this upstreaming. I realize now that I didn't
consider the "Arrow PMC" position on this kind of setup or seek to affirm
that it would be ok to approach things like this.

The reality is that Julia users are very accustomed to expecting Julia
packages to live in a single stand-alone GitHub repo, where issues can be opened,
and pull requests are welcome. It was hard and still is hard to imagine
"turning that off", since I believe we would lose a lot of valuable bug
reports and first-time contributions. This isn't necessarily any fault of
how the bug report/contribution process is handled for the arrow project
overall, though I'm also aware that there's a desire to make it easier
<https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E>
and
it currently requires more and different effort than Julia users are used
to. I think it stems more from how open and welcoming the Julia culture is,
how strongly it encourages community contributions, and its tight
integration with GitHub and its open-source project management tools.

Additionally, I was and still am concerned about the overall release
process of the apache/arrow project. I know there have been efforts there
as well to make it easier for individual languages to release on their own
cadence, but just anecdotally, the JuliaData/Arrow.jl repository has
had/needed/wanted
10 patch and minor releases since the original code donation, whereas the
apache/arrow project has had one (3.0.0). This leads to some of the
concerns I have with restricting development to just the apache/arrow
repository: how exactly does the release process work for individual
languages that may desire independent releases apart from the quarterly
overall project releases? I think from the Rust thread I remember that you
just need a group of language contributors to all agree, but what if I'm
the only "active" Julia contributor? It's also unclear what the
expectations are for actual development: with the original code donation
PRs, I know Neal "reviewed" the PRs, but perhaps missed the details around
how I proposed development continue going forward. Is it required to have a
certain number of reviews before merging? On the Julia side, I can try to
encourage/push for those who have contributed to the JuliaData/Arrow.jl
repository to help review PRs to apache/arrow, but I also can't guarantee
we would always have someone to review. It just feels pretty awkward if I
keep needing to ping non-Julia people to "review" a PR to merge it. Perhaps
this is just a problem of the Julia implementation's overall "smallness" in
terms of contributors, but I'm not sure what the best answer is here.

So in short, I'm not sure of the best path forward. I think strictly
restricting development to the apache/arrow physical repository would
actively hurt the progress of the Julia implementation, whereas it *has*
been progressing with increasing momentum since it was first released. There
are posts on the Julia Discourse forum, in the Julia Slack and Zulip
communities, and quite a few issues/PRs being opened at the
JuliaData/Arrow.jl repository. There have been several calls for Arrow
Flight support, with a member from Julia Computing actually close to
releasing a gRPC client
<https://github.com/JuliaComputing/gRPCClient.jl> specifically
to help with flight support. But in terms of actual committers, it's been
primarily just myself, with a few minor contributions by others.

I guess the big question that comes to mind is: what are the hard
requirements to be considered an "official implementation"? Does the code
*have* to live in the same physical repo? Or if it passed the series of
archery integration tests, would that be enough? I apologize for my
naivete/inexperience on all things "apache", but I imagine that's a big
part of it: having official development/releases through the apache/arrow
community, though again I'm not exactly sure about the formal processes here.
I would like to keep Julia as an official implementation, but I'm also
mostly carrying the maintainership alone at the moment and want to be
realistic about the future of the project.

I'm open to discussion and ideas on the best way forward.

-Jacob

On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> I was very surprised today to learn that the Julia Arrow
> implementation has continued operating more or less like an
> independent open source project since the code donation last November:
>
> https://github.com/JuliaData/Arrow.jl/commits/main
>
> There may have been a misunderstanding about what was expected to
> occur after the code donation, but it's problematic for a bunch of
> reasons (IP lineage / governance / community development) to have work
> happening on the implementation "outside the community".
>
> In any case, what is done is done, so the Arrow PMC's position on this
> would be roughly to regard the work as a hard fork of what's in Apache
> Arrow, which given its development activity is more or less inactive
> [1]. (I had actually thought the project was simply inactive after the
> code donation)
>
> The critical question now is, is there interest from Julia developers
> in working "in the community", which is to say:
>
> * Having development discussions on ASF channels (mailing list,
> GitHub, JIRA), planning and communicating in the open
> * Doing all development in ASF GitHub repositories
>
> The answer to the question may be "no" (which is okay), but if that's
> the case, I don't think we should be giving the impression that we
> have an official Julia implementation that is developed and maintained
> by the community (and so my argument would be unfortunately to drop
> the donated code from the project).
>
> If the answer is "yes", there needs to be a hard commitment to move
> development to Apache channels and not look back. We would also need
> to figure out what to do to document and synchronize the new IP that's
> been created since the code donation.
>
> Thanks,
> Wes
>
> [1]: https://github.com/apache/arrow/commits/master/julia/Arrow
>
