Re: Julia implementation and integration with main apache arrow repository

Neal Richardson Sun, 13 Sep 2020 13:41:09 -0700

Hi Jacob,
Yes, this is exciting!

My recommendation would be to fork apache/arrow, add a `julia` directory,
copy the contents of https://github.com/JuliaData/Arrow.jl in there, and
put up a pull request for review. Then we can discuss specifics there.
There will be a few other steps to do that I can think of:

* Drop the MIT license file since the code will have to be under the
Apache 2.0 license. I believe some sort of IP-related declarations will
have to be made in order for the Arrow project to accept the code
donation; I'll let others who've gone through that chime in with
recommendations there.
* Every file will need a license note at the top as well; see examples of
that throughout the arrow repository. If there are generated files or files
that for whatever reason can't have the license header, add them to the
list in `dev/release/rat_exclude_files.txt`.
* You may want to add a GitHub Actions job for the unit tests, or that can
be done in a followup.
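
For the license-header step, a quick local pre-check could look like the sketch below. It is only an illustration, not the Apache RAT check the release tooling runs: the function name `files_missing_header`, the `.jl` suffix filter, the 20-line window, and the use of `fnmatch` for exclude patterns are all assumptions made for this example. The marker string is the opening phrase of the standard ASF source header.

```python
from fnmatch import fnmatch
from itertools import islice
from pathlib import Path

# Opening phrase of the standard ASF source header.
LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def files_missing_header(root, suffixes=(".jl",), excluded=(), head_lines=20):
    """Return sorted relative paths under `root` whose first lines lack the marker.

    `excluded` holds glob-style patterns, loosely modelled on the entries in
    dev/release/rat_exclude_files.txt (hypothetical, simplified matching).
    """
    root = Path(root)
    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        if any(fnmatch(rel, pattern) for pattern in excluded):
            continue
        # Only inspect the top of the file, where the header belongs.
        with path.open(encoding="utf-8", errors="replace") as fh:
            head = "".join(islice(fh, head_lines))
        if LICENSE_MARKER not in head:
            missing.append(rel)
    return missing
```

Running something like `files_missing_header("julia")` before opening the pull request would list the Julia files that still need a header or an exclude entry.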

I'd recommend setting up the integration tests in a followup, personally,
but others may disagree. Not all implementations in the project have
integration tests at the moment, so while it is very valuable and strongly
encouraged, it's not a blocker.

Neal

On Sun, Sep 13, 2020 at 12:33 PM Jacob Quinn <quinn.jac...@gmail.com> wrote:

> Hello all,
>
> Hopefully this email works (I'm not super familiar with using mailing lists
> like this).
>
> Over the past few weeks, I've been working on a pure Julia implementation
> to support serializing/deserializing the arrow format for Julia. The code
> in its current state can be found here:
> https://github.com/JuliaData/Arrow.jl.
>
> I believe the code has reached initial beta-level quality, and I've just
> finished writing the arrow <-> json integration testing code that archery
> expects. I haven't worked on actual archery integration yet, but it should
> just be a matter of adding a tester_julia.py file that knows how to invoke
> the test/integrationtest.jl file with similar arguments as the tester_go.py
> file.
>
> This email has a couple purposes:
> * Signal that the julia code is somewhat ready to be used/integrated in the
> main repo
> * Ask for advice/direction on actually integrating with the apache arrow
> github repository
>
> For the latter, in particular, I imagine keeping an initial PR as minimal
> as possible is desirable. I need to follow up with the core pkg devs for
> Julia: I've been told it's possible/not hard to have a Julia package
> "live" inside a monorepo, but I haven't figured out the details of
> what that means on the Julia General package registry side of things.
> I'm happy to figure that out, and it shouldn't really affect the merging
> of Julia code into the apache arrow github.
>
> So my plan is roughly:
> * Fork/make a branch of the apache arrow repo
> * Add in the Julia code from the link I mentioned above
> * Add necessary files/integration in archery to run Julia integration tests
> alongside other languages
> * Do initial merge into apache arrow?
>
> If there are other initial requirements core devs would expect, just let me
> know, but I imagine that updating the implementation matrix, for example,
> can be done afterwards as follow up.
>
> Excited to have Julia more officially integrated here!
>
> Cheers,
>
> -Jacob
> https://github.com/quinnj
> https://twitter.com/quinn_jacobd
>
