Re: Julia implementation and integration with main apache arrow repository

2020-10-12 Thread Neal Richardson
I'll start a vote for accepting this code donation (conditional on IP
clearance).

Neal



Re: Julia implementation and integration with main apache arrow repository

2020-09-14 Thread Kenta Murata
Hi Jacob,

I'm very excited to see that the Julia implementation of Arrow has been
restarted.

Pkg.jl now seems to support packages that live in subdirectories.
I guess the feature was added by
https://github.com/JuliaLang/Pkg.jl/pull/1766 and
https://github.com/JuliaRegistries/RegistryTools.jl/pull/31.
As these pull requests show, you can tell Pkg.jl the location of the
Julia package directory with the `subdir` parameter in Project.toml.



-- 
Regards,
Kenta Murata


Re: Julia implementation and integration with main apache arrow repository

2020-09-13 Thread Wes McKinney
It would be great to see Julia working together with the broader Apache
Arrow community. I have tried several times in the past, without success,
to engage with other developers.

Since this code was developed outside of the community, we need to
decide whether the full IP clearance process is needed:

https://incubator.apache.org/ip-clearance/

We've done this 9 other times since the project started, but in
several cases the code in question had been in active development
outside the community for a longer period of time.

Thanks,
Wes



Re: Julia implementation and integration with main apache arrow repository

2020-09-13 Thread Neal Richardson
Hi Jacob,
Yes, this is exciting!

My recommendation would be to fork apache/arrow, add a `julia` directory,
copy the contents of https://github.com/JuliaData/Arrow.jl in there, and
put up a pull request for review. Then we can discuss specifics there.
There are a few other steps I can think of:

* Drop the MIT license file since the code will have to be under the
Apache-2 license. I believe there will have to be some sort of IP-related
declarations to be made in order for the Arrow project to accept the code
donation; I'll let others who've gone through that chime in with
recommendations there.
* Every file will need a license note at the top as well; see examples of
that throughout the arrow repository. If there are generated files or files
that for whatever reason can't have the license header, add them to the
list in `dev/release/rat_exclude_files.txt`.
* You may want to add a GitHub Actions job for the unit tests, or that can
be done in a followup.
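
To make the license-header bullet concrete, here is a minimal Python sketch of the kind of check involved. It is not Apache RAT and not the script the Arrow release process actually runs; the function name, the in-memory `files` mapping, and the glob-style reading of the exclude list are all assumptions made for illustration. The only thing taken as given is the opening phrase of the standard ASF source header.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List

# Opening phrase of the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def files_missing_header(
    files: Dict[str, str],
    exclude_patterns: Iterable[str],
    head_lines: int = 20,
) -> List[str]:
    """Return paths whose first `head_lines` lines lack the ASF marker.

    `files` maps a path to its text content (hypothetical input shape).
    `exclude_patterns` stands in for the entries of an exclude list such as
    dev/release/rat_exclude_files.txt, treated here as glob patterns
    (an assumption, not a statement about how RAT reads that file).
    """
    patterns = [p.strip() for p in exclude_patterns if p.strip()]
    missing = []
    for path, text in sorted(files.items()):
        # Skip generated or otherwise exempt files.
        if any(fnmatch(path, pattern) for pattern in patterns):
            continue
        head = "\n".join(text.splitlines()[:head_lines])
        if ASF_MARKER not in head:
            missing.append(path)
    return missing
```

Running something like this over the new `julia` directory before opening the pull request would show which files still need a header and which belong in the exclude list.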

I'd recommend setting up the integration tests in a followup, personally,
but others may disagree. Not all implementations in the project have
integration tests at the moment, so while it is very valuable and strongly
encouraged, it's not a blocker.

Neal



Julia implementation and integration with main apache arrow repository

2020-09-13 Thread Jacob Quinn
Hello all,

Hopefully this email works (I'm not super familiar with using mailing lists
like this).

Over the past few weeks, I've been working on a pure Julia implementation
to support serializing/deserializing the arrow format for Julia. The code
in its current state can be found here:
https://github.com/JuliaData/Arrow.jl.

I believe the code has reached an initial beta-level quality, and I just
finished writing the arrow <-> json integration testing code that archery
expects. I haven't worked on actual archery integration yet, but it should
just be a matter of adding a tester_julia.py file that knows how to invoke
the test/integrationtest.jl file with arguments similar to those used by
the tester_go.py file.
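
A rough sketch of what such a tester might do, assuming it only needs to build a command line for the Julia script. This is not archery's actual tester interface: the class name, the script path inside the monorepo, the mode names, and the `--mode`/`--arrow`/`--json` flags are all placeholders and would have to match whatever test/integrationtest.jl and archery really expect.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JuliaTesterSketch:
    """Hypothetical stand-in for tester_julia.py; not archery's base class."""

    julia_exe: str = "julia"  # assumed to be on PATH
    script: str = "julia/test/integrationtest.jl"  # assumed monorepo location

    def command(
        self, mode: str, arrow_path: str, json_path: Optional[str] = None
    ) -> List[str]:
        # Mode names and flags are placeholders chosen for illustration.
        if mode not in {"VALIDATE", "JSON_TO_ARROW"}:
            raise ValueError(f"unsupported mode: {mode}")
        cmd = [self.julia_exe, self.script, "--mode", mode, "--arrow", arrow_path]
        if json_path is not None:
            cmd += ["--json", json_path]
        return cmd
```

The returned list could then be handed to `subprocess.run(cmd, check=True)`, so a non-zero exit from the Julia script surfaces as a test failure.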

This email has a couple of purposes:
* Signal that the julia code is somewhat ready to be used/integrated in the
main repo
* Ask for advice/direction on actually integrating with the apache arrow
github repository

For the latter, in particular, I imagine keeping an initial PR as minimal
as possible is desirable. I still need to follow up with the core pkg devs
for Julia. I've been told it's possible/not hard to have a Julia package
"live" inside a monorepo, but I haven't yet figured out the details of
what that means on the Julia General package registry side of things. I'm
happy to figure that out, and it shouldn't really affect the merging of
Julia code into the apache arrow github.

So my plan is roughly:
* Fork/make a branch of the apache arrow repo
* Add in the Julia code from the link I mentioned above
* Add necessary files/integration in archery to run Julia integration tests
alongside other languages
* Do initial merge into apache arrow?

If there are other initial requirements core devs would expect, just let me
know, but I imagine that updating the implementation matrix, for example,
can be done afterwards as follow up.

Excited to have Julia more officially integrated here!

Cheers,

-Jacob
https://github.com/quinnj
https://twitter.com/quinn_jacobd