I would love to see a component or some standardization around Java using the C 
Data Interface. I’ve been prototyping JNI bindings for DataFusion in the last 
week or so with some success, and was getting ready to ask where/how such a 
thing might fit in. I’ll be sure to watch that JIRA. 

(I also prototyped with a Panama EA build, but obviously we’re a long way from 
using that)

Paul

> On Aug 4, 2021, at 11:11 AM, Antoine Pitrou <anto...@python.org> wrote:
> 
> 
> I don't know about the rest of these tasks, but sharing data between Arrow 
> Java and C++ should definitely use the C data interface.
> 
> It seems there's work in progress here, feel free to collaborate:
> https://issues.apache.org/jira/browse/ARROW-12965
> 
> Regards
> 
> Antoine.
> 
> 
>> On 04/08/2021 at 17:45, Micah Kornfield wrote:
>> Hi Hongze,
>> Sorry, I started taking a look at these a while ago, but my focus has been
>> elsewhere with the time I have available to contribute to the project. One
>> thing that would likely help get them merged sooner is dividing any of the
>> PRs into smaller standalone components (I seem to recall at least one PR
>> redid how memory management works between C++ and Java and also added more
>> functionality for datasets; apologies if I am misremembering).
>> If other people have time to review, that would be great.
>> Thanks,
>> Micah
>>> On Wed, Aug 4, 2021 at 6:11 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>> hi Hongze — I am not sure who will be able to review these, but in the
>>> future feel free to raise your Java PRs on the mailing list even
>>> sooner, no need to wait for more than a month. There are far fewer
>>> active Java developers vs. C++ or Rust, so it can help to get people's
>>> attention on your work.
>>> 
>>> - Wes
>>> 
>>> On Tue, Aug 3, 2021 at 9:44 PM Hongze Zhang <notify...@126.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I have some PRs that improve the Dataset API's Java implementation and
>>>> have not been reviewed for months. Could someone help review them?
>>>> Thanks in advance.
>>>> 
>>>> 1. https://github.com/apache/arrow/pull/10201
>>>> ARROW-11776: [Java][Dataset] Support writing to files within dataset
>>>> scanner via JNI
>>>> 2. https://github.com/apache/arrow/pull/10333
>>>> ARROW-12607: [Website] Doc section for Dataset Java bindings
>>>> 3. https://github.com/apache/arrow/pull/10114
>>>> ARROW-12480: [Java][Dataset] FileSystemDataset: Support reading from a
>>>> directory
>>>> 4. https://github.com/apache/arrow/pull/10652
>>>> ARROW-13257: [Java][Dataset] Allow passing empty columns for projection
>>>> 
>>>> One of the most critical changes among the PRs is adding write support
>>>> to the Java API (the first in the list). It also includes some work that
>>>> builds a common way to share Arrow data between C++ and Java over JNI.
>>>> This work is also pretty close to the proposal in ARROW-7272[1].
>>>> 
>>>> Other PRs are minor improvements, like the second one, which creates a
>>>> Java Dataset doc page on the Arrow website. It has also already received
>>>> some review comments.
>>>> 
>>>> Thanks,
>>>> Hongze
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/ARROW-7272
>>>> 
>>> 
