I would love to see a component or some standardization around Java using the C Data Interface. I’ve been prototyping JNI bindings for DataFusion in the last week or so with some success, and was getting ready to ask where/how such a thing might fit in. I’ll be sure to watch that JIRA.
(I also prototyped with a Panama EA build, but obviously we’re a long way from using that.)

Paul

> On Aug 4, 2021, at 11:11 AM, Antoine Pitrou <anto...@python.org> wrote:
>
>
> I don't know about the rest of these tasks, but sharing data between Arrow
> Java and C++ should definitely use the C data interface.
>
> It seems there's work in progress here, feel free to collaborate:
> https://issues.apache.org/jira/browse/ARROW-12965
>
> Regards
>
> Antoine.
>
>
>> On 04/08/2021 at 17:45, Micah Kornfield wrote:
>> Hi Hongze,
>> Sorry, I started taking a look at these a while ago, but my focus has been
>> elsewhere with the time I have available to contribute to the project. One
>> thing that can also help is if there is a way to divide any of the PRs into
>> smaller standalone components; that would likely help get them merged sooner
>> (I seem to recall at least one PR redid both how memory management was
>> working between C++ and Java as well as adding more functionality for
>> datasets, apologies if I am misremembering).
>> If other people have time to review, that would be great.
>> Thanks,
>> Micah
>>> On Wed, Aug 4, 2021 at 6:11 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>> hi Hongze, I am not sure who will be able to review these, but in the
>>> future feel free to raise your Java PRs on the mailing list even
>>> sooner, no need to wait for more than a month. There are far fewer
>>> active Java developers than C++ or Rust developers, so it can help to
>>> get people's attention on your work.
>>>
>>> - Wes
>>>
>>> On Tue, Aug 3, 2021 at 9:44 PM Hongze Zhang <notify...@126.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have some PRs to improve the Dataset API's Java implementation that
>>>> have not been reviewed for months. Could someone help me review
>>>> them? Thanks in advance.
>>>>
>>>> 1. https://github.com/apache/arrow/pull/10201
>>>> ARROW-11776: [Java][Dataset] Support writing to files within dataset
>>>> scanner via JNI
>>>> 2. https://github.com/apache/arrow/pull/10333
>>>> ARROW-12607: [Website] Doc section for Dataset Java bindings
>>>> 3. https://github.com/apache/arrow/pull/10114
>>>> ARROW-12480: [Java][Dataset] FileSystemDataset: Support reading from a
>>>> directory
>>>> 4. https://github.com/apache/arrow/pull/10652
>>>> ARROW-13257: [Java][Dataset] Allow passing empty columns for projection
>>>>
>>>> One of the most critical changes among these PRs is adding write support
>>>> to the Java API (the first in the list). It also includes some work that
>>>> builds a common way to share Arrow data between C++ and Java over JNI.
>>>> This work is also pretty close to the proposal in ARROW-7272 [1].
>>>>
>>>> The other PRs are minor improvements, like the second one, which creates
>>>> a Java Dataset doc page on the Arrow website. It has also already
>>>> received some review comments.
>>>>
>>>> Thanks,
>>>> Hongze
>>>>
>>>> [1] https://issues.apache.org/jira/browse/ARROW-7272
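For reference, the C data interface Antoine points to is a small, stable ABI: two plain C structs, ArrowSchema and ArrowArray, that a producer fills in and a consumer reads in place, with ownership handed over through a `release` callback. The sketch below assumes an int32 column with no nulls. Only the struct layouts follow the Arrow specification; the helper names (`export_int32`, `sum_int32`, `release_int32_array`, `release_schema`) are hypothetical and are not taken from ARROW-12965 or any of the PRs above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Field layout as published in the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char* format; /* type encoding, e.g. "i" for int32 */
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers; /* for int32: [validity bitmap, values] */
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Producer-side release callbacks: free what the producer allocated, then
   mark the struct as released by setting release to NULL. */
static void release_schema(struct ArrowSchema* schema) {
  schema->release = NULL; /* format and name are string literals */
}

static void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* the copied values */
  free(array->buffers);           /* the buffer pointer table */
  array->release = NULL;
}

/* Hypothetical producer: export a copy of `values` as a non-nullable int32
   array. Returns 0 on success, -1 if allocation fails. */
int export_int32(const int32_t* values, int64_t length,
                 struct ArrowSchema* schema, struct ArrowArray* array) {
  size_t count = length > 0 ? (size_t)length : 1; /* avoid malloc(0) */
  int32_t* data = malloc(count * sizeof *data);
  const void** buffers = malloc(2 * sizeof *buffers);
  if (data == NULL || buffers == NULL) {
    free(data);
    free(buffers);
    return -1;
  }
  if (length > 0) memcpy(data, values, (size_t)length * sizeof *data);
  buffers[0] = NULL; /* no validity bitmap because null_count is 0 */
  buffers[1] = data;

  /* Fields not named in the initializers are zeroed (NULL / 0). */
  *schema = (struct ArrowSchema){.format = "i", .name = "values",
                                 .release = release_schema};
  *array = (struct ArrowArray){.length = length, .null_count = 0,
                               .offset = 0, .n_buffers = 2,
                               .buffers = buffers,
                               .release = release_int32_array};
  return 0;
}

/* Hypothetical consumer: reads the values in place, without copying. */
int64_t sum_int32(const struct ArrowSchema* schema,
                  const struct ArrowArray* array) {
  assert(strcmp(schema->format, "i") == 0);
  assert(array->n_buffers == 2 && array->null_count == 0);
  const int32_t* values = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) {
    total += values[array->offset + i];
  }
  return total;
}
```

The consumer never frees the buffers itself; it calls `release` when it is done. That is what lets memory allocated on one side (say, C++) be handed to the other (say, JNI glue code on the Java side) without either side knowing the other's allocator. How Java would import these structs appears to be the subject of ARROW-12965, so that part is left out here.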