I proposed an API here: https://github.com/apache/arrow/pull/8052
It is not much different from what Wes proposed earlier in the thread,
except in error reporting.

Comments welcome (here or on the PR).

Regards

Antoine.


On 16/08/2020 at 21:28, Wes McKinney wrote:
> I opened https://issues.apache.org/jira/browse/ARROW-9761 about adding
> a preliminary C++ (and Python) implementation to help stir the pot. My
> understanding is that DuckDB is working on using the C interface right
> now [1] and the absence of an iterator interface makes such
> integration require more work than would be ideal
>
> [1]: https://github.com/cwida/duckdb/issues/151#issuecomment-674120291
>
> On Fri, Aug 14, 2020 at 6:57 PM Jacques Nadeau <jacq...@apache.org> wrote:
>>
>> I think this unlocks a bunch of use cases. I think people are generally
>> using Arrow in simpler, non-streaming ways right now and thus the quiet.
>> Producing an iterator pattern is logical as you move to streams of smaller
>> chunks (common in distributed and multi-tenant systems).
>>
>> On Mon, Aug 10, 2020 at 11:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> I'm still in need of it. I'd be interested in developing a solution
>>> that can be used in some database APIs, e.g. using it for the result
>>> interface for an embedded SQL database like SQLite or DuckDB would be
>>> an interesting motivating use case.
>>>
>>> One approach would be to create something unofficial and used only in
>>> the C++ library's implementation of the C API so that it can make
>>> breaking changes for a time and then propose to formalize it in the
>>> ABI later.
>>>
>>> On Mon, Aug 10, 2020 at 9:22 AM Antoine Pitrou <solip...@pitrou.net> wrote:
>>>>
>>>>
>>>> From the absence of response, it would seem there isn't much interest
>>>> in this. Please speak up if you think this would be useful to you.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>>
>>>> On Tue, 7 Jul 2020 07:49:17 -0500
>>>> Wes McKinney <wesmck...@gmail.com> wrote:
>>>>> Any opinions about this? It seems the next steps would be a concrete
>>>>> API proposal and perhaps a reference implementation thereof.
>>>>>
>>>>> On Sun, Jun 28, 2020 at 11:26 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>> In ARROW-8301 [1] and elsewhere we've been discussing how to
>>>>>> communicate what amounts to a sequence of arrays or a sequence of
>>>>>> RecordBatch objects using the C data interface.
>>>>>>
>>>>>> Example use cases:
>>>>>>
>>>>>> * Returning a sequence of record / row batches from a database driver
>>>>>> * Sending a C++ arrow::ChunkedArray or arrow::Table to a consumer
>>>>>>   using only the C interface
>>>>>>
>>>>>> Applications could define their own custom iterator interfaces to
>>>>>> communicate what amounts to a sequence of the ArrowArray C interface
>>>>>> objects, but it is likely a common enough use case to have an
>>>>>> off-the-shelf solution so that we can support this solution in our
>>>>>> reference libraries (e.g. Arrow C++, pyarrow, Arrow R)
>>>>>>
>>>>>> I suggested a C structure as follows
>>>>>>
>>>>>> struct ArrowArrayStream {
>>>>>>   void (*get_schema)(struct ArrowSchema*);
>>>>>>   // Non-zero return value indicates an error?
>>>>>>   int (*get_next)(struct ArrowArray*);
>>>>>>   void (*get_error)(... ERROR HANDLING TODO );
>>>>>>   void (*release)(struct ArrowArrayStream*);
>>>>>>   void* private_data;
>>>>>> };
>>>>>>
>>>>>> The producer would populate this object with pointers to its
>>>>>> implementations of these functions.
>>>>>>
>>>>>> Thoughts about this?
>>>>>>
>>>>>> Thanks,
>>>>>> Wes
>>>>>>
>>>>>> [1]: https://issues.apache.org/jira/browse/ARROW-8301
>>>>>
>>>>
>>>>
>>>>
>>>
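
To make the producer/consumer split in the quoted ArrowArrayStream
structure more concrete, here is a rough sketch of how the two sides
could fit together. It is only an illustration under several
assumptions that are not part of the quoted proposal: every callback
receives the stream pointer (so it can reach private_data), get_schema
returns an int like get_next, and end of stream is signalled by
get_next leaving the output array's release pointer NULL. The
ArrowSchema and ArrowArray definitions are trimmed stand-ins, not the
real C data interface structs. CounterState, make_counter_stream and
consume_total_length are hypothetical names, and the error-reporting
callback is left out because the thread leaves it as a TODO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed stand-ins: the real C data interface structs have more fields. */
struct ArrowSchema {
  const char* format;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Assumed variant of the quoted struct: callbacks take the stream pointer. */
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

/* Hypothetical producer state: emits `remaining` chunks of fixed length. */
struct CounterState {
  int remaining;
  int64_t chunk_length;
};

static void release_schema(struct ArrowSchema* s) { s->release = NULL; }
static void release_array(struct ArrowArray* a) { a->release = NULL; }

static int counter_get_schema(struct ArrowArrayStream* s,
                              struct ArrowSchema* out) {
  (void)s;
  out->format = "i"; /* int32 in C data interface format strings */
  out->release = release_schema;
  out->private_data = NULL;
  return 0;
}

static int counter_get_next(struct ArrowArrayStream* s,
                            struct ArrowArray* out) {
  struct CounterState* st = s->private_data;
  out->private_data = NULL;
  if (st->remaining == 0) {
    /* Assumed end-of-stream marker: a released (NULL release) array. */
    out->length = 0;
    out->release = NULL;
    return 0;
  }
  st->remaining--;
  out->length = st->chunk_length;
  out->release = release_array;
  return 0;
}

static void counter_release(struct ArrowArrayStream* s) {
  free(s->private_data);
  s->private_data = NULL;
  s->release = NULL; /* marks the stream itself as released */
}

/* Producer side: fill `out` with callbacks. Returns 0 on success. */
int make_counter_stream(struct ArrowArrayStream* out, int n_chunks,
                        int64_t chunk_length) {
  struct CounterState* st = malloc(sizeof *st);
  if (st == NULL) return 1;
  st->remaining = n_chunks;
  st->chunk_length = chunk_length;
  out->get_schema = counter_get_schema;
  out->get_next = counter_get_next;
  out->release = counter_release;
  out->private_data = st;
  return 0;
}

/* Consumer side: drain the stream, release it, return total rows or -1. */
int64_t consume_total_length(struct ArrowArrayStream* stream) {
  assert(stream != NULL && stream->release != NULL);
  int64_t total = 0;
  for (;;) {
    struct ArrowArray chunk;
    if (stream->get_next(stream, &chunk) != 0) {
      total = -1; /* non-zero return value treated as an error */
      break;
    }
    if (chunk.release == NULL) break; /* end of stream */
    total += chunk.length;
    chunk.release(&chunk);
  }
  stream->release(stream);
  return total;
}
```

A consumer would call make_counter_stream (or receive an already
populated stream from a driver), optionally call get_schema once, then
loop as in consume_total_length. Passing the stream pointer to each
callback is what lets one set of functions serve many independent
streams.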