I proposed an API here:
https://github.com/apache/arrow/pull/8052

It is not much different from what Wes proposed earlier in the thread,
except in error reporting.  Comments welcome (here or on the PR).

Regards

Antoine.



On 16/08/2020 at 21:28, Wes McKinney wrote:
> I opened https://issues.apache.org/jira/browse/ARROW-9761 about adding
> a preliminary C++ (and Python) implementation to help stir the pot. My
> understanding is that DuckDB is working on using the C interface right
> now [1] and the absence of an iterator interface makes such
> integration require more work than would be ideal.
> 
> [1]: https://github.com/cwida/duckdb/issues/151#issuecomment-674120291
> 
> On Fri, Aug 14, 2020 at 6:57 PM Jacques Nadeau <jacq...@apache.org> wrote:
>>
>> I think this unlocks a bunch of use cases. I think people are generally
>> using Arrow in simpler, non-streaming ways right now and thus the quiet.
>> Producing an iterator pattern is logical as you move to streams of smaller
>> chunks (common in distributed and multi-tenant systems).
>>
>> On Mon, Aug 10, 2020 at 11:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> I'm still in need of it. I'd be interested in developing a solution
>>> that can be used in some database APIs, e.g. using it for the result
>>> interface for an embedded SQL database like SQLite or DuckDB would be
>>> an interesting motivating use case.
>>>
>>> One approach would be to create something unofficial and used only in
>>> the C++ library's implementation of the C API so that it can make
>>> breaking changes for a time and then propose to formalize it in the
>>> ABI later.
>>>
>>> On Mon, Aug 10, 2020 at 9:22 AM Antoine Pitrou <solip...@pitrou.net>
>>> wrote:
>>>>
>>>>
>>>> From the absence of response, it would seem there isn't much interest
>>>> in this.  Please speak up if you think this would be useful to you.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>>
>>>> On Tue, 7 Jul 2020 07:49:17 -0500
>>>> Wes McKinney <wesmck...@gmail.com> wrote:
>>>>> Any opinions about this? It seems the next steps would be a concrete
>>>>> API proposal and perhaps a reference implementation thereof.
>>>>>
>>>>> On Sun, Jun 28, 2020 at 11:26 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>> In ARROW-8301 [1] and elsewhere we've been discussing how to
>>>>>> communicate what amounts to a sequence of arrays or a sequence of
>>>>>> RecordBatch objects using the C data interface.
>>>>>>
>>>>>> Example use cases:
>>>>>>
>>>>>> * Returning a sequence of record / row batches from a database driver
>>>>>> * Sending a C++ arrow::ChunkedArray or arrow::Table to a consumer
>>>>>> using only the C interface
>>>>>>
>>>>>> Applications could define their own custom iterator interfaces to
>>>>>> communicate what amounts to a sequence of the ArrowArray C interface
>>>>>> objects, but it is likely a common enough use case to have an
>>>>>> off-the-shelf solution so that we can support this solution in our
>>>>>> reference libraries (e.g. Arrow C++, pyarrow, Arrow R).
>>>>>>
>>>>>> I suggested a C structure as follows
>>>>>>
>>>>>> struct ArrowArrayStream {
>>>>>>   void (*get_schema)(struct ArrowSchema*);
>>>>>>   // Non-zero return value indicates an error?
>>>>>>   int (*get_next)(struct ArrowArray*);
>>>>>>   void (*get_error)(... ERROR HANDLING TODO );
>>>>>>   void (*release)(struct ArrowArrayStream*);
>>>>>>   void* private_data;
>>>>>> };
>>>>>>
>>>>>> The producer would populate this object with pointers to its
>>>>>> implementations of these functions.
>>>>>>
>>>>>> Thoughts about this?
>>>>>>
>>>>>> Thanks,
>>>>>> Wes
>>>>>>
>>>>>> [1]: https://issues.apache.org/jira/browse/ARROW-8301
>>>>>
>>>>
>>>>
>>>>
>>>
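
To make the producer/consumer contract in the quoted struct concrete, here is a minimal sketch in C. It is not the API from the PR or from Wes's message. It rests on several assumptions: every callback receives the stream pointer as its first argument so it can reach private_data, a released array (release == NULL) marks the end of the stream, and errors are reported as a string. The names DemoArray, DemoArrayStream, CounterState, counter_stream_init and consume_all are hypothetical, and DemoArray is a simplified stand-in for struct ArrowArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct ArrowArray (hypothetical; the real struct
   also carries buffers, children, etc.). release == NULL means "released". */
struct DemoArray {
  int64_t length;
  void (*release)(struct DemoArray*);
};

/* Variant of the proposed stream struct. Assumption: each callback takes
   the stream itself so that it can reach private_data. */
struct DemoArrayStream {
  /* Returns 0 on success, non-zero on error. */
  int (*get_next)(struct DemoArrayStream*, struct DemoArray* out);
  /* Returns a description of the last error, or NULL if there is none. */
  const char* (*get_last_error)(struct DemoArrayStream*);
  void (*release)(struct DemoArrayStream*);
  void* private_data;
};

/* Toy producer state: number of 10-row chunks still to emit. */
struct CounterState {
  int remaining;
};

static void demo_array_release(struct DemoArray* array) {
  array->release = NULL; /* nothing to free in this toy producer */
}

static int counter_get_next(struct DemoArrayStream* stream,
                            struct DemoArray* out) {
  struct CounterState* state = (struct CounterState*)stream->private_data;
  if (state->remaining == 0) {
    out->length = 0;
    out->release = NULL; /* assumed end-of-stream marker: a released array */
    return 0;
  }
  state->remaining--;
  out->length = 10;
  out->release = demo_array_release;
  return 0;
}

static const char* counter_get_last_error(struct DemoArrayStream* stream) {
  (void)stream;
  return NULL; /* this producer never fails */
}

static void counter_release(struct DemoArrayStream* stream) {
  stream->release = NULL; /* state is caller-owned, so nothing to free */
}

/* The producer populates the struct with pointers to its implementations. */
void counter_stream_init(struct DemoArrayStream* stream,
                         struct CounterState* state) {
  stream->get_next = counter_get_next;
  stream->get_last_error = counter_get_last_error;
  stream->release = counter_release;
  stream->private_data = state;
}

/* Consumer: drains and releases the stream. Returns the total number of
   rows seen, or -1 if get_next reported an error. */
int64_t consume_all(struct DemoArrayStream* stream) {
  assert(stream->release != NULL); /* the stream must still be live */
  int64_t total = 0;
  for (;;) {
    struct DemoArray chunk;
    if (stream->get_next(stream, &chunk) != 0) {
      total = -1;
      break;
    }
    if (chunk.release == NULL) break; /* end of stream */
    total += chunk.length;
    chunk.release(&chunk);
  }
  stream->release(stream);
  return total;
}
```

A consumer would call counter_stream_init on a stack-allocated DemoArrayStream and then consume_all. Passing the stream to each callback is one way to let a producer keep per-stream state without globals.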
