I know some on this list are familiar with it, but many may not have seen
ndtypes, part of the xnd project:  https://github.com/xnd-project/ndtypes

It generalizes PEP 3118 for cross-language data-structure handling.

One could either take a dependency on the small C library libndtypes or
simply reuse its concepts.

-Travis


On Wed, Sep 18, 2019 at 10:52 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> One thing that was discussed in the sync call is the ability to easily
> pass arrays at runtime between Arrow implementations or Arrow-supporting
> libraries in the same process, without bearing the cost of linking to
> e.g. the C++ Arrow library.
>
> (for example: "Duckdb wants to provide an option to return Arrow data of
> result sets, but they don't like having Arrow as a dependency")
>
> One possibility would be to define a C-level protocol similar in spirit
> to the Python buffer protocol, which some people may be familiar with (*).
>
> The basic idea is to define a simple C struct, which is ABI-stable and
> adequately describes an Arrow array.  The struct can be stack-allocated.
> Its definition can also be copied in another project (or interfaced with
> using a C FFI layer, depending on the language).
>
> There is no formal proposal; this message is meant to stir the discussion.
>
> Issues to work out:
>
> * Memory lifetime issues: where Python simply associates the Py_buffer
> with a PyObject owner (a garbage-collected Python object), we need
> another means to control the lifetime of the pointed-to memory areas.
> One simple possibility is to include a destructor function pointer in
> the protocol struct.
>
> * Arrow type representation.  We probably need some kind of "format"
> mini-language to represent Arrow types, so that a type can be described
> using a `const char*`.  Ideally, primitive types at least should be
> trivially parsable.  We may take inspiration from Python here (`struct`
> module format characters, PEP 3118 format additions).
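As a rough illustration of what "trivially parsable" could mean for primitive types, here is a minimal C sketch. It assumes single-character format codes borrowed from the Python `struct` module at standard sizes (`b`/`B`, `h`/`H`, `i`/`I`, `q`/`Q`, `f`, `d`); the `SketchType` enum and both function names are hypothetical, and nested or parameterized types (lists, structs, decimals) are deliberately out of scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes for this sketch only; not part of any Arrow spec. */
enum SketchType {
  SKETCH_INVALID = 0,
  SKETCH_INT8, SKETCH_UINT8,
  SKETCH_INT16, SKETCH_UINT16,
  SKETCH_INT32, SKETCH_UINT32,
  SKETCH_INT64, SKETCH_UINT64,
  SKETCH_FLOAT32, SKETCH_FLOAT64
};

/* Parse a one-character primitive format string using Python struct-module
 * characters (standard sizes). Anything else, including multi-character
 * strings, is rejected as SKETCH_INVALID. */
enum SketchType parse_primitive_format(const char* format) {
  if (format == NULL || format[0] == '\0' || format[1] != '\0') {
    return SKETCH_INVALID;
  }
  switch (format[0]) {
    case 'b': return SKETCH_INT8;
    case 'B': return SKETCH_UINT8;
    case 'h': return SKETCH_INT16;
    case 'H': return SKETCH_UINT16;
    case 'i': return SKETCH_INT32;
    case 'I': return SKETCH_UINT32;
    case 'q': return SKETCH_INT64;
    case 'Q': return SKETCH_UINT64;
    case 'f': return SKETCH_FLOAT32;
    case 'd': return SKETCH_FLOAT64;
    default:  return SKETCH_INVALID;
  }
}

/* Element width in bytes for a parsed primitive type (0 if invalid). */
size_t primitive_byte_width(enum SketchType t) {
  switch (t) {
    case SKETCH_INT8: case SKETCH_UINT8: return 1;
    case SKETCH_INT16: case SKETCH_UINT16: return 2;
    case SKETCH_INT32: case SKETCH_UINT32: case SKETCH_FLOAT32: return 4;
    case SKETCH_INT64: case SKETCH_UINT64: case SKETCH_FLOAT64: return 8;
    default: return 0;
  }
}
```

For example, `"q"` parses to `SKETCH_INT64` with an 8-byte width, while `"ii"` is rejected. A one-character-per-primitive scheme keeps the common case to a single switch; richer types would need a real grammar on top.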
>
> Example C struct definition (not a formal proposal!):
>
> #include <stdint.h>
>
> struct ArrowBuffer {
>   void* data;
>   int64_t nbytes;
>   // Called by the consumer when it doesn't need the buffer anymore
>   void (*release)(struct ArrowBuffer*);
>   // Opaque user data (e.g. for the release callback)
>   void* user_data;
> };
>
> struct ArrowArray {
>   // Type description
>   const char* format;
>   // Data description
>   int64_t length;
>   int64_t null_count;
>   int64_t n_buffers;
>   // Note: these pointers are probably owned by the ArrowArray struct
>   // and will be released and free()d by the release callback.
>   struct ArrowBuffer* buffers;
>   struct ArrowArray* dictionary;
>   // Called by the consumer when it doesn't need the array anymore
>   void (*release)(struct ArrowArray*);
>   // Opaque user data (e.g. for the release callback)
>   void* user_data;
> };
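To make the release-callback idea concrete, here is a minimal producer/consumer sketch in C. It assumes the two structs use consistent names (`ArrowBuffer`, `ArrowArray`), that a NULL `release` pointer marks a struct as already released, and that an int32 array with no nulls can be exported as a single data buffer with no validity bitmap; `export_int32_array` and `consume_and_sum` are hypothetical helpers, not an existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  void (*release)(struct ArrowBuffer*);
  void* user_data;
};

struct ArrowArray {
  const char* format;
  int64_t length;
  int64_t null_count;
  int64_t n_buffers;
  struct ArrowBuffer* buffers;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* user_data;
};

/* Producer-side callback: this buffer's memory came from malloc(). */
static void release_malloced_buffer(struct ArrowBuffer* buf) {
  free(buf->data);
  buf->data = NULL;
  buf->release = NULL;  /* assumed convention: NULL means released */
}

/* Producer-side callback: release each buffer, then the buffer array. */
static void release_array(struct ArrowArray* arr) {
  for (int64_t i = 0; i < arr->n_buffers; ++i) {
    if (arr->buffers[i].release != NULL) {
      arr->buffers[i].release(&arr->buffers[i]);
    }
  }
  free(arr->buffers);
  arr->buffers = NULL;
  arr->release = NULL;
}

/* Hypothetical producer: copy int32 values into a caller-provided
 * (e.g. stack-allocated) ArrowArray. Returns 0 on success, -1 on OOM. */
int export_int32_array(const int32_t* values, int64_t length,
                       struct ArrowArray* out) {
  size_t nbytes = (size_t)length * sizeof(int32_t);
  struct ArrowBuffer* buffers = calloc(1, sizeof(struct ArrowBuffer));
  void* data = malloc(nbytes > 0 ? nbytes : 1);  /* avoid malloc(0) */
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  memcpy(data, values, nbytes);
  buffers[0].data = data;
  buffers[0].nbytes = (int64_t)nbytes;
  buffers[0].release = release_malloced_buffer;
  buffers[0].user_data = NULL;

  out->format = "i";  /* assumed code for int32 */
  out->length = length;
  out->null_count = 0;
  out->n_buffers = 1;
  out->buffers = buffers;
  out->dictionary = NULL;
  out->release = release_array;
  out->user_data = NULL;
  return 0;
}

/* Hypothetical consumer: read the values, then hand ownership back. */
int64_t consume_and_sum(struct ArrowArray* arr) {
  assert(arr->release != NULL);  /* must not already be released */
  const int32_t* values = (const int32_t*)arr->buffers[0].data;
  int64_t sum = 0;
  for (int64_t i = 0; i < arr->length; ++i) {
    sum += values[i];
  }
  arr->release(arr);
  return sum;
}
```

The consumer never calls `free()` itself; it only invokes `release`, so producer and consumer can use different allocators (or languages) without sharing any library code.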
>
> Thoughts?
>
> (*) For the record, the reference for the Python buffer protocol:
> https://docs.python.org/3/c-api/buffer.html#buffer-structure
> and its C struct definition:
> https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
>
> Regards
>
> Antoine.
>


-- 

*Travis Oliphant*
CEO
512 826 7480

<https://www.quansight.com/>
