I know some on this list are familiar, but many may not have seen ndtypes in xnd: https://github.com/xnd-project/ndtypes
It generalizes PEP 3118 for cross-language data-structure handling. Either taking a dependency on the small C library libndtypes or just reusing its concepts would be possible.

-Travis

On Wed, Sep 18, 2019 at 10:52 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> One thing that was discussed in the sync call is the ability to easily
> pass arrays at runtime between Arrow implementations or Arrow-supporting
> libraries in the same process, without bearing the cost of linking to
> e.g. the C++ Arrow library.
>
> (for example: "Duckdb wants to provide an option to return Arrow data of
> result sets, but they don't like having Arrow as a dependency")
>
> One possibility would be to define a C-level protocol similar in spirit
> to the Python buffer protocol, which some people may be familiar with (*).
>
> The basic idea is to define a simple C struct, which is ABI-stable and
> describes an Arrow array adequately. The struct can be stack-allocated.
> Its definition can also be copied in another project (or interfaced with
> using a C FFI layer, depending on the language).
>
> There is no formal proposal; this message is meant to stir the discussion.
>
> Issues to work out:
>
> * Memory lifetime issues: where Python simply associates the Py_buffer
>   with a PyObject owner (a garbage-collected Python object), we need
>   another means to control lifetime of pointed areas. One simple
>   possibility is to include a destructor function pointer in the protocol
>   struct.
>
> * Arrow type representation. We probably need some kind of "format"
>   mini-language to represent Arrow types, so that a type can be described
>   using a `const char*`. Ideally, primitive types at least should be
>   trivially parsable. We may take inspiration from Python here (`struct`
>   module format characters, PEP 3118 format additions).
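
Inline note on the "format" mini-language point: here is a minimal sketch of how trivially parsable primitive types could look. It assumes the one-character codes and standard sizes of Python's `struct` module are reused as-is. Nothing here is defined by Arrow, and `primitive_width` and `expected_nbytes` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte width of a primitive type described by a
 * one-character format string borrowed from Python's `struct` module
 * (standard sizes). Returns 0 for anything it does not recognize. */
static size_t primitive_width(const char* format) {
    if (format == NULL || strlen(format) != 1) {
        return 0;  /* not a single-character primitive format */
    }
    switch (format[0]) {
        case 'b': case 'B': return 1;            /* int8 / uint8 */
        case 'h': case 'H': return 2;            /* int16 / uint16 */
        case 'i': case 'I': case 'f': return 4;  /* int32 / uint32 / float32 */
        case 'q': case 'Q': case 'd': return 8;  /* int64 / uint64 / float64 */
        default: return 0;
    }
}

/* Hypothetical consumer-side check: the size a data buffer should have
 * for `length` values of the given primitive format (0 if unknown). */
static int64_t expected_nbytes(const char* format, int64_t length) {
    assert(length >= 0);
    return (int64_t)primitive_width(format) * length;
}
```

A consumer could compare `expected_nbytes(format, length)` with the `nbytes` field of a data buffer before reading it. Nested or parametrized types would need more than a one-character switch, which is where the PEP 3118 additions might come in.
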
>
> Example C struct definition (not a formal proposal!):
>
> struct ArrowBuffer {
>   void* data;
>   int64_t nbytes;
>   // Called by the consumer when it doesn't need the buffer anymore
>   void (*release)(struct ArrowBuffer*);
>   // Opaque user data (for e.g. the release callback)
>   void* user_data;
> };
>
> struct ArrowArray {
>   // Type description
>   const char* format;
>   // Data description
>   int64_t length;
>   int64_t null_count;
>   int64_t n_buffers;
>   // Note: these pointers are probably owned by the ArrowArray struct
>   // and will be released and free()ed by the release callback.
>   struct ArrowBuffer* buffers;
>   struct ArrowArray* dictionary;
>   // Called by the consumer when it doesn't need the array anymore
>   void (*release)(struct ArrowArray*);
>   // Opaque user data (for e.g. the release callback)
>   void* user_data;
> };
>
> Thoughts?
>
> (*) For the record, the reference for the Python buffer protocol:
> https://docs.python.org/3/c-api/buffer.html#buffer-structure
> and its C struct definition:
> https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195
>
> Regards
>
> Antoine.
>

--
*Travis Oliphant*
CEO
512 826 7480
<https://www.quansight.com/>
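
P.S. To illustrate the destructor-pointer idea from the quoted message, here is a minimal sketch of a producer and a consumer sharing a stack-allocated `ArrowBuffer`. The struct layout is copied from the example above. Everything else (`release_malloced`, `export_int32_range`, `consume_and_sum`, and the use of `user_data` as a release counter) is a hypothetical illustration, not part of any proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the ArrowBuffer struct sketched in the quoted message. */
struct ArrowBuffer {
    void* data;
    int64_t nbytes;
    void (*release)(struct ArrowBuffer*);
    void* user_data;
};

/* Hypothetical producer-side callback: frees the malloc()ed data and bumps
 * a counter passed through user_data so a caller can observe the release. */
static void release_malloced(struct ArrowBuffer* buf) {
    free(buf->data);
    buf->data = NULL;
    buf->nbytes = 0;
    if (buf->user_data != NULL) {
        *(int*)buf->user_data += 1;
    }
    buf->release = NULL;  /* mark the buffer as released */
}

/* Hypothetical producer: fills a caller-provided struct with the int32
 * values 0..n-1. Returns 0 on success, -1 on failure. */
static int export_int32_range(int32_t n, int* release_count,
                              struct ArrowBuffer* out) {
    assert(out != NULL);
    if (n <= 0) return -1;
    int32_t* values = malloc((size_t)n * sizeof(int32_t));
    if (values == NULL) return -1;
    for (int32_t i = 0; i < n; i++) values[i] = i;
    out->data = values;
    out->nbytes = (int64_t)n * (int64_t)sizeof(int32_t);
    out->release = release_malloced;
    out->user_data = release_count;
    return 0;
}

/* Consumer: sums the values, then tells the producer it is done. The
 * consumer never calls free() itself; it only invokes the callback. */
static int64_t consume_and_sum(struct ArrowBuffer* buf) {
    const int32_t* values = buf->data;
    int64_t count = buf->nbytes / (int64_t)sizeof(int32_t);
    int64_t sum = 0;
    for (int64_t i = 0; i < count; i++) sum += values[i];
    if (buf->release != NULL) buf->release(buf);
    return sum;
}
```

The point of the design is that the consumer needs no knowledge of how the memory was allocated. A producer backed by a memory pool or a reference-counted object would install a different callback without changing the struct.
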