Hi Jacques,
On 03/10/2019 at 02:46, Jacques Nadeau wrote:
>
> I think it is reasonable to argue that keeping any ABI (or header/struct
> pattern) as narrow as possible would allow us to minimize overlap with the
> existing in-memory specification. In Arrow's case, this could be as simple
> as a single memory pointer for schema (backed by flatbuffers) and a single
> memory location for data (that references the record batch header, which in
> turn provides pointers into the actual arrow data).
> [...]
>
> [...] (For example, in a JVM
> view of the world, working with a plain struct in java rather than a set of
> memory pointers against our existing IPC formats would be quite painful and
> we'd definitely need to create some glue code for users. I worry the same
> pattern would occur in many other languages.)

I'm trying to understand the point you're making. Here you say that it would be difficult for the JVM to deal with raw pointers. But above you seem to argue for a flatbuffers-based serialization containing raw pointers.

Here's another way to frame the question: how do you propose to do zero-copy between different languages if not by passing raw pointers to the Arrow data? And if passing raw pointers is acceptable, what is wrong with the spec as proposed?

As for creating glue code: yes, of course, that would be needed in most languages that want to provide this interface (including C++). You do need a C FFI for that. I'm quite sure it would be possible to implement this proposal in pure Python with ctypes / cffi, for example (as a toy example, since PyArrow exists :-)). When writing the spec, I also took a look at the Go and Rust FFIs, and they seem good enough to interact with it. I tried to take a look at JNI, but of course I got lost in the documentation :-)

If you are worried that people might start thinking that this proposal is part of the Arrow specification, perhaps we can make it clear that exposing this interface is optional for implementations.

Regards

Antoine.