Hi Jacques,

Le 03/10/2019 à 02:46, Jacques Nadeau a écrit :
> 
> I think it is reasonable to argue that keeping any ABI (or header/struct
> pattern) as narrow as possible would allow us to minimize overlap with the
> existing in-memory specification. In Arrow's case, this could be as simple
> as a single memory pointer for schema (backed by flatbuffers) and a single
> memory location for data (that references the record batch header, which in
> turn provides pointers into the actual arrow data). [...]
> 
> [...] (For example, in a JVM
> view of the world, working with a plain struct in java rather than a set of
> memory pointers against our existing IPC formats would be quite painful and
> we'd definitely need to create some glue code for users. I worry the same
> pattern would occur in many other languages.)

I'm trying to understand the point you're making.  Here you say that it
was difficult for the JVM to deal with raw pointers.  But above you seem
to argue for a flatbuffers-based serialization containing raw pointers.

Here's another way to frame the question: how do you propose to do
zero-copy between different languages if not by passing raw pointers to
the Arrow data?  And if passing raw pointers is acceptable, what is
wrong with the spec as proposed?


As for creating glue code: yes, of course, that would be needed in most
languages that want to provide this interface (including C++).  You do
need a C FFI for that.  I'm quite sure it would be possible to implement
this proposal in pure Python with ctypes / cffi, for example (as a toy
example, since PyArrow exists :-)).  When writing the spec, I also took
a look at the Go and Rust FFIs, and they seem good enough to interact
with it.  I tried to take a look at JNI, but of course I got lost in the
documentation :-)

If you are worried that people start thinking that this proposal is part
of the Arrow specification, perhaps we can make it clear that exposing
this interface is optional for implementations.

Regards

Antoine.

Reply via email to