Re: [DISCUSS] Improving Arrow columnar implementation guidelines for third parties

Antoine Pitrou Thu, 19 Sep 2019 09:13:42 -0700


On 19/09/2019 at 17:33, Wes McKinney wrote:
> On Thu, Sep 19, 2019 at 2:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> Wes,
>> Let me see if I understand, I think there are two issues:
>> 1.  Ensuring conformance of interoperability and actually having people
>> understand what Arrow actually is and what it is not.
>> 2.  Having users adopt reference implementations and surrounding libraries.
>>
>> For 1, I agree we should have a way of measuring things here.  I think
>> being able to document the requirements of our test-suite and have it
>> generate a report on features supported would go a long way to letting
>> users understand the quality of both internal/external implementations.  It
>> seems like there is still a lot of misunderstanding of what Arrow is and
>> how it relates to other technologies.  An example of this is a recent Julia
>> thread [1], which seems to have both some misinformed commentary and
>> potentially some points that we could improve upon as a community.
>> Hopefully, some of this will be helped by separately versioning the
>> specification and the libraries post 1.0.0.
> 
> Thanks for the pointer to the thread. I've been trying for a couple of
> years to engage with the Julia community.
> 
> The bottom line is that I think it's important to highlight that
> compatibility or interoperability will not be achieved by hand-waving.
> There's a couple of things we can do


What we discussed in the sync call is that by providing a C-level data
protocol (see discussion thread), we can allow any runtime with a C FFI
facility to easily experiment and interface with Arrow data (as a
producer and/or as a consumer).

This would have a reasonable implementation cost for us and hopefully
also for users of this data protocol.  Also, it is effectively a
zero-dependency solution, since the C struct definition can be pasted into
the target project's source code (or translated into the preferred local
form, e.g. ctypes definitions in Python).
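
To make the "translated into ctypes" idea concrete, here is a minimal
sketch in Python.  The struct layout is purely hypothetical: the names
ExampleArray, make_int32_array and read_int32_values and the field list
are assumptions made for illustration, not the layout of the protocol
under discussion.  It only shows that a producer and a consumer can share
column data through a plain C struct without any library dependency.

```python
import ctypes


class ExampleArray(ctypes.Structure):
    """Hypothetical C struct describing one column (illustration only).

    Equivalent C declaration, assumed for this sketch:
        struct ExampleArray {
            int64_t length;
            int64_t null_count;
            int64_t n_buffers;
            const void** buffers;
        };
    """
    _fields_ = [
        ("length", ctypes.c_int64),
        ("null_count", ctypes.c_int64),
        ("n_buffers", ctypes.c_int64),
        ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ]


def make_int32_array(values):
    """Producer side: expose a list of ints as an ExampleArray."""
    data = (ctypes.c_int32 * len(values))(*values)
    # Assumed convention: buffer 0 is a validity bitmap (NULL here, meaning
    # no nulls), buffer 1 holds the int32 values.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))
    arr = ExampleArray(
        len(values), 0, 2,
        ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p)),
    )
    # Keep the underlying memory alive for as long as the struct exists.
    arr._keepalive = (data, buffers)
    return arr


def read_int32_values(arr):
    """Consumer side: read the values back through the raw pointers."""
    ptr = ctypes.cast(arr.buffers[1], ctypes.POINTER(ctypes.c_int32))
    return ptr[: arr.length]
```

For instance, read_int32_values(make_int32_array([1, 2, 3])) reads the
three values back from the raw buffer.  A real protocol would also need to
settle type description and memory ownership, which this sketch leaves out.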

C FFIs do not always have the best performance (depending on the
impedance mismatch between static C data and the runtime's own data
model), but that would still be a good starting point, and in many cases
it might be good enough.

Regards

Antoine.
