Thank you both for the prompt response.

Just to check I understand, Antoine, your recommendation is:

1. Rust implementation should expose the ABI
2. Rust implementation should be able to consume (and use) the ABI (without
owning it, but still call the `release`)

And, likewise,

1. C/Pyarrow should expose this ABI
2. C/Pyarrow should be able to create an array from this ABI

If yes, I agree with you that this is the way to go, especially due to how
it handles alloc/dealloc (the producer passes a pointer to its own release
function along with the data). I would be willing to help with this, if
others agree with it. However, I would need someone to mentor this, as I am
outside my comfort zone wrt FFI and C ABIs.

From Rust's end, I think that we need to declare a new struct that is
`#[repr(C)]`, and write some functions that convert it from/to `ArrayData`,
which is the struct that stores this data. Furthermore, we need to cater
for what Jörn pointed out about the `typed_data`, that requires alignment.
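
To make the Rust side more concrete, here is a rough sketch of what I have in
mind. The field layout follows `struct ArrowArray` from the C data interface
specification; everything else (`FfiArrowArray`, `ImportedArray`,
`export_i32`, `values_i32`, the `RELEASED` counter) is a hypothetical name
made up for illustration and is not existing API in the Rust crate. It only
handles a primitive int32 array without children or a validity bitmap, and it
does not go through `ArrayData` at all.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Mirrors `struct ArrowArray` of the C data interface (name is hypothetical).
#[repr(C)]
pub struct FfiArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut FfiArrowArray,
    pub dictionary: *mut FfiArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut FfiArrowArray)>,
    pub private_data: *mut c_void,
}

/// Counts release calls; only here so the sketch can be checked.
pub static RELEASED: AtomicUsize = AtomicUsize::new(0);

/// Producer-side state that must stay alive until `release` is called.
struct Private {
    values: Vec<i32>,
    buffers: [*const c_void; 2], // [validity (null here), values]
}

/// Producer's release callback: frees its own state with its own allocator.
unsafe extern "C" fn release_demo(array: *mut FfiArrowArray) {
    unsafe {
        drop(Box::from_raw((*array).private_data as *mut Private));
        (*array).release = None; // marks the struct as released
    }
    RELEASED.fetch_add(1, Ordering::SeqCst);
}

/// Exposes a `Vec<i32>` through the ABI (no nulls, no children).
pub fn export_i32(values: Vec<i32>) -> FfiArrowArray {
    let length = values.len() as i64;
    let raw = Box::into_raw(Box::new(Private { values, buffers: [std::ptr::null(); 2] }));
    let private = unsafe { &mut *raw };
    private.buffers[1] = private.values.as_ptr() as *const c_void;
    FfiArrowArray {
        length,
        null_count: 0,
        offset: 0,
        n_buffers: 2,
        n_children: 0,
        buffers: private.buffers.as_mut_ptr(),
        children: std::ptr::null_mut(),
        dictionary: std::ptr::null_mut(),
        release: Some(release_demo),
        private_data: raw as *mut c_void,
    }
}

/// Consumer-side wrapper: uses the memory but never deallocates it itself.
pub struct ImportedArray {
    inner: FfiArrowArray,
}

impl ImportedArray {
    pub fn new(inner: FfiArrowArray) -> Self {
        Self { inner }
    }

    /// Typed view of the values buffer; refuses misaligned or missing pointers
    /// instead of panicking.
    pub fn values_i32(&self) -> Option<&[i32]> {
        if self.inner.n_buffers < 2 || self.inner.length < 0 || self.inner.offset < 0 {
            return None;
        }
        let ptr = unsafe { *self.inner.buffers.add(1) } as *const i32;
        if ptr.is_null() || (ptr as usize) % std::mem::align_of::<i32>() != 0 {
            return None;
        }
        let (offset, len) = (self.inner.offset as usize, self.inner.length as usize);
        Some(unsafe { std::slice::from_raw_parts(ptr.add(offset), len) })
    }
}

impl Drop for ImportedArray {
    fn drop(&mut self) {
        // Hand the memory back through the producer's callback, not `dealloc`.
        if let Some(release) = self.inner.release {
            unsafe { release(&mut self.inner) };
        }
    }
}
```

The point of the `Drop` impl is that the consumer never needs to know the
alignment (or the allocator) that was used for the buffers. The alignment
check in `values_i32` is only there because building a typed slice from a
misaligned pointer is itself undefined behavior.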

I am less certain about the following:

1. in pyarrow, I was only able to find Array.from_buffers and from_pandas.
Is the ABI implemented but not documented?
2. in pyarrow, I was unable to find Array.to_abi() or equivalent. Is the
ABI implemented but not documented?
3. do we have a place in the project where we test these things (maybe
integration?). IMO we need to compile both projects and have both
communicate in the same process. I have been doing this via Python (pypi
pyarrow and pyo3 for rust), but for this both need to be compiled from
master.
4. is there a "source of truth" that we can use to generate and consume
these in-memory structs, e.g. to perform round-trips?
5. if an implementation for C/pyarrow is required, is there anyone willing
to pair up to help on that side? I am not familiar with that code-base.

Finally and most importantly, are there concerns/objections to this?

Personally, I think that it would be awesome to have C and Rust be able to
share Arrow arrays back and forth through pointers.

Best,
Jorge


On Tue, Sep 22, 2020 at 7:20 PM Antoine Pitrou <anto...@python.org> wrote:

>
> Le 22/09/2020 à 19:16, Jorge Cardoso Leitão a écrit :
> > Hi,
> >
> > I had some time to look at
> > https://issues.apache.org/jira/browse/ARROW-10039,
> > wrt to the alignment requirements that rust implementation currently
> > imposes.
> >
> > The gist is that it is not that easy, and I would like to request some
> > guidance.
> >
> > Some facts:
> > 1. Our current implementation does not accept a pointer if that
> > pointer is not memory aligned (we panic)
> > 2. Our rust implementation's alignment is a static/const that depends on
> > the architecture and therefore constant throughout the program
> > 3. Rust alloc/dealloc both require an argument denoting memory alignment.
> > 4. calling dealloc with the wrong alignment is undefined behavior
> >
> > 3-4 imply that removing our safeguard against unaligned memory (wrt to
> > the constant alignment) leads to undefined behavior: we take ownership
> > of a pointer with an alignment X != our alignment and when we try to
> > free it, we enter undefined world.
>
> If you are given a foreign pointer (e.g. coming from Python or C++), you
> should simply never deallocate it yourself.  You don't know which
> allocator gave you the pointer, and it's probably not the Rust allocator
> (so it can't manage the pointer anyway).
>
> What you should do is call the destructor, if any, that comes with the
> buffer pointer.
>
> I'll note again that the C data interface addresses those issues ;-)
>
> Regards
>
> Antoine.
>
