Hi Nicholas,
I don't think giving writers the flexibility to emit data that is not
8-byte aligned is a good idea.  The specification explicitly calls out
the alignment requirements, and allowing writers to output differently
aligned (or unaligned) values could break other implementations.

I'm not sure of your exact use case, but another approach to consider is to
store the values in a single Arrow column as either a list or a fixed-size
list, and then look into doing a zero-copy conversion from that column to
the corresponding pandas memory (this is hypothetical; again, I don't have
enough context on pandas/numpy memory layouts).

-Micah

On Thu, Nov 12, 2020 at 3:01 PM Nicholas White <n.j.wh...@gmail.com> wrote:

> OK got everything to work, https://github.com/apache/arrow/pull/8644
> (part of ARROW-10573 now) is ready for review. I've updated the test case
> to show it is possible to zero-copy a pandas DataFrame! The next step is to
> dig into `arrow_to_pandas.cc` to make it work automagically...
>
> On Wed, 11 Nov 2020 at 22:52, Nicholas White <n.j.wh...@gmail.com> wrote:
>
>> Thanks all, this has been interesting. I've made a patch that sort-of
>> does what I want[1] - I hope the test case is clear! I made the batch
>> writer use the `alignment` field that was already in the `IpcWriteOptions`
>> to align the buffers, instead of fixing their alignment at 8. Arrow then
>> writes out the buffers consecutively, so you can map them as a 2D memory
>> array like I wanted. There's one problem though...the test case thinks the
>> arrow data is invalid as it can't read the metadata properly (error below).
>> Do you have any idea why? I think it's because Arrow puts the metadata at
>> the end of the file after the now-unaligned buffers yet assumes the
>> metadata is still 8-byte aligned (which it probably no longer is).
>>
>> Nick
>>
>> ````
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> pyarrow/ipc.pxi:494: in pyarrow.lib.RecordBatchReader.read_all
>>     check_status(self.reader.get().ReadAll(&table))
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>
>> >   raise ArrowInvalid(message)
>> E   pyarrow.lib.ArrowInvalid: Expected to read 117703432 metadata bytes, but only read 19
>> ````
>>
>> [1] https://github.com/apache/arrow/pull/8644
>>
>>
