Re: Pandas Block Manager

Micah Kornfield Tue, 10 Nov 2020 15:53:54 -0800

Sorry, I should clarify: I'm not familiar with zero-copy conversion from
pandas to Arrow, so there might be something else going on here.  But once an
Arrow file is written out, buffers will be padded/aligned to 8 bytes.


In general, I think relying on exact memory translation from systems that
don't use Arrow might require copies.
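
A quick way to sanity-check the padding explanation is to look at the gaps
between the addresses Nick posted. The sketch below is plain Python with no
pyarrow dependency; `padded_size` is a hypothetical helper (not a pyarrow
function), and the 64-byte alignment is an assumption based on the spec's
recommendation, not something confirmed for this particular allocation path.

```python
def padded_size(nbytes, alignment=64):
    """Round nbytes up to the next multiple of alignment (hypothetical helper)."""
    return (nbytes + alignment - 1) // alignment * alignment


# Buffer addresses reported in the quoted message; each buffer is 40 bytes.
addresses = [
    140262915478912,
    140262915479232,
    140262915479296,
    140262915479360,
    140262915479424,
]

# Distance from each buffer's start to the next buffer's start.
gaps = [b - a for a, b in zip(addresses, addresses[1:])]

print(gaps)                                   # [320, 64, 64, 64]
print(padded_size(40))                        # 64 (assumed 64-byte alignment)
print(padded_size(40, alignment=8))           # 40 (8-byte minimum padding)
print(all(a % 64 == 0 for a in addresses))    # True
```

With those numbers, each 40-byte buffer rounds up to 64 bytes, which matches
the 64-byte spacing of the last four addresses. The 320-byte gap after the
first buffer is still a multiple of 64; what sits in it can't be determined
from the addresses alone.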

On Tue, Nov 10, 2020 at 3:49 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> My question is: why are these addresses not 40 bytes apart from each other?
>> What's in the gaps between the buffers? It's not null bitsets - there's
>> only one buffer for each column. Thanks -
>
>
> All buffers are padded to at least 8 bytes (and per the spec 64 is
> recommended).
>
> On Tue, Nov 10, 2020 at 3:39 PM Nicholas White <n.j.wh...@gmail.com>
> wrote:
>
>> I've done a bit more digging. This code:
>> ````
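>> import numpy as np
>> import pandas as pd
>> import pyarrow as pa
>>
>> df = pd.DataFrame(np.random.randint(10, size=(5, 5)))
>> table = pa.Table.from_pandas(df)
>> mem = []
>> for c in table.columns:
>>     # buffers()[0] is the validity bitmap; buffers()[1] holds the values
>>     buf = c.chunks[0].buffers()[1]
>>     mem.append((buf.address, buf.size))
>> sorted(mem)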
>> ````
>> ...prints...
>> ````
>>
>> [(140262915478912, 40),
>>  (140262915479232, 40),
>>  (140262915479296, 40),
>>  (140262915479360, 40),
>>  (140262915479424, 40)]
>>
>> ````
>> My question is: why are these addresses not 40 bytes apart from each
>> other?
>> What's in the gaps between the buffers? It's not null bitsets - there's
>> only one buffer for each column. Thanks -
>>
>> Nick
>>
>
