[GitHub] [arrow] njwhite commented on pull request #8644: ARROW-10573: [C++] Align written buffers to specified value

GitBox Fri, 13 Nov 2020 14:19:23 -0800


njwhite commented on pull request #8644:
URL: https://github.com/apache/arrow/pull/8644#issuecomment-727063708



   @wesm I disagree with your assertion that this is only useful in an 
extraordinarily narrow use case. I've added a test case, 
`test_contiguous_buffers_mixed_types`, to show a zero-copy load of a 
mixed-dtype DataFrame. For now the columns do have to be sorted by dtype 
(`df[df.dtypes.sort_values().index]`) if you want to eliminate consolidation, 
but `pyarrow` could easily do this under the hood. 
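
   As a rough illustration of the column reordering mentioned above, here is a 
minimal sketch. The sample frame and the names `order` and `grouped` are 
made up for this example, and it sorts on the dtype *name* with a stable sort 
(an assumption of mine, not what the PR's test does) so that the ordering 
within each dtype is preserved:

```python
import pandas as pd

# Hypothetical frame with interleaved integer and float columns.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "x": [0.1, 0.2, 0.3],
    "b": [4, 5, 6],
    "y": [1.5, 2.5, 3.5],
})

# Sort the column labels by dtype name. mergesort is stable, so columns
# that share a dtype keep their original relative order.
order = df.dtypes.astype(str).sort_values(kind="mergesort").index

# Reindex the frame so that same-dtype columns sit next to each other.
grouped = df[order]

print(list(grouped.columns))
```

   After the reordering, columns of the same dtype are adjacent, which is the 
layout the contiguous-buffer approach relies on.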
   
   W.r.t. the spec, it says _Implementations are **recommended** to allocate 
memory on aligned addresses_ 
[here](https://arrow.apache.org/docs/format/Columnar.html); my PR doesn't 
change the default behaviour, which follows that recommendation and aligns 
the buffers to 8 bytes. A user would have to opt in to creating unaligned files 
if they wanted the benefit of reading their DataFrames without copying the data.
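
   To make the padding concrete, here is a small sketch of the arithmetic. It 
is not Arrow's writer code: `padded_length` and `buffer_offsets` are 
hypothetical helpers, and the layout is simplified to data buffers written 
back to back with nothing else in between:

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment."""
    if alignment < 1:
        raise ValueError("alignment must be a positive integer")
    return -(-nbytes // alignment) * alignment


def buffer_offsets(sizes, alignment=8):
    """Start offset of each buffer when every buffer is padded to alignment."""
    offsets, pos = [], 0
    for nbytes in sizes:
        offsets.append(pos)
        pos += padded_length(nbytes, alignment)
    return offsets


# Three int32 columns of 3 rows each: 12 bytes of data per column.
sizes = [12, 12, 12]

aligned = buffer_offsets(sizes, alignment=8)  # 4 padding bytes after each column
packed = buffer_offsets(sizes, alignment=1)   # no padding, columns are contiguous

print(aligned, packed)
```

   With 8-byte alignment each 12-byte column is followed by 4 bytes of padding, 
so the column data is not contiguous; with an alignment of 1 the columns sit 
directly next to each other.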
   
   My use case is to expose Arrow files saved on disk (and probably still 
resident in the OS's page cache) as pandas DataFrames with as low a latency as 
possible. Copying the entire DataFrame from one place in memory to another (to 
strip out the padding) adds latency and memory pressure that far outweigh the 
performance benefit of 8-byte aligned reads.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
