njwhite commented on pull request #8644: URL: https://github.com/apache/arrow/pull/8644#issuecomment-727063708
@wesm I disagree with your assertion that this is only useful in an extraordinarily narrow use case. I've added a test case, `test_contiguous_buffers_mixed_types`, to show a zero-copy load of a mixed-datatype DataFrame. For now the columns do have to be sorted by dtype (`df[df.dtypes.sort_values().index]`) if you want to eliminate consolidation, but `pyarrow` could easily do this under the hood.

With respect to the spec, I see _Implementations are **recommended** to allocate memory on aligned addresses_ [here](https://arrow.apache.org/docs/format/Columnar.html). My PR doesn't change the default behaviour, which follows the recommendation and 8-byte aligns the buffers. A user would have to opt in to creating unaligned files if they wanted the benefit of reading their DataFrames without copying the data.

My use case is to expose Arrow files saved on disk (and probably still resident in the OS's page cache) as pandas DataFrames with as low a latency as possible. Copying the entire DataFrame from one place in memory to another (to strip out the padding) adds latency and memory pressure that far outweigh the performance benefit of 8-byte aligned reads.
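For reference, here is a minimal sketch of the column reordering mentioned above. It is a variation on the expression in the comment: it sorts on dtype names with a stable sort so the resulting order is deterministic. The frame, its column names, and the `dtype_runs` helper are made up for illustration. The sketch only shows that same-dtype columns end up adjacent; it does not exercise the PR's opt-in or any `pyarrow` call.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with interleaved int64 and float64 columns.
df = pd.DataFrame({
    "a": np.arange(3, dtype="int64"),
    "b": np.linspace(0.0, 1.0, 3),
    "c": np.arange(3, dtype="int64"),
    "d": np.linspace(1.0, 2.0, 3),
})

# Group columns by dtype. Sorting on the dtype *name* with a stable sort
# keeps the original relative order of columns within each dtype group.
order = df.dtypes.astype(str).sort_values(kind="stable").index
sorted_df = df[order]


def dtype_runs(frame: pd.DataFrame) -> int:
    """Count runs of adjacent columns that share a dtype (illustrative helper)."""
    names = [str(t) for t in frame.dtypes]
    return sum(1 for i, n in enumerate(names) if i == 0 or n != names[i - 1])


print(list(sorted_df.columns))
print(dtype_runs(df), dtype_runs(sorted_df))
```

After sorting, each dtype occupies a single run of adjacent columns, which is the layout the comment says is needed to avoid consolidation.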