vkhodygo commented on PR #43389: URL: https://github.com/apache/arrow/pull/43389#issuecomment-2296463608
Fab, a lot of people will be happy with this one. A few questions:

- There was a lengthy discussion (and a document) about larger-than-memory datasets in https://github.com/apache/arrow/pull/13669 and https://github.com/apache/arrow/issues/31769. Will there be any progress in this direction? Now that the limit is gone, I expect an influx of reports about code crashing solely because of the dataset size.
- > An extra 4 bytes of memory consumption for each row due to the offset size difference from 32-bit to 64-bit. A wider offset type requires a few more SIMD instructions in each 8-row processing iteration.

  Do you think it's possible to add some heuristics and/or an explicit key to keep the old behaviour for reasonably small datasets?
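
To make the second question concrete, here is a rough sketch of the arithmetic and of the kind of heuristic I have in mind. The function names and the threshold are my own assumptions for illustration. Nothing here is taken from the PR or from the Arrow API.

```python
# Hypothetical sketch: none of these names exist in Arrow.

# Largest value a 32-bit unsigned offset can hold.
UINT32_MAX = 2**32 - 1


def offset_overhead_bytes(num_rows: int) -> int:
    """Extra memory from widening one per-row offset from 4 to 8 bytes."""
    return num_rows * (8 - 4)


def pick_offset_width(total_row_bytes: int) -> int:
    """Assumed heuristic: keep 32-bit offsets while every offset still fits.

    Returns the offset width in bits (32 or 64).
    """
    return 32 if total_row_bytes <= UINT32_MAX else 64
```

For example, `offset_overhead_bytes(1_000_000)` gives 4,000,000 extra bytes for a million rows, and `pick_offset_width` would switch to 64-bit offsets only once the row data no longer fits in a 32-bit offset range.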