vkhodygo commented on PR #43389:
URL: https://github.com/apache/arrow/pull/43389#issuecomment-2296463608

   Fab, a lot of people will be happy with this one. 
   
   A few questions here: 
   
   - There was a lengthy discussion (and a document) about larger-than-memory datasets (https://github.com/apache/arrow/pull/13669, https://github.com/apache/arrow/issues/31769). Will there be any progress in this direction? Now that the limit is gone, I expect an influx of reports about code crashing solely because of the dataset size.
   - > An extra 4 bytes of memory consumption for each row due to the offset size difference from 32-bit to 64-bit.
     >
     > A wider offset type requires a few more SIMD instructions in each 8-row processing iteration.

     Do you think it's possible to add some heuristics and/or an explicit option to keep the old behaviour for reasonably small datasets?
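
To put rough numbers on the trade-off, here's a minimal sketch of the per-row memory overhead and of the kind of size-based heuristic I have in mind. The helper names (`offset_overhead_bytes`, `choose_offset_width`) and the signed int32 limit are my own assumptions for illustration, not anything in Arrow or this PR.

```python
# Hypothetical helpers for illustration only; these are not Arrow APIs.
INT32_MAX = 2**31 - 1  # conservative limit: assumes 32-bit offsets must fit in a signed int32


def offset_overhead_bytes(num_rows: int, old_width: int = 4, new_width: int = 8) -> int:
    """Extra memory from widening one offset per row from old_width to new_width bytes."""
    return num_rows * (new_width - old_width)


def choose_offset_width(total_var_length_bytes: int) -> int:
    """Return 4 (32-bit offsets) if the largest possible offset fits, else 8 (64-bit)."""
    # In an offsets array the last offset equals the total size of the
    # variable-length data, so 32-bit offsets are safe exactly when that
    # total fits within INT32_MAX.
    return 4 if total_var_length_bytes <= INT32_MAX else 8


if __name__ == "__main__":
    rows = 100_000_000
    print(offset_overhead_bytes(rows))   # 400000000 bytes, i.e. about 400 MB
    print(choose_offset_width(1 << 20))  # 4: a small table keeps 32-bit offsets
    print(choose_offset_width(3 << 30))  # 8: more than 2 GiB of row data needs 64-bit
```

A real switch would also need to know (or bound) the total row data size before the row table is built, which this sketch simply takes as an input.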
   

