kr-hansen commented on issue #37989:
URL: https://github.com/apache/arrow/issues/37989#issuecomment-2604517486

   Hmmm, what version were you using @DatSplit?  When working with very large 
data (DataFrames ~200 GB), I continue to see memory crashes with `pyarrow` that 
I don't get with `fastparquet`.  While `fastparquet` keeps my memory flat 
during the write, `pyarrow` has a huge spike that pushes peak memory to 
3-5.5x the footprint of the data itself.
   
   This is for `pyarrow 19.0.0` for me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]