jonded94 commented on issue #44599: URL: https://github.com/apache/arrow/issues/44599#issuecomment-2479396735
We have a script that launches a large number of Python processes, each doing something very similar to the test script I've shown. Unfortunately, even with 250 GiB of RAM available on that host, we see OOM errors after a handful of iterations, so it appears that the memory is not entirely free to use, or at least that the kernel is getting uneasy.

We then replaced the few `pyarrow` methods we used with a very shallow in-house PyO3 wrapper around the [parquet Rust crate](https://docs.rs/parquet/latest/parquet/), which offers similar functionality. With that, memory usage stays constant regardless of how long the script runs, and we use only a few dozen GiB in total, far less than with the `pyarrow` implementation.

I know that our use case is pretty specific, but I wanted to share our experience regardless. If it is too vague to be of any actual debugging value to you, we can close the issue.
