GitHub user ndchandar added a comment to the discussion: Feedback on high memory usage when merging N parquet files
I was able to reduce parallelism by tuning `datafusion.execution.target_partitions` for our workloads, which lowered both memory and CPU usage. I also bumped `datafusion.execution.parquet.write_batch_size` to a much higher value (from the default `8192` to `65536`). Are there other parameters I could tune? I am trying to find a balance between keeping memory/CPU usage low and still compacting/merging files reasonably quickly.

GitHub link: https://github.com/apache/datafusion/discussions/18833#discussioncomment-15067528
