Re: [D] Feedback on high memory usage when merging N parquet files [datafusion]

via GitHub Mon, 24 Nov 2025 15:44:56 -0800


GitHub user ndchandar added a comment to the discussion: Feedback on high 
memory usage when merging N parquet files


I was able to reduce parallelism by tuning 
`datafusion.execution.target_partitions` for our workloads, which lowered 
both memory and CPU usage. I also bumped 
`datafusion.execution.parquet.write_batch_size` to a much higher number (from 
the default `8192` to `65536`). Are there other parameters that I could tune? 
I am trying to balance memory/CPU usage against keeping compaction/merging 
reasonably quick.
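
For reference, a minimal sketch of how the two settings above could be applied 
as SQL `SET` statements (for example in `datafusion-cli`). The helper 
`build_set_statements` and the value `4` for `target_partitions` are 
hypothetical, chosen only for illustration; `65536` is the value mentioned 
above.

```python
def build_set_statements(settings):
    """Render one DataFusion-style `SET key = value;` statement per setting."""
    return [f"SET {key} = {value};" for key, value in settings.items()]


# Assumed values: 4 partitions is a placeholder, not a recommendation.
compaction_settings = {
    "datafusion.execution.target_partitions": 4,
    "datafusion.execution.parquet.write_batch_size": 65536,
}

statements = build_set_statements(compaction_settings)
for stmt in statements:
    print(stmt)
```

The statements would be run once per session before the merge query, so the 
values are easy to vary while measuring memory and CPU usage.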

GitHub link: 
https://github.com/apache/datafusion/discussions/18833#discussioncomment-15067528

