xushiyan commented on issue #10997: URL: https://github.com/apache/hudi/issues/10997#issuecomment-2058238697
> we have clustering to group rows together, but it's still thousands of files affected. 75th percentile of individual file overwrite (task in the Doing partition and writing data stage) takes ~40-60 seconds

Based on this, I think clustering can be tuned further to rewrite files so that more updates target the same file, which reduces write amplification. Make sure the number of clustering groups is not limited to the default of 30; otherwise many files will be left out of clustering. COW is expected to have high write amplification under heavy updates, especially when the updates are spread across many files. Also consider a partitioning scheme that concentrates updates in a few partitions, if possible. Upgrade to a newer version to try configuring the executor type too.
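
As a rough illustration of the clustering advice above, here is a minimal sketch of writer options that raise the clustering group limit and sort by the update key. The helper name `clustering_options`, the column `customer_id`, and the value 300 are assumptions for illustration only, not values taken from this issue; check the option keys against the configuration reference for your Hudi version.

```python
def clustering_options(sort_columns, max_num_groups=300):
    """Build a dict of Hudi writer options for inline clustering.

    The helper and its default of 300 are hypothetical; pick a limit
    large enough to cover the files you expect each clustering plan to touch.
    """
    return {
        # Run clustering inline with the write (async clustering is the alternative).
        "hoodie.clustering.inline": "true",
        # Default is 30; a low limit leaves many files out of each clustering plan.
        "hoodie.clustering.plan.strategy.max.num.groups": str(max_num_groups),
        # Sort by the columns updates are keyed on, so updates land in fewer files.
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    }


# "customer_id" is a placeholder for whatever column your updates cluster on.
opts = clustering_options(["customer_id"])

# Usage with a Spark DataFrame writer (df and base_path are assumed to exist):
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

These options only change which files clustering rewrites and how rows are ordered within them; whether that reduces the number of files touched per commit depends on how well the sort columns match the update pattern.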