arunb2w opened a new issue, #6928: URL: https://github.com/apache/iceberg/issues/6928
### Apache Iceberg version

1.1.0 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

I am running a MERGE INTO query and wanted to see how it behaves with `'write.merge.mode'='copy-on-write'` versus `'write.merge.mode'='merge-on-read'` for the same input on the same cluster config.

According to the Spark UI, the merge SQL took only **2.5 minutes with copy-on-write**, while for the same load **merge-on-read took 11 minutes**.

Attaching the images to show the stage-level behavior: shuffle write is huge with merge-on-read, whereas it is minimal with copy-on-write.

Analysing the SQL tab of the Spark UI further, we found that dynamic pruning did not happen with MoR, whereas the dynamic pruning filter is present in the CoW plan.

My understanding is that MoR should be preferred for faster writes, but in this case MoR is actually performing worse than CoW.

Questions:

1. Why is dynamic pruning not happening with MoR when running a MERGE INTO query?
2. Why is shuffle write so large with MoR for the same input batch?
3. How can MoR performance be optimized?
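For reference, the comparison described above can be sketched as follows. This is a minimal illustration, not the exact job that was run: the table name `db.events`, the source name `updates`, the join column `id`, and both helper functions are hypothetical. Only the property name `write.merge.mode` and its two values come from the report. The helpers just build the SQL strings, so the block runs without a Spark session.

```python
# Sketch: build the statements needed to run the same MERGE INTO under both
# merge modes. Table, source, and column names are hypothetical placeholders.

VALID_MODES = ("copy-on-write", "merge-on-read")


def merge_mode_ddl(table: str, mode: str) -> str:
    """Return an ALTER TABLE statement that sets write.merge.mode on `table`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported merge mode: {mode!r}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('write.merge.mode'='{mode}')"
    )


def comparison_statements(table: str, merge_sql: str):
    """Return [(mode, [ddl, merge_sql]), ...], one entry per merge mode."""
    return [
        (mode, [merge_mode_ddl(table, mode), merge_sql])
        for mode in VALID_MODES
    ]


# Hypothetical MERGE INTO statement used for both runs.
MERGE_SQL = (
    "MERGE INTO db.events t USING updates s ON t.id = s.id "
    "WHEN MATCHED THEN UPDATE SET * "
    "WHEN NOT MATCHED THEN INSERT *"
)

if __name__ == "__main__":
    for mode, statements in comparison_statements("db.events", MERGE_SQL):
        print(f"-- run with {mode}")
        for stmt in statements:
            print(stmt)
```

In a Spark session with an Iceberg catalog configured, each statement could be passed to `spark.sql(...)` in order, resetting the target table to the same starting state between the two runs so that the timings are comparable.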
