irshadcc opened a new issue, #9118:
URL: https://github.com/apache/iceberg/issues/9118
### Feature Request / Improvement

During the snapshot commit process of `MergingSnapshotProducer`, Iceberg tries to merge manifest files to improve planning performance. During operations such as overwrite, `MergingSnapshotProducer` uses the `deleteExpression` to find the matching manifest files, then rewrites those manifests and filters out the deleted manifest entries.

While finding the manifests that match the `deleteExpression`, we found that a new `Schema` is created every time the expression needs to be bound. This has proven very expensive when there is a large number of manifest files (~25,000) for a very wide table (~35,000 columns).

To optimise manifest evaluation, we can add a method called `asSchema()` to the `StructType` class that caches the `Schema` built from that struct, so a new `Schema` is not created every time the filter needs to be bound. With the `Schema` cached on the `StructType`, binding the expression no longer creates a new schema. This optimisation reduced manifest evaluation time from 6 minutes 22 seconds to 13 seconds.

### Query engine

None
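The caching idea can be sketched in plain Java as follows. This is a minimal, self-contained illustration under stated assumptions, not Iceberg's code: `HypotheticalField`, `HypotheticalSchema`, `HypotheticalStructType`, and `SchemaCacheDemo` are hypothetical stand-ins, and the per-field name index built in the schema constructor is an assumed source of construction cost. The struct type builds its schema view on the first `asSchema()` call and returns the same instance afterwards, so binding a filter once per manifest builds the schema once instead of once per manifest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a named field; not Iceberg's Types.NestedField.
final class HypotheticalField {
  final int id;
  final String name;

  HypotheticalField(int id, String name) {
    this.id = id;
    this.name = name;
  }
}

// Hypothetical stand-in for a Schema. Construction is O(number of fields)
// because every field is indexed by name (an assumed cost model), which is
// what makes rebuilding it per manifest expensive for very wide tables.
final class HypotheticalSchema {
  // Counts schema constructions for the demo; not thread-safe, demo only.
  static int constructions = 0;

  private final Map<String, Integer> nameToId = new HashMap<>();

  HypotheticalSchema(List<HypotheticalField> fields) {
    constructions++;
    for (HypotheticalField f : fields) {
      nameToId.put(f.name, f.id);
    }
  }

  Integer idOf(String name) {
    return nameToId.get(name);
  }
}

// Hypothetical stand-in for a struct type that caches its schema view.
final class HypotheticalStructType {
  private final List<HypotheticalField> fields;
  // Cached derived view; volatile so the double-checked lazy init below is safe.
  private volatile HypotheticalSchema schema = null;

  HypotheticalStructType(List<HypotheticalField> fields) {
    // Defensive immutable copy: caching is only valid if the fields never change.
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  // Builds the schema on the first call and returns the same instance afterwards.
  HypotheticalSchema asSchema() {
    HypotheticalSchema result = schema;
    if (result == null) {
      synchronized (this) {
        result = schema;
        if (result == null) {
          result = new HypotheticalSchema(fields);
          schema = result;
        }
      }
    }
    return result;
  }
}

public class SchemaCacheDemo {
  public static void main(String[] args) {
    List<HypotheticalField> fields = new ArrayList<>();
    for (int i = 0; i < 35_000; i++) {
      fields.add(new HypotheticalField(i + 1, "col_" + i));
    }
    HypotheticalStructType wideStruct = new HypotheticalStructType(fields);

    // Simulate binding a filter once per manifest.
    for (int manifest = 0; manifest < 25_000; manifest++) {
      wideStruct.asSchema().idOf("col_42");
    }
    // prints "schemas built: 1"
    System.out.println("schemas built: " + HypotheticalSchema.constructions);
  }
}
```

The design relies on the struct's field list being immutable: because the cached schema is purely derived data, it can be built once and shared. Double-checked locking on a `volatile` field is one common way to make such lazy initialization thread-safe; an eagerly built field would also work, at the cost of paying for schemas that are never used.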