irshadcc opened a new issue, #9118:
URL: https://github.com/apache/iceberg/issues/9118

   ### Feature Request / Improvement
   
   During the snapshot commit process of MergingSnapshotProducer, Iceberg tries 
to merge manifest files to improve planning performance. During 
operations like overwrite, MergingSnapshotProducer finds the matching manifest 
files by deleteExpression and rewrites those manifests, filtering out the 
deleted manifest entries. 
   
   While finding the matching manifests by deleteExpression, we found that a 
new Schema is created every time the expression needs to be bound. This has 
proven very expensive when there is a large number of manifest files (~25,000 
manifest files) for a super wide table (~35,000 columns).
   
   To optimise manifest evaluation, we can add a method called 
asSchema() to the StructType class to avoid creating a new Schema every time 
the filter needs to be bound. 
   Once the Schema is cached on the StructType, binding the expression no 
longer creates a new Schema. 
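
   The caching idea can be sketched roughly as follows. This is a minimal 
illustration and not Iceberg's actual code: the `Schema` and `StructType` 
classes below are simplified, hypothetical stand-ins, and the lazy 
double-checked initialisation inside `asSchema()` is only one assumed way to 
implement the cache. The construction counter exists purely to make the 
effect visible.

```java
import java.util.List;

public class SchemaCacheSketch {

  // Hypothetical stand-in for a schema that is expensive to build.
  // A real schema would build name/id lookup indexes in its constructor.
  static final class Schema {
    static int constructions = 0; // demo-only counter, not thread-safe

    private final List<String> columns;

    Schema(List<String> columns) {
      constructions++;
      this.columns = List.copyOf(columns);
    }

    List<String> columns() {
      return columns;
    }
  }

  // Hypothetical stand-in for a struct type that caches its schema view.
  static final class StructType {
    private final List<String> fields;
    private volatile Schema schema; // built lazily, at most once

    StructType(List<String> fields) {
      this.fields = List.copyOf(fields);
    }

    // Returns the cached Schema, building it on first use only.
    Schema asSchema() {
      Schema result = schema;
      if (result == null) {
        synchronized (this) {
          result = schema;
          if (result == null) {
            result = new Schema(fields);
            schema = result;
          }
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    StructType struct = new StructType(List.of("id", "ts", "payload"));

    // Simulate binding the same filter once per manifest file.
    for (int manifest = 0; manifest < 25_000; manifest++) {
      struct.asSchema();
    }

    System.out.println("Schema constructions: " + Schema.constructions);
  }
}
```

   With this shape, every bind after the first reuses the same Schema 
instance, so the per-manifest cost no longer grows with the number of columns 
in the struct.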
   
   The optimisation reduced the manifest evaluation time from 6 minutes 22 
seconds to 13 seconds, roughly a 29x speedup.
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

