Guosmilesmile commented on PR #16171: URL: https://github.com/apache/iceberg/pull/16171#issuecomment-4363885397
In the current implementation of this feature, DataFiles and DeleteFiles are read one by one in several places. It also builds a table-level Set and a partition-level minDataSeqByPartition. On large tables, this will cause performance problems and may lead to OOM. So I hope the Flink implementation can leverage distributed computation, just as the Spark one does.
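
To illustrate the concern, here is a minimal sketch of how a per-partition minimum data sequence number could be computed as mergeable partial aggregates rather than in a single table-level pass. It is plain Java with no Iceberg or Flink dependencies and is not the code from this PR: `FileEntry`, `partialMin`, and `merge` are hypothetical names, and partitions are represented as plain strings for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MinSeqByPartitionSketch {

  // Hypothetical stand-in for the two fields of a file entry that matter here.
  static final class FileEntry {
    final String partition;
    final long dataSequenceNumber;

    FileEntry(String partition, long dataSequenceNumber) {
      this.partition = partition;
      this.dataSequenceNumber = dataSequenceNumber;
    }
  }

  // Partial aggregate over one split of entries. Each parallel task would
  // hold only the partitions present in its own split.
  static Map<String, Long> partialMin(List<FileEntry> split) {
    Map<String, Long> result = new HashMap<>();
    for (FileEntry entry : split) {
      result.merge(entry.partition, entry.dataSequenceNumber, Long::min);
    }
    return result;
  }

  // Merging two partial aggregates is associative and commutative, so the
  // partial results can be combined in any order.
  static Map<String, Long> merge(Map<String, Long> left, Map<String, Long> right) {
    Map<String, Long> merged = new HashMap<>(left);
    right.forEach((partition, seq) -> merged.merge(partition, seq, Long::min));
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Long> a = partialMin(List.of(new FileEntry("p=1", 5L), new FileEntry("p=2", 9L)));
    Map<String, Long> b = partialMin(List.of(new FileEntry("p=1", 3L)));
    System.out.println(merge(a, b));
  }
}
```

Because the merge step only needs two partial maps at a time, the same shape should fit a keyed reduce in a distributed engine, where no single task has to materialize the state for the whole table. How that would be wired into the Flink maintenance operators is left open here.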
