RussellSpitzer commented on PR #14435: URL: https://github.com/apache/iceberg/pull/14435#issuecomment-4027129644
I think large row group sizes may be the only case where this makes sense, and only if you are on HDFS and have very, very large tables. I think it's almost always objectively better to have multiple manifest entries than a single one for scanning. I am also not convinced by arguments that we should make Iceberg perform better for tools that list directory contents. We shouldn't optimize for things that directly contradict Iceberg's goals (eliminating the burden of list operations). I would need to see some real benchmarking to be convinced that this is the right path, especially for small files. For large files, we would need a better argument for why we would want to compact files that are already large into even larger ones.
