rdblue commented on pull request #1981: URL: https://github.com/apache/iceberg/pull/1981#issuecomment-751026580
> To make sure I understand correctly, changes in projection methods ensure that both behaviors before and after this fix will be accounted for by the projection, so that we might not need to have separate implementations for format v1 versus v2, with a slight penalty that in v2 we may scan more data than we have to?

Yes, this fixes the transforms and ensures that predicate projection includes the partitions that were written with bad values. That means that we won't need different implementations for v2, but it also means that we can avoid scanning the extra partitions in the future.

Because this is fixed before v2, we can ensure that all v2 tables have been fixed. So if a table is created as v2, we should be able to know that no older writers with the bug have written to the table. The only case where a v2 table would have bad metadata is when a v1 table has been converted. We should add a flag that signals either that the table was converted from v1 or that it was created as v2, so that we can skip the extra partitions whenever the table is known to be clean.
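To illustrate the idea, here is a rough sketch and not the actual Iceberg code. It assumes, purely for illustration, a fixed-width partition transform whose buggy version truncated toward zero instead of flooring. `WIDTH`, `fixed_transform`, `buggy_transform`, `project_equals`, and the `may_have_buggy_writes` flag are all hypothetical names. The projection of an equality predicate returns the correct partition plus the partition a buggy writer would have produced, and the flag lets a table that is known to be clean skip the extra one.

```python
WIDTH = 10  # hypothetical partition width, chosen only for this sketch


def fixed_transform(value: int) -> int:
    """Correct behavior (assumed): floor division, so negatives round down."""
    return value // WIDTH


def buggy_transform(value: int) -> int:
    """Assumed bug: truncation toward zero, which is off by one for
    negative values that are not exact multiples of WIDTH."""
    return int(value / WIDTH)


def project_equals(value: int, may_have_buggy_writes: bool = True) -> set[int]:
    """Partitions that could contain rows where the source column == value.

    may_have_buggy_writes is a hypothetical table-level flag: True for
    tables that older writers may have touched (for example, a v1 table or
    one converted from v1), False for tables known to be created as v2.
    """
    partitions = {fixed_transform(value)}
    if may_have_buggy_writes:
        # Also scan the partition a buggy writer would have used.
        partitions.add(buggy_transform(value))
    return partitions
```

With these assumptions, `project_equals(-15)` covers both the correct partition and the one written with a bad value, while `project_equals(-15, may_have_buggy_writes=False)` covers only the correct one. Non-negative values project to a single partition either way, because the two transforms agree there.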
