rdblue opened a new pull request #1318: URL: https://github.com/apache/iceberg/pull/1318
This adds a list of equality field IDs to the `DeleteFile` metadata tracked in manifests. The equality field ID list identifies the subset of a delete file's fields (by ID) that should be used when comparing rows against the rows deleted in an equality delete file.

The remaining fields are informational and store the values that were deleted by an equality delete. These may be used, for example, to reconstruct CDC events that contained deleted row values. The additional fields will also be used for min/max stats filtering, so their values are required to be accurate.

When finding files for a table scan, the min/max values for non-comparison columns can still be used to filter delete files, even if the delete file applies to a data file that will be read. For example, if a table contains columns `id` and `data`, a delete file may delete the row `(5, "bees!")` using only the `id` field for equality. A scan filter like `data like 'a%'` can cause the delete file to be ignored, even for a data file that contains ID `5`, because the scan filter will remove the row anyway.
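The `id`/`data` example from the description can be modeled roughly as follows. This is a minimal Python sketch for illustration only, not Iceberg's Java API: `EqualityDeleteFile`, `may_contain_prefix`, and `scan` are hypothetical names, the field IDs `1` and `2` are assumed, and the min/max bounds are computed directly from the stored rows rather than read from manifest stats.

```python
from dataclasses import dataclass


@dataclass
class EqualityDeleteFile:
    """Hypothetical stand-in for an equality delete file (not Iceberg's DeleteFile)."""
    schema: dict          # field ID -> column name, for every column stored in the file
    equality_ids: tuple   # subset of field IDs used when comparing rows
    rows: tuple           # deleted rows, each a dict keyed by column name

    def key(self, row):
        # Project a row onto the equality fields only.
        return tuple(row[self.schema[fid]] for fid in self.equality_ids)

    def deletes(self, row):
        # A data row is deleted if its equality key matches any stored row's key.
        return any(self.key(deleted) == self.key(row) for deleted in self.rows)

    def bounds(self, column):
        # Min/max over all stored values, including informational columns.
        values = [deleted[column] for deleted in self.rows]
        return min(values), max(values)


def may_contain_prefix(delete_file, column, prefix):
    # Truncating strings preserves ordering, so a value starting with `prefix`
    # can exist in [lower, upper] only if the truncated bounds bracket it.
    lower, upper = delete_file.bounds(column)
    n = len(prefix)
    return lower[:n] <= prefix <= upper[:n]


def scan(data_rows, delete_file, prefix):
    # Skip the delete file entirely when its stats rule out the scan filter.
    apply_deletes = may_contain_prefix(delete_file, "data", prefix)
    result = []
    for row in data_rows:
        if not row["data"].startswith(prefix):
            continue  # removed by the scan filter: data like '<prefix>%'
        if apply_deletes and delete_file.deletes(row):
            continue  # removed by the equality delete
        result.append(row)
    return result


# Assumed field IDs: 1 -> id, 2 -> data. Only `id` is used for equality.
delete_file = EqualityDeleteFile(
    schema={1: "id", 2: "data"},
    equality_ids=(1,),
    rows=({"id": 5, "data": "bees!"},),
)
data_rows = [{"id": 5, "data": "bees!"}, {"id": 6, "data": "ants"}]
```

Skipping the delete file is only safe because the stored `data` value is accurate: if the deleted row had actually held a value starting with `a`, the scan would return a row that should have been deleted.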
