RussellSpitzer commented on issue #15427: URL: https://github.com/apache/iceberg/issues/15427#issuecomment-3954607396
> The underlying issue I’m trying to highlight is that Iceberg writes string partition values exactly as they appear in the incoming row. If a value contains trailing whitespace, that exact value is stored in the manifest. Because all engines use strict equality for partition pruning, a user filter like `batch_date = '20240201'` will not match a stored value like `'20240201 '` even though the data logically belongs to that partition.

It should not logically belong to that partition; it is not equal. That's what I'm getting at here. We may assume it should, or it may be likely that it does and the trailing space was a user error, but that is an assumption based on our understanding of humans. If the query were a `startsWith` predicate it could match, but this is an equality check, so `'20240201'` does not belong in partition `'20240201 '`.

I still don't see an example of two engines treating this differently. Do you have an example of an engine that automatically trims all string values?
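To make the equality-versus-prefix distinction concrete, here is a minimal sketch that models pruning as filtering a list of stored partition values with a predicate. It is not Iceberg's pruning code; the partition values and the helpers `prune`, `matches_equal`, `matches_starts_with`, and `matches_equal_trimmed` are hypothetical. The trimmed variant models the "automatically trim" behavior the comment asks about; no engine is claimed to implement it.

```python
# Hypothetical partition values as they might be stored in manifests,
# including one written from a row whose string had a trailing space.
stored_partitions = ["20240201", "20240201 ", "20240202"]


def matches_equal(stored: str, literal: str) -> bool:
    """Equality predicate: exact string comparison, no trimming."""
    return stored == literal


def matches_starts_with(stored: str, prefix: str) -> bool:
    """Prefix predicate: true if the stored value begins with the prefix."""
    return stored.startswith(prefix)


def matches_equal_trimmed(stored: str, literal: str) -> bool:
    """Hypothetical trim-before-compare rule (not claimed for any engine)."""
    return stored.strip() == literal.strip()


def prune(partitions, predicate, literal):
    """Keep only partitions whose value satisfies the predicate."""
    return [p for p in partitions if predicate(p, literal)]


if __name__ == "__main__":
    # Equality keeps only the exact value; '20240201 ' is a different partition.
    print(prune(stored_partitions, matches_equal, "20240201"))
    # A prefix predicate would also keep '20240201 '.
    print(prune(stored_partitions, matches_starts_with, "20240201"))
    # Only a trimming rule treats the two values as the same partition.
    print(prune(stored_partitions, matches_equal_trimmed, "20240201"))
```

Under strict equality the first call returns only `'20240201'`, which is the behavior the comment defends: the value with a trailing space is simply a different partition value, and matching it requires a different predicate, not a different notion of equality.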
