RussellSpitzer commented on issue #15427:
URL: https://github.com/apache/iceberg/issues/15427#issuecomment-3954607396

   > The underlying issue I’m trying to highlight is that Iceberg writes string
   > partition values exactly as they appear in the incoming row. If a value
   > contains trailing whitespace, that exact value is stored in the manifest.
   > Because all engines use strict equality for partition pruning, a user filter
   > like `batch_date = '20240201'` will not match a stored value like `'20240201 '`
   > even though the data logically belongs to that partition.
   
   It should not logically belong to that partition; it is not equal. That's 
what I'm getting at here. We may assume it should, or it may well be a user 
error, but that is an assumption based on our understanding of humans. If the 
query were "startsWith" it could match, but this is an equality check, so 
"20240201" does not belong in partition "20240201 ".
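
   To make the distinction concrete, here is a minimal sketch using plain 
Python strings. It is not Iceberg's API; the variable and function names are 
hypothetical and only illustrate how an equality predicate and a prefix 
predicate treat a value with a trailing space.

```python
# Hypothetical values; plain Python strings, not Iceberg's API.
stored_partition = "20240201 "   # partition value written with a trailing space
filter_value = "20240201"        # literal from the user's query


def matches_equality(stored: str, literal: str) -> bool:
    """Strict equality, the semantics of an `=` predicate."""
    return stored == literal


def matches_starts_with(stored: str, prefix: str) -> bool:
    """Prefix match, the semantics of a startsWith-style predicate."""
    return stored.startswith(prefix)


equality_match = matches_equality(stored_partition, filter_value)
prefix_match = matches_starts_with(stored_partition, filter_value)

print(f"equality: {equality_match}, startsWith: {prefix_match}")
```

   Under these assumptions the equality check does not match and the prefix 
check does. Trimming the value before comparing would be a separate, explicit 
choice rather than part of equality.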
   
   I still don't see an example of two engines treating this differently. Do 
you have an example of an engine which automatically trims all string values?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

