bodduv commented on PR #14500:
URL: https://github.com/apache/iceberg/pull/14500#issuecomment-3760098770

   @pvary Thank you for promptly jumping in (during previous community sync) 
while I could not recollect the details of why we wanted to pause on the 
behavior change. 
   
   It was discussed that evaluating expressions once with RFC-compliant 
comparator and again with signed comparator to then allow the filter to pass if 
either of the expression evaluations is true. I attempt this in 
[51f873e](https://github.com/apache/iceberg/pull/14500/commits/51f873e75a77a85176a96c2cfceb9acf15b83834).
 Following is a short description of the changes.
   
   - __Write Path__: while writing new Parquet files, metrics are prepared 
using RFC-compliant UUID comparison (Comparators.uuids()), producing correct 
min/max statistics.
   - __Read Path__: uses two expression evaluations: once with RFC-compliant 
comparator (to work with newly written files) and another with signed 
comparator for backward compatibility with older files that may have UUID 
statistics computed with Java's signed UUID.compareTo().
   Affected evaluators: InclusiveMetricsEvaluator, ManifestEvaluator, 
ParquetMetricsRowGroupFilter
   - __Performance__: (almost) zero overhead for non-UUID queries. May require 
evaluating (filter) expressions twice, so there is some non-zero performance 
impact. But I suppose we would prioritize correctness.
   
   Implementation around expression evaluation was not conducive to such two 
fold evaluations. So I had to force it. I hope the increased code complexity is 
acceptable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to