tanmayrauth opened a new pull request, #891:
URL: https://github.com/apache/iceberg-go/pull/891

   After stats-based row group filtering, apply an additional bloom filter 
check for EqualTo and In predicates. Row groups where none of the queried 
values appear in the column bloom filter are skipped, reducing I/O for 
selective point lookups.
   
- Add RowGroupBloomPred and ParquetRowGroupTester types to parquet_files.go;
for each row group, GetRecords runs the stats check and then the bloom filter
check, hashing values with the bloom filter's own physical-byte hasher so the
hash algorithm always matches the one the writer used
- Add literalToPhysBytes and bloomPredicateCollector in evaluators.go to
extract bloom-filterable predicates from bound expressions; And merges the
predicates collected from both sides, while Or suppresses collection
   - Wire ParquetRowGroupTester into processRecords in arrow_scanner.go
   - Add TestBloomFilterRowGroupPruning covering present/absent/In/unknown 
field ID cases; add TestLiteralToPhysBytes and TestBloomPredicateCollector
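One way to read the collector rule in the list above (And merges, Or suppresses) is as a short recursive walk. The expression types here (`EqualTo`, `In`, `And`, `Or`) and `collectBloomPreds` are simplified, hypothetical stand-ins rather than iceberg-go's bound-expression API. Literals are assumed to be already converted to physical bytes. Under an And, every conjunct must hold, so failing any one collected predicate is enough to skip a row group. Under an Or, one branch failing says nothing about the other, so this sketch collects nothing from that subtree.

```go
package main

import "fmt"

// Hypothetical, simplified bound-expression types for illustration only;
// iceberg-go's real expression types differ.
type Expr interface{}

type EqualTo struct {
	FieldID int
	Value   []byte // literal already converted to physical bytes
}

type In struct {
	FieldID int
	Values  [][]byte
}

type And struct{ Left, Right Expr }
type Or struct{ Left, Right Expr }

// BloomPred is one bloom-filterable predicate: a row group can be skipped
// if none of Values might be in FieldID's bloom filter.
type BloomPred struct {
	FieldID int
	Values  [][]byte
}

// collectBloomPreds returns the predicates in e that can be checked
// against column bloom filters.
func collectBloomPreds(e Expr) []BloomPred {
	switch x := e.(type) {
	case EqualTo:
		return []BloomPred{{FieldID: x.FieldID, Values: [][]byte{x.Value}}}
	case In:
		return []BloomPred{{FieldID: x.FieldID, Values: x.Values}}
	case And:
		// Every conjunct must hold: keep the predicates from both sides.
		return append(collectBloomPreds(x.Left), collectBloomPreds(x.Right)...)
	case Or:
		// One branch failing does not rule out the other: collect nothing.
		return nil
	default:
		// Ranges, IsNull, Not, etc. are not bloom-filterable in this sketch.
		return nil
	}
}

func main() {
	a, b := []byte("a"), []byte("b")
	and := And{EqualTo{1, a}, In{2, [][]byte{a, b}}}
	or := Or{EqualTo{1, a}, EqualTo{2, b}}
	fmt.Println(len(collectBloomPreds(and))) // 2
	fmt.Println(len(collectBloomPreds(or)))  // 0
}
```

A row group is then skipped if any collected predicate finds none of its values in the corresponding column's bloom filter. A field ID with no bloom filter leaves the row group in place.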
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
