tanmayrauth opened a new pull request, #891:
URL: https://github.com/apache/iceberg-go/pull/891
After stats-based row group filtering, apply an additional bloom filter
check for EqualTo and In predicates. A row group is skipped when none of
the queried values appear in the column's bloom filter, which reduces I/O
for selective point lookups.
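
The two-stage check described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `rowGroup`, `mightContain`, and `shouldRead` are hypothetical names, the column is assumed to be int64, and a plain Go map stands in for a real bloom filter (which can also report absent values as present, but never the reverse).

```go
package main

// rowGroup is a hypothetical stand-in for the metadata of one Parquet row
// group, restricted to a single int64 column.
type rowGroup struct {
	min, max int64              // column statistics
	bloom    map[int64]struct{} // stand-in for a bloom filter; nil means none was written
}

// mightContain mimics a bloom filter probe: false means "definitely absent".
// A real filter may also return true for values that are absent.
func (rg rowGroup) mightContain(v int64) bool {
	if rg.bloom == nil {
		return true // no filter available, so nothing can be ruled out
	}
	_, ok := rg.bloom[v]
	return ok
}

// shouldRead reports whether the row group may hold a row whose column value
// equals any of values. EqualTo is the single-value case of In.
func shouldRead(rg rowGroup, values []int64) bool {
	for _, v := range values {
		if v < rg.min || v > rg.max {
			continue // stage 1: statistics already exclude this value
		}
		if rg.mightContain(v) {
			return true // stage 2: the filter cannot exclude this value
		}
	}
	return false // every queried value was excluded, so skip the row group
}
```

With statistics of [10, 20] and a filter built from {10, 15, 20}, a lookup for 12 passes the statistics check but is rejected by the filter, which is the case where the extra check saves a read.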
- Add RowGroupBloomPred and ParquetRowGroupTester types to parquet_files.go;
GetRecords runs the stats check and then the bloom filter check for each
row group, hashing values with the physical-byte hasher taken from the
bloom filter itself so that the hash algorithm matches the writer's
- Add literalToPhysBytes and bloomPredicateCollector in evaluators.go to
extract bloom-filterable predicates from bound expressions; And merges the
predicates from both sides, while Or suppresses collection
- Wire ParquetRowGroupTester in arrow_scanner.go processRecords
- Add TestBloomFilterRowGroupPruning covering present/absent/In/unknown
field ID cases; add TestLiteralToPhysBytes and TestBloomPredicateCollector
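
As a rough illustration of the And/Or rule for predicate collection mentioned above, here is a minimal sketch. The node types, `bloomPred`, and `collect` are hypothetical simplifications and not the iceberg-go API; values are assumed to be int64, and the sketch assumes Or only suppresses collection for its own subtree, which may differ from the actual bloomPredicateCollector.

```go
package main

// All types below are hypothetical simplifications, not the iceberg-go API.

// eqNode and inNode are the two bloom-filterable leaf predicates.
type eqNode struct {
	fieldID int
	value   int64
}

type inNode struct {
	fieldID int
	values  []int64
}

// andNode and orNode combine two sub-expressions.
type andNode struct{ left, right any }
type orNode struct{ left, right any }

// bloomPred says: a matching row must have one of values in column fieldID.
type bloomPred struct {
	fieldID int
	values  []int64
}

// collect returns the predicates that every matching row must satisfy.
// A row group can be skipped if any one of them has no value that the
// column's bloom filter reports as possibly present.
func collect(e any) []bloomPred {
	switch n := e.(type) {
	case eqNode:
		return []bloomPred{{fieldID: n.fieldID, values: []int64{n.value}}}
	case inNode:
		return []bloomPred{{fieldID: n.fieldID, values: n.values}}
	case andNode:
		// Both sides must hold, so each side's predicates are necessary.
		return append(collect(n.left), collect(n.right)...)
	case orNode:
		// Either side alone can match, so neither side is necessary.
		return nil
	default:
		// Other operators (ranges, NotEqualTo, ...) cannot use a bloom filter.
		return nil
	}
}
```

For example, `a = 7 AND (b = 1 OR b = 2)` yields a single predicate on `a` under this sketch: the Or branch contributes nothing, but the equality on the other side of the And is still a necessary condition.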
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]