kbendick commented on issue #4813: URL: https://github.com/apache/iceberg/issues/4813#issuecomment-1133160841
Hi @Zhangg7723! You are right that bloom filters in the data files would be useful. They are, however, somewhat difficult to get right: a lot of tuning is needed, and the NDV (number of distinct values) count potentially has to be known ahead of time, or else the bloom filter can waste a significant amount of space.

I can say, however, that this issue is being worked on. @huaxingao from Apple is working on this and has reached out to the original PR author. I believe they are going to merge Apple's code with that of @jshmchenxi. The two of them would know more about it than I would, but **TLDR**: this is an area of active work and not something that has been forgotten 🙂
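
To illustrate the sizing concern, here is a minimal sketch using the textbook bloom filter formulas. It is not Iceberg's or Parquet's actual implementation (Parquet, for instance, uses a split-block variant), and the function names and NDV figures are made up for illustration. It shows that a filter sized for an expected NDV is far larger than needed if the real NDV is much smaller, and nearly useless if the real NDV is much larger.

```python
import math


def bloom_bits(ndv: int, fpp: float) -> int:
    """Textbook optimal bit count for `ndv` distinct values at false-positive rate `fpp`."""
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


def optimal_hashes(bits: int, ndv: int) -> int:
    """Textbook optimal number of hash functions for a filter of `bits` bits."""
    return max(1, round(bits / ndv * math.log(2)))


def actual_fpp(bits: int, hashes: int, ndv: int) -> float:
    """Approximate false-positive rate once `ndv` distinct values have been inserted."""
    return (1.0 - math.exp(-hashes * ndv / bits)) ** hashes


# Hypothetical column: we guess 1,000,000 distinct values and target a 1% rate.
expected_ndv = 1_000_000
bits = bloom_bits(expected_ndv, 0.01)
k = optimal_hashes(bits, expected_ndv)

print(f"filter size: {bits / 8 / 1e6:.2f} MB, hash functions: {k}")

# Overestimate: the column really has 10,000 distinct values, so most bits are wasted.
needed = bloom_bits(10_000, 0.01)
print(f"bits needed for the real NDV: {needed} ({bits / needed:.0f}x smaller)")

# Underestimate: the column really has 10,000,000 distinct values, so the filter saturates.
print(f"fpp at 10x the expected NDV: {actual_fpp(bits, k, 10 * expected_ndv):.3f}")
```

With these assumed numbers the filter comes out at about 1.2 MB per column. That is why the NDV estimate matters so much: guessing too high wastes space in every data file, and guessing too low makes the filter match almost everything.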
