alamb commented on issue #3138: URL: https://github.com/apache/arrow-rs/issues/3138#issuecomment-1323808015
I like the idea of specifying fpp (and it follows the Arrow C++ model).

> with which we'd assume all unique items

I think that makes sense, as the main use case for bloom filters is high-cardinality / close-to-unique columns. Perhaps we can document this case clearly (e.g. "bloom filters will likely only help for almost-unique data like ids and UUIDs; for other types, sorting/clustering and min/max statistics will work as well, if not better").
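Here is a minimal sketch of what "specify fpp and assume all items are unique" means for sizing. If ndv is taken to be the row count, the false-positive probability alone determines the filter size. The sketch assumes the split-block bloom filter sizing formula m = -8 · ndv / ln(1 - fpp^(1/8)), which is our reading of how Arrow C++ sizes these filters. The rounding to a power of two with a 32-byte floor is also an assumption. `sbbf_num_bits` and `sbbf_num_bytes` are hypothetical names, not arrow-rs APIs.

```rust
/// Hypothetical helper: bits needed by a split-block bloom filter holding
/// `ndv` distinct values at false-positive probability `fpp`.
/// Assumes the sizing formula m = -8 * ndv / ln(1 - fpp^(1/8)).
fn sbbf_num_bits(ndv: u64, fpp: f64) -> f64 {
    assert!(fpp > 0.0 && fpp < 1.0, "fpp must be in (0, 1)");
    -8.0 * ndv as f64 / (1.0 - fpp.powf(1.0 / 8.0)).ln()
}

/// Hypothetical helper: convert bits to whole bytes, then round up to a
/// power of two with a floor of one 32-byte block (assumed layout).
fn sbbf_num_bytes(ndv: u64, fpp: f64) -> usize {
    let bytes = (sbbf_num_bits(ndv, fpp) / 8.0).ceil() as usize;
    bytes.max(32).next_power_of_two()
}

fn main() {
    // "Assume all unique items": use the row count as ndv.
    let num_rows: u64 = 1_000_000;
    for fpp in [0.1, 0.01, 0.001] {
        println!("fpp = {fpp}: {} bytes", sbbf_num_bytes(num_rows, fpp));
    }
}
```

Because ndv defaults to the row count here, a low-cardinality column gets a filter sized as if every value were distinct. That is one more reason the filters mainly pay off on near-unique columns such as ids.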
