Jimexist commented on issue #3138: URL: https://github.com/apache/arrow-rs/issues/3138#issuecomment-1323795187
@alamb I believe we should start simple and support only two parameters:
1. whether the bloom filter is enabled, as a master switch;
2. an fpp (false positive probability) value. Assuming every item is unique, we'd use the row count of each row group together with the fpp to calculate a bitset size, capped at 128 MiB. The minimum size is 32 bytes, which applies to very large fpp values such as 1.0 or 0.9999.

Letting users control the on-disk size directly does not quite make sense and is counterintuitive, because they would then need both to estimate the number of unique items per row group and to know how to derive the fpp from that. In most cases, specifying a maximum fpp is good enough.

cc @tustvold
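A minimal sketch of the sizing rule described above, under stated assumptions: `BloomFilterOptions` and `bitset_size_bytes` are hypothetical names, not the arrow-rs API. It treats the row count as the number of distinct values, solves the classic bloom filter bound fpp = (1 - e^(-k·n/m))^k for m bits with an assumed k = 8 hash functions (one per 32-bit word in a Parquet split-block filter block), then clamps to the 32 byte floor and 128 MiB cap from the comment and rounds up to a power of two.

```rust
// Bitset size limits in bytes, taken from the comment (32 B floor, 128 MiB cap).
const BITSET_MIN_BYTES: usize = 32;
const BITSET_MAX_BYTES: usize = 128 * 1024 * 1024;

/// Hypothetical writer options mirroring the two proposed parameters.
struct BloomFilterOptions {
    /// Master switch: no filter is written when false.
    enabled: bool,
    /// Target false positive probability, expected in (0, 1].
    fpp: f64,
}

/// Hypothetical helper: bitset size in bytes for one row group, or `None`
/// when disabled. Every row is assumed distinct, so ndv = `row_count`.
fn bitset_size_bytes(opts: &BloomFilterOptions, row_count: u64) -> Option<usize> {
    if !opts.enabled {
        return None;
    }
    let fpp = opts.fpp;
    assert!(fpp > 0.0 && fpp <= 1.0, "fpp must be in (0, 1]");
    // Classic bound fpp = (1 - e^{-k*n/m})^k solved for m bits, assuming k = 8.
    // For fpp = 1.0 the log is -inf, so num_bits becomes 0.
    let num_bits = -8.0 * row_count as f64 / (1.0 - fpp.powf(1.0 / 8.0)).ln();
    // Whole bytes; the float-to-int `as` cast saturates instead of overflowing.
    let num_bytes = (num_bits / 8.0).ceil() as usize;
    // Clamp to [32 B, 128 MiB], then round up to a power of two
    // (the cap is itself a power of two, so it is never exceeded).
    Some(num_bytes.clamp(BITSET_MIN_BYTES, BITSET_MAX_BYTES).next_power_of_two())
}

fn main() {
    let opts = BloomFilterOptions { enabled: true, fpp: 0.01 };
    // 1M rows at 1% fpp needs about 1.2 MB, rounded up to 2 MiB.
    println!("{:?}", bitset_size_bytes(&opts, 1_000_000)); // prints "Some(2097152)"
}
```

The user only picks a target fpp; the per-row-group size then follows from the row count, which is the point made above about not asking users to reason about disk size directly.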
