Dandandan commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3280230056
When running DataFusion on a single node, what would be the benefit of a bloom filter over just using the hash table as is? I don't expect a large performance improvement from a bloom filter in this scenario, where the build side is small. Lookups in a hash table shouldn't be slow (especially a simple "contains"). I think one use case might actually be not small tables but larger build sides (ones that no longer fit in CPU cache), where the bloom filter is a much more compact representation of the table, so filtering actually *might* be faster than with a hash table.
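
To put rough numbers on the size argument, here is a back-of-the-envelope sketch. It uses the standard sizing formula for an optimally configured bloom filter, m = -n * ln(p) / (ln 2)^2 bits, and compares it with a deliberately optimistic lower bound for a hash table that stores one 8-byte value per key. The function names, the 10 million key count, and the 1% false-positive rate are all illustrative assumptions, not DataFusion APIs or measurements.

```rust
/// Bits needed by an optimally sized bloom filter holding `n` keys
/// at false-positive rate `p`: m = -n * ln(p) / (ln 2)^2.
/// (Hypothetical helper for illustration only.)
fn bloom_bits(n: u64, p: f64) -> u64 {
    let ln2 = std::f64::consts::LN_2;
    (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as u64
}

/// Optimistic lower bound for a hash table: 8 bytes per key,
/// ignoring load factor, control bytes, and any stored row indices.
/// (Assumption: real hash tables need more than this.)
fn hash_table_bytes_lower_bound(n: u64) -> u64 {
    n * 8
}

fn main() {
    let n: u64 = 10_000_000; // illustrative build-side size
    let p: f64 = 0.01; // illustrative false-positive rate

    let bloom_bytes = bloom_bits(n, p) / 8;
    let table_bytes = hash_table_bytes_lower_bound(n);

    println!("bloom filter: {:.1} MB", bloom_bytes as f64 / 1e6);
    println!("hash table (lower bound): {:.1} MB", table_bytes as f64 / 1e6);
}
```

Under these assumptions the bloom filter needs a bit under 10 bits per key, so about 12 MB against at least 80 MB for the hash table. That is the sense in which a bloom filter could stay cache-resident for build sides where the hash table does not; whether that translates into faster filtering would need to be benchmarked.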