Dandandan commented on issue #17171:
URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3280230056

   When running DataFusion on a single node, what would be the benefit of a 
bloom filter over just using the hash table as is? I don't expect a large 
performance improvement from a bloom filter in this scenario, where the 
build side is small. Lookups in a hash table shouldn't be slow (especially 
a simple "contains").
   
   I think one use case might actually be not for small tables, but for larger 
build sides (which no longer fit in CPU cache), where the bloom filter is a 
much more compact representation of the build-side keys, so filtering 
*might* actually be faster than probing the hash table.
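
   To make the size argument concrete, here is a minimal sketch of a bloom 
filter in Rust. It is not DataFusion's implementation or API: the 
`BloomFilter` type, its method names, and the 10 bits per key / 7 hash 
functions used in the note below are all assumptions chosen for illustration, 
and it uses the standard library's `DefaultHasher` rather than whatever hasher 
the join would actually use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical minimal bloom filter (illustration only, not a DataFusion type).
pub struct BloomFilter {
    bits: Vec<u64>,  // bit array packed into 64-bit words
    num_bits: u64,   // number of addressable bits
    num_hashes: u32, // probes per key
}

impl BloomFilter {
    /// Allocates roughly `bits_per_key` bits for each expected key.
    pub fn new(expected_keys: usize, bits_per_key: usize, num_hashes: u32) -> Self {
        // Clamp to at least one word so the modulo below never divides by zero.
        let num_bits = (expected_keys * bits_per_key).max(64) as u64;
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Derives two base hashes; probe i uses a + i * b (double hashing).
    fn hash_pair<T: Hash>(key: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        key.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2);
        // Force the stride to be odd so it is never zero.
        let b = h2.finish() | 1;
        (a, b)
    }

    /// Sets the bits for `key`.
    pub fn insert<T: Hash>(&mut self, key: &T) {
        let (a, b) = Self::hash_pair(key);
        for i in 0..self.num_hashes as u64 {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// Returns false only if `key` was definitely never inserted;
    /// true means "possibly present" (false positives are allowed).
    pub fn might_contain<T: Hash>(&self, key: &T) -> bool {
        let (a, b) = Self::hash_pair(key);
        (0..self.num_hashes as u64).all(|i| {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }

    /// Memory used by the bit array, in bytes.
    pub fn size_in_bytes(&self) -> usize {
        self.bits.len() * 8
    }
}
```

   With these assumed parameters, `BloomFilter::new(1_000_000, 10, 7)` 
allocates 1.25 MB, whereas a hash table holding one million `u64` keys needs 
at least 8 MB for the keys alone, before any bucket or metadata overhead. 
Whether that smaller footprint translates into faster probing would still 
have to be measured.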
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

