Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

via GitHub Sat, 20 Sep 2025 18:07:40 -0700


Dandandan commented on issue #17171:
URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3280230056


   When running DataFusion on a single node, what would be the benefit of a 
bloom filter over just using the hash table as is? I don't expect a large 
performance improvement from a bloom filter in this scenario, where the 
build side is small. Lookups in a hash table shouldn't be slow (especially 
a simple "contains").
   
   I think one use case might actually be not for small tables, but for larger 
build sides (which no longer fit in CPU cache), where the bloom filter is a 
much more compact representation of the keys than the table itself, so 
filtering actually *might* be faster than probing the hash table.
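
   To make the size argument concrete, here is a rough sketch of a bloom filter 
built over the join keys. This is not DataFusion's implementation: `BloomFilter`, 
`might_contain`, `size_in_bytes`, and the bits-per-key choice are all hypothetical 
names and parameters picked for illustration, and it uses only the standard 
library's `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical minimal bloom filter (illustration only, not DataFusion code).
pub struct BloomFilter {
    bits: Vec<u64>,  // bit array stored as 64-bit words
    num_bits: u64,   // total number of bits (a multiple of 64)
    num_hashes: u32, // number of probe positions per key
}

impl BloomFilter {
    /// Create a filter with at least `num_bits` bits, rounded up to whole words.
    pub fn new(num_bits: usize, num_hashes: u32) -> Self {
        let words = (num_bits.max(64) + 63) / 64;
        Self {
            bits: vec![0; words],
            num_bits: (words * 64) as u64,
            num_hashes,
        }
    }

    /// Derive two base hashes; probe i uses a + i * b (double hashing).
    fn hash_pair<T: Hash>(key: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        key.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2);
        let b = h2.finish() | 1; // force odd so the stride is never zero
        (a, b)
    }

    /// Set the bits for a build-side key.
    pub fn insert<T: Hash>(&mut self, key: &T) {
        let (a, b) = Self::hash_pair(key);
        for i in 0..self.num_hashes as u64 {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means the key is definitely absent; `true` means it may be
    /// present (false positives are possible, false negatives are not).
    pub fn might_contain<T: Hash>(&self, key: &T) -> bool {
        let (a, b) = Self::hash_pair(key);
        (0..self.num_hashes as u64).all(|i| {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }

    /// Size of the bit array in bytes.
    pub fn size_in_bytes(&self) -> usize {
        self.bits.len() * 8
    }
}
```

   With roughly 10 bits per key, the bit array is a small fraction of the 8 bytes 
per `u64` key that the raw keys alone occupy (before any hash table overhead), so 
it can stay cache-resident for build sides where the full table no longer does. 
Rows that pass `might_contain` would still have to be probed against the hash 
table, since the filter can return false positives.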
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


