adriangb commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3280509469
> Probe Phase
>
> For each tuple in S, hash its join key and check to see whether there is a match for each tuple in corresponding bucket in the hash table constructed for R. If inputs were partitioned, then assign each thread a unique partition. Otherwise, synchronize their access to the cursor on S.
>
> Bloom Filter: Create a Bloom Filter during the build phase when the key is likely to not exist in the hash table [4]. Threads check the filter before probing the hash table. This will be faster since the filter will fit in CPU caches. Sometimes called sideways information passing.

But fair enough yeah. I think the best way to figure this out is to cook up the implementation(s) and put them behind feature flags and have folks like @LiaCastaneda report their results.
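
For concreteness, the probe-side check described in the quoted passage could look roughly like the sketch below. It is a std-only toy, not DataFusion's code: `BloomFilter`, `hash_join`, the 10-bits-per-key sizing and the 3 hash probes are all hypothetical choices made for illustration. The filter is filled while the hash table is built, and each probe row consults the filter before touching the table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter (hypothetical, for illustration only).
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: u64, num_hashes: u64) -> Self {
        let num_bits = num_bits.max(64); // avoid a zero-sized filter
        let words = ((num_bits + 63) / 64) as usize;
        BloomFilter { bits: vec![0u64; words], num_bits, num_hashes }
    }

    /// Two base hashes; the i-th bit position is a + i * b (double hashing).
    fn hash_pair<K: Hash>(key: &K) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        key.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        (a, 0x9E37_79B9_7F4A_7C15u64).hash(&mut h2);
        (a, h2.finish() | 1) // force an odd stride
    }

    fn insert<K: Hash>(&mut self, key: &K) {
        let (a, b) = Self::hash_pair(key);
        for i in 0..self.num_hashes {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// false means "definitely absent"; true means "possibly present".
    fn may_contain<K: Hash>(&self, key: &K) -> bool {
        let (a, b) = Self::hash_pair(key);
        (0..self.num_hashes).all(|i| {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }
}

/// Single-threaded hash join over (key, payload) rows.
/// Returns the joined rows and how many probe rows the filter rejected
/// without a hash-table lookup.
fn hash_join<'a>(
    build: &[(u32, &'a str)],
    probe: &[(u32, &'a str)],
) -> (Vec<(u32, &'a str, &'a str)>, usize) {
    // Build phase: fill the hash table and the Bloom filter together.
    let mut table: HashMap<u32, Vec<&'a str>> = HashMap::new();
    let mut filter = BloomFilter::new(build.len() as u64 * 10, 3);
    for &(key, payload) in build {
        table.entry(key).or_default().push(payload);
        filter.insert(&key);
    }

    // Probe phase: check the (small) filter before the (large) table.
    let mut out = Vec::new();
    let mut skipped = 0;
    for &(key, payload) in probe {
        if !filter.may_contain(&key) {
            skipped += 1;
            continue;
        }
        // A false positive simply falls through to a normal table miss.
        if let Some(matches) = table.get(&key) {
            for &build_payload in matches {
                out.push((key, build_payload, payload));
            }
        }
    }
    (out, skipped)
}
```

Because a Bloom filter has no false negatives, the join output is the same with or without the filter. Only the number of hash-table lookups changes, which is why benchmarking behind a feature flag is a reasonable way to decide whether it pays off for a given workload.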