alamb commented on issue #20195: URL: https://github.com/apache/datafusion/issues/20195#issuecomment-4164493836
> I guess the question depends on what the alternative is. As far as I can remember / think of, the two alternatives would be to try every partition's hash table (poor build side performance and questionable probe side performance) or to make a combined hash table with all values (poor memory use).

I think in certain circumstances you only have to try a single partition's hash table.

For example:
1. The join is an equi-join (equality predicate)
2. The inputs have exactly the same hash (or range) partitioning on the columns used as join keys

In that case there is exactly one partition's hash table where matching tuples would be (if they are there at all).
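To make the single-partition case concrete, here is a minimal Python sketch. It is not DataFusion's implementation: `partition_of`, `build_partitioned_tables`, `probe_single_partition`, and `probe_every_partition` are hypothetical names, and a CRC32-based hash stands in for whatever partitioning function both inputs share. It assumes an equi-join where both sides are partitioned by the same function on the join key.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Hypothetical partitioning function, assumed identical for both inputs."""
    return crc32(repr(key).encode()) % num_partitions


def build_partitioned_tables(build_rows, num_partitions=NUM_PARTITIONS):
    """Build one hash table per partition, keyed on the join key."""
    tables = [defaultdict(list) for _ in range(num_partitions)]
    for key, payload in build_rows:
        tables[partition_of(key, num_partitions)][key].append(payload)
    return tables


def probe_single_partition(tables, probe_rows):
    """Equi-join probe that consults exactly one partition's table per row."""
    out = []
    for key, payload in probe_rows:
        # Same partitioning on both sides: any match must live in this table.
        table = tables[partition_of(key, len(tables))]
        for build_payload in table.get(key, ()):
            out.append((key, build_payload, payload))
    return out


def probe_every_partition(tables, probe_rows):
    """Baseline alternative: try every partition's table for every probe row."""
    out = []
    for key, payload in probe_rows:
        for table in tables:
            for build_payload in table.get(key, ()):
                out.append((key, build_payload, payload))
    return out
```

Under those two assumptions both probes return the same matches, but the single-partition probe does one table lookup per row instead of one per partition. If the predicate is not an equality, or the two sides are partitioned differently, the shortcut no longer holds.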
