zanmato1984 commented on issue #44513: URL: https://github.com/apache/arrow/issues/44513#issuecomment-2597823673
Ah, no worries, none taken :) I can elaborate a bit on the implementation details. The `join` operation is implemented with the "hash join" algorithm, that is: 1) "build" a hash table from the table on one side of the join (the "build side"), then 2) probe that hash table with the rows of the table on the other side (the "probe side"). We always choose the right side as the build side, that is, `big` in `small.join(big, "left outer")`, or `small` in `big.join(small, "right outer")`. Building the hash table from a big table is inefficient, and our implementation is error-prone, so switching the sides (and the join type) makes a big difference.
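
To make the build/probe split concrete, here is a minimal pure-Python sketch of a left outer hash join that, like the behavior described above, always builds from the right input. It is only an illustration, not Arrow's actual implementation: the function name `hash_join_left_outer` and the list-of-dicts row format are assumptions made up for this example.

```python
from collections import defaultdict


def hash_join_left_outer(left_rows, right_rows, key):
    """Toy left outer hash join (hypothetical helper, not Arrow code).

    The right input is always the build side; the left input is the probe side.
    """
    # Build phase: index every right-side row by its join key.
    hash_table = defaultdict(list)
    for row in right_rows:
        hash_table[row[key]].append(row)

    # Probe phase: look up each left-side row in the hash table.
    result = []
    for row in left_rows:
        matches = hash_table.get(row[key])
        if matches:
            for match in matches:
                # Merge the matching build row with the probe row.
                result.append({**match, **row})
        else:
            # Left outer: keep unmatched probe rows as they are.
            result.append(dict(row))
    return result


# Mirrors small.join(big, "left outer"): the hash table is built from `big`.
small = [{"id": 1, "s": "a"}, {"id": 4, "s": "d"}, {"id": -1, "s": "z"}]
big = [{"id": i, "b": i * 10} for i in range(1000)]
joined = hash_join_left_outer(small, big, "id")
```

In this sketch the hash table holds all 1000 rows of `big` even though only three rows probe it. Building from `small` and probing with `big` (the `big.join(small, "right outer")` form) would need a table of just three entries.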
