Re: [I] [C++][Python] Pyarrow.Table.join() breaks on large tables v.18.0.0.dev486 [arrow]

via GitHub Fri, 17 Jan 2025 01:40:15 -0800


zanmato1984 commented on issue #44513:
URL: https://github.com/apache/arrow/issues/44513#issuecomment-2597823673


   Ah, no worries, none taken :)
   
   I can elaborate a bit on the implementation details. The `join` operation 
is implemented with the "hash join" algorithm, that is, 1) "build" a hash table 
from the table on one side of the join (the "build side"), then 2) probe that 
hash table with the table from the other side (the "probe side"). We always 
choose the right side as the build side, that is, `big` in `small.join(big, 
"left outer")`, or `small` in `big.join(small, "right outer")`. Building the 
hash table from a big table is inefficient, and our implementation is 
error-prone, so switching the sides (and the join type) makes a real difference.
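
To make the build/probe split concrete, here is a toy sketch in plain Python. It is not Arrow's actual implementation (which is vectorized C++); `hash_join`, the row dicts, and the column names are all hypothetical, made up for illustration. The only thing it mirrors is the rule described above: the right-hand input is always the build side.

```python
from collections import defaultdict


def hash_join(left, right, key, join_type):
    """Toy hash join over lists of dicts; always builds on `right`.

    Supports "left outer" and "right outer" only. Returns the joined rows
    and the number of distinct keys stored in the hash table.
    """
    # 1) Build phase: hash every row of the build side (right) by its key.
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)

    # 2) Probe phase: look up each row of the probe side (left).
    out = []
    matched_keys = set()
    for row in left:
        matches = table.get(row[key], [])
        for m in matches:
            out.append({**row, **m})
        if matches:
            matched_keys.add(row[key])
        elif join_type == "left outer":
            # Keep unmatched probe-side rows.
            out.append(dict(row))

    if join_type == "right outer":
        # Keep build-side rows that no probe row matched.
        for k, rows in table.items():
            if k not in matched_keys:
                out.extend(dict(r) for r in rows)

    return out, len(table)


small = [{"id": 1, "s": "a"}, {"id": 2, "s": "b"}, {"id": 5000, "s": "c"}]
big = [{"id": i, "b": i * 10} for i in range(1000)]

# Build side is `big`: the hash table holds 1000 keys.
rows_a, keys_a = hash_join(small, big, "id", "left outer")

# Build side is `small`: the hash table holds only 3 keys.
rows_b, keys_b = hash_join(big, small, "id", "right outer")
```

Both calls produce the same set of rows (in a different order), but the second one only has to hash the small table.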


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
