Andy Grove created ARROW-11094:
----------------------------------

             Summary: [Rust] [DataFusion] Implement Sort-Merge Join
                 Key: ARROW-11094
                 URL: https://issues.apache.org/jira/browse/ARROW-11094
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Rust - DataFusion
            Reporter: Andy Grove
             Fix For: 4.0.0


The current hash join works well when one side of the join can be loaded into 
memory but cannot scale beyond the available RAM.

The advantage of implementing SMJ (Sort-Merge Join) is that we can sort the 
left and right partitions in parallel and then stream both sides of the join by 
merging these sorted partitions and we do not need to load one side into 
memory. At most, we need to load all batches from both sides that contain the 
current join key values.

https://en.wikipedia.org/wiki/Sort-merge_join



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to