[GitHub] spark pull request: SPARK-1380: Add sort-merge based cogroup/joins...

ueshin Thu, 03 Apr 2014 00:40:35 -0700

Github user ueshin commented on the pull request:

    https://github.com/apache/spark/pull/283#issuecomment-39421176
  
    @mridulm Thank you for your reply.
    
    There are 2 points I have to mention about memory:
    
    - Before shuffle  
    If the data are already sorted, merge join needs no extra memory because 
no sort is required. If they are not sorted, merge join needs some memory to 
sort the data in each partition.
    - After shuffle  
    While fetching data, merge join needs at most as much memory as hash 
join. Beyond that it needs no extra memory, because it can produce output 
immediately from its input, whereas hash join needs additional memory to 
build a hash table.
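
    To illustrate the after-shuffle point, here is a minimal sketch in plain Python. It is not the code in this pull request and not Spark's API. The names `merge_join` and `hash_join` are hypothetical, and both inputs are assumed to be iterables of (key, value) pairs, with the merge join additionally assuming they are already sorted by key. The merge join buffers only the values of the current key on one side, while the hash join holds one whole side in a table before it can emit anything.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two (key, value) iterables that are ALREADY sorted by key.

    Only the right-hand values for the current key are buffered, so output
    is produced while the inputs are still being read.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no match; skip to the next left key.
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no match; skip to the next right key.
            rk, rg = next(right_groups, (None, None))
        else:
            # Buffer one key's worth of right values, then stream the left.
            right_values = [v for _, v in rg]
            for _, lv in lg:
                for rv in right_values:
                    yield lk, (lv, rv)
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


def hash_join(left, right):
    """Inner-join two (key, value) iterables in any order.

    The whole right side is loaded into a dict before any output is emitted.
    """
    table = {}
    for k, v in right:
        table.setdefault(k, []).append(v)
    for k, lv in left:
        for rv in table.get(k, ()):
            yield k, (lv, rv)
```

    In this sketch the extra memory of `merge_join` is bounded by the largest group of values sharing one key, whereas `hash_join` grows with the size of the entire right-hand input.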




