I've seen a few places where it's been mentioned that after a shuffle each
reducer needs to pull its partition into memory in its entirety. Is this
true? I'd assume the merge sort that needs to be done (in the cases where
sortByKey() is not used) wouldn't need to pull all of the data into memory
at once... is it the sort for the sortByKey() that requires this to be done?

Reply via email to