I’m not sure this is related to memory management – the shuffle is the central 
act of moving data around nodes when the computations need the data on another 
node (E.g. Group by, sort, etc)

Shuffle read and shuffle write should be mirrored on the left/right side of a 
shuffle between 2 stages.

-adrian

From: Kartik Mathur
Date: Thursday, October 1, 2015 at 10:36 PM
To: user
Subject: Shuffle Write v/s Shuffle Read

Hi

I am trying to better understand shuffle in spark .

Based on my understanding thus far ,

Shuffle Write : writes stage output for intermediate stage on local disk if 
memory is not sufficient.,
Example , if each worker has 200 MB memory for intermediate results and the 
results are 300MB then , each executer will keep 200 MB in memory and will 
write remaining 100 MB on local disk .

Shuffle Read : Each executer will read from other executer's memory + disk , so 
total read in above case will be 300(200 from memory and 100 from disk)*num of 
executers ?

Is my understanding correct ?

Thanks,
Kartik

Reply via email to