Hi

I am trying to better understand shuffle in Spark.

Based on my understanding thus far:

*Shuffle Write*: writes the output of an intermediate stage to local disk
if memory is not sufficient.
For example, if each worker has 200 MB of memory for intermediate results
and the results are 300 MB, then each executor will keep 200 MB in memory
and write the remaining 100 MB to local disk.

*Shuffle Read*: each executor will read from the other executors' memory +
disk, so the total read in the above case will be 300 MB (200 from memory
and 100 from disk) * number of executors?
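To make the arithmetic concrete, here is a toy Python calculation of the scenario above. The sizes come from the example; the executor count is a hypothetical value just for illustration:

```python
# Toy model of the example: each executor's shuffle output is 300 MB,
# of which 200 MB fits in memory and 100 MB spills to local disk.
memory_mb = 200    # in-memory portion per executor (from the example)
spilled_mb = 100   # spilled-to-disk portion per executor (from the example)
num_executors = 4  # hypothetical cluster size

per_executor_mb = memory_mb + spilled_mb        # 300 MB read per executor
total_read_mb = per_executor_mb * num_executors # summed across executors

print(per_executor_mb)  # 300
print(total_read_mb)    # 1200
```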

Is my understanding correct ?

Thanks,
Kartik
