Hi, I am trying to better understand shuffle in Spark.
Based on my understanding so far:

*Shuffle Write*: writes the output of an intermediate stage to local disk if memory is not sufficient. For example, if each worker has 200 MB of memory for intermediate results and the results are 300 MB, then each executor will keep 200 MB in memory and write the remaining 100 MB to local disk.

*Shuffle Read*: each executor reads from the other executors' memory + disk, so the total read in the above case would be 300 MB (200 from memory and 100 from disk) * number of executors?

Is my understanding correct?

Thanks,
Kartik
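To make the arithmetic behind my question concrete, here is a small plain-Python sketch (not the Spark API, and not a claim about Spark internals; the sizes and the executor count of 4 are just hypothetical numbers for illustration) of the memory/disk split I am imagining:

```python
# Hypothetical figures from the example above -- not real Spark accounting.
MEMORY_PER_EXECUTOR_MB = 200   # memory available for intermediate results
STAGE_OUTPUT_MB = 300          # intermediate results produced per executor
NUM_EXECUTORS = 4              # assumed cluster size, purely for illustration

# Shuffle write (as I understand it): keep what fits in memory,
# spill the remainder to local disk.
in_memory_mb = min(STAGE_OUTPUT_MB, MEMORY_PER_EXECUTOR_MB)    # 200
spilled_mb = max(0, STAGE_OUTPUT_MB - MEMORY_PER_EXECUTOR_MB)  # 100

# Shuffle read (as I understand it): the full output (memory + disk)
# is fetched from every executor, so the cluster-wide read would be:
total_read_mb = (in_memory_mb + spilled_mb) * NUM_EXECUTORS    # 300 * 4

print(in_memory_mb, spilled_mb, total_read_mb)
```

Is this mental model of the split and the total read correct?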