I want to understand how Spark handles data locality when run in cluster
mode on YARN.

1. The driver program asks the ResourceManager for executors. Does it tell
YARN's RM to check the HDFS block locations of the input data and then
allocate executors on those nodes? And do the executors stay fixed for the
whole application, or does the driver ask for new executors when it submits
another job in the same application, given that Spark creates a new job for
each action (see the sketch below)? If the executors are fixed, is achieving
data locality for the second job impossible?

2. When executors are done with their processing, are they marked as free
in the ResourceManager's resource queue? And do the executors report this
to the RM directly, rather than via the driver? (I sketch below the
settings I think are related.)
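For context, I believe dynamic allocation is the mechanism that releases
idle executors back to YARN. The property names below come from Spark's
dynamic allocation configuration, though I may be misreading when the RM
is actually told; the app name and the min/max limits are made up:

import org.apache.spark.{SparkConf, SparkContext}

object DynamicAllocationExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-example")
      // Let Spark grow and shrink the executor set at runtime.
      .set("spark.dynamicAllocation.enabled", "true")
      // The external shuffle service is needed on YARN so shuffle
      // files survive an executor being released.
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      // An executor idle for this long is released back to YARN.
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")

    val sc = new SparkContext(conf)
    // ... run jobs; an executor idle past the timeout above should
    // be handed back to YARN.
    sc.stop()
  }
}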

Thanks
Shushant
