On an auto-scaling cluster that uses YARN as the resource manager, we observed
that when we decrease the number of worker nodes after upscaling the instance
types, the number of tasks for the same Spark job spikes. (The total CPU/memory
capacity of the cluster remains identical.)

The same Spark job, with the same Spark settings (dynamic allocation is on),
spins up 4-5 times more tasks. Correspondingly, we see 4-5 times more executors
being allocated.

As far as I understand, dynamic allocation decides to start new executors when
it sees pending tasks queuing up, which would explain the extra executors. What
I don't understand is why the same Spark application, with identical input
files, runs 4-5 times as many tasks in the first place.

Any clues would be much appreciated, thank you.

Murat
