On an auto-scaling cluster using YARN as resource manager, we observed that when we decrease the number of worker nodes after upscaling instance types, the number of tasks for the same spark job spikes. (the total cpu/memory capacity of the cluster remains identical)
the same spark job, with the same spark settings (dynamic allocation is on), spins up 4-5 times more tasks. Related to that, we see 4-5 times more executors being allocated. As far as I understand, dynamic allocation decides to start a new executor if it sees tasks pending being queued up. But I don't know why the same spark application with identical input files runs 4-5 times higher number of tasks. Any clues would be much appreciated, thank you. Murat