Github user klion26 commented on the issue: https://github.com/apache/spark/pull/19145

We have RM and NM recovery enabled. Suppose 2 containers are running on an NM and that NM fails. After 10 minutes, the RM detects the NM failure and relaunches the 2 lost containers on other NMs. This is fine. But if we then restart the RM, the lost containers on that NM are **reported to the RM as lost again** because of recovery, so we relaunch 2 more containers on other NMs and end up with 2 more executors than expected.
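
The double counting described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's or YARN's code: `ToyAllocator`, `on_containers_lost`, and `simulate` are hypothetical names, and ignoring container IDs that were already handled is just one possible guard, not necessarily what this PR does.

```python
class ToyAllocator:
    """Toy allocator that launches one replacement per lost container (hypothetical)."""

    def __init__(self, target, dedupe):
        self.dedupe = dedupe      # whether to ignore repeated loss reports
        self.running = set()      # IDs of containers currently running
        self.seen_lost = set()    # IDs already handled as lost
        self.next_id = 0
        for _ in range(target):
            self._launch()

    def _launch(self):
        # Start a new container with a fresh ID.
        cid = f"container_{self.next_id}"
        self.next_id += 1
        self.running.add(cid)

    def on_containers_lost(self, container_ids):
        # Handle a loss report from the RM.
        for cid in container_ids:
            if self.dedupe and cid in self.seen_lost:
                continue  # already replaced once, skip the repeated report
            self.seen_lost.add(cid)
            self.running.discard(cid)
            self._launch()


def simulate(dedupe):
    """Return the number of running containers after the same loss is reported twice."""
    allocator = ToyAllocator(target=2, dedupe=dedupe)
    lost = sorted(allocator.running)
    allocator.on_containers_lost(lost)  # RM detects the NM failure
    allocator.on_containers_lost(lost)  # RM restarts and reports the loss again
    return len(allocator.running)


if __name__ == "__main__":
    print("without dedupe:", simulate(dedupe=False))
    print("with dedupe:", simulate(dedupe=True))
```

Without the guard, the repeated report leaves the model with 4 running containers for a target of 2, which matches the 2 extra executors described above. With the guard, the count stays at 2.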