Github user klion26 commented on the issue:

    https://github.com/apache/spark/pull/19145
  
    We enabled RM and NM recovery.
    
    Suppose 2 containers are running on this NM. After 10 minutes, the RM 
detects the NM failure and relaunches the 2 lost containers on other NMs. This 
is OK. 
    
    But if we then restart the RM, the lost containers on that NM are 
**reported to the RM as lost again** because of recovery. We relaunch 2 more 
containers on other NMs and end up with 2 more executors than expected.
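
    The double counting described above can be sketched as a toy simulation. 
This is only an illustration under assumptions: `ExecutorTracker`, 
`on_containers_lost`, `simulate`, and the container IDs are hypothetical names, 
not Spark or YARN APIs, and the dedupe-by-container-ID option is just one 
possible way to avoid the extra relaunches, not necessarily what this PR does.

```python
class ExecutorTracker:
    """Toy model of an allocator that relaunches a container for each loss report."""

    def __init__(self, dedupe):
        self.dedupe = dedupe      # if True, ignore repeated reports for the same ID
        self.seen_lost = set()    # container IDs already handled (hypothetical state)
        self.relaunched = 0       # number of replacement containers requested

    def on_containers_lost(self, container_ids):
        for cid in container_ids:
            if self.dedupe and cid in self.seen_lost:
                continue          # same loss reported again, e.g. after RM recovery
            self.seen_lost.add(cid)
            self.relaunched += 1


def simulate(dedupe):
    tracker = ExecutorTracker(dedupe)
    lost = ["container_01", "container_02"]  # made-up IDs for the 2 containers
    tracker.on_containers_lost(lost)  # RM detects the NM failure
    tracker.on_containers_lost(lost)  # RM restart: recovery reports them again
    return tracker.relaunched


if __name__ == "__main__":
    print("without dedupe:", simulate(dedupe=False))
    print("with dedupe:", simulate(dedupe=True))
```

    Without deduplication the second report triggers 2 extra relaunches, which 
matches the 2 extra executors described above.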


---
