brumi1024 commented on PR #7028:
URL: https://github.com/apache/hadoop/pull/7028#issuecomment-2333429490

   @slfan1989 there is an issue with the current implementation: we catch every 
PrivilegedOperationException, including the ones caused by a user-requested 
application kill, and then mark the NM unhealthy. This should not 
happen. A bit further down in this class there is a separate exit code handling 
method for the container launch, which throws a config-related exception in 
cases where the error is truly unrecoverable without admin input. I plan to 
reuse it here as well. 
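   
   A minimal sketch of that idea, not the actual Hadoop code: the class 
`NodeHealthSketch`, the exception `OperationFailedException` (a stand-in for 
PrivilegedOperationException), the `Action` enum and the exit code values are 
all hypothetical and chosen only for illustration. It shows the node being 
marked unhealthy only for an explicit set of unrecoverable exit codes, while 
every other failure, such as a kill signal, is just logged.

```java
import java.util.Set;

// Illustrative sketch only; names and exit code values are assumptions,
// not the real NodeManager API.
final class NodeHealthSketch {

    // Hypothetical stand-in for PrivilegedOperationException: carries an exit code.
    static final class OperationFailedException extends Exception {
        private final int exitCode;

        OperationFailedException(int exitCode) {
            super("privileged operation failed with exit code " + exitCode);
            this.exitCode = exitCode;
        }

        int getExitCode() {
            return exitCode;
        }
    }

    enum Action { LOG_ONLY, MARK_NODE_UNHEALTHY }

    // Assumed values: exit codes that need admin intervention (e.g. broken config).
    static final Set<Integer> UNRECOVERABLE_EXIT_CODES = Set.of(22, 24);

    // Decide what to do with a failure instead of treating every exception alike.
    static Action classify(OperationFailedException e) {
        return UNRECOVERABLE_EXIT_CODES.contains(e.getExitCode())
            ? Action.MARK_NODE_UNHEALTHY
            : Action.LOG_ONLY;
    }

    public static void main(String[] args) {
        // 143 = 128 + SIGTERM, used here as the assumed "killed on request" code.
        System.out.println(classify(new OperationFailedException(143)));
        System.out.println(classify(new OperationFailedException(24)));
    }
}
```

   The allow-list direction is deliberate in this sketch: an unknown exit code 
leaves the node healthy, so a new failure mode cannot take nodes out of service 
by accident.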
   
   But I'll only have time to work on that next week. Until then I think the 
current state is harmful: after a few application kills most of the NMs will be 
marked unhealthy, requiring a restart.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


