brumi1024 commented on PR #7028: URL: https://github.com/apache/hadoop/pull/7028#issuecomment-2333429490
@slfan1989 there is an issue with the current implementation: we catch every PrivilegedOperationException, including the ones caused by a user-requested application kill, and then mark the NM unhealthy. This should not happen. A bit further down in this class there is a separate exit-code handling method for container launch. It throws a configuration-related exception only when the error is truly unrecoverable without admin input, and I plan to reuse it here as well. I won't have time to work on that until next week, though, and until then I think the current state is harmful: after a few application kills, most of the NMs will be marked unhealthy and will need a restart.
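A minimal sketch of the fix proposed above follows, written under stated assumptions. Every class name in it (`OperationFailedException`, `ConfigurationException`, `SignalFailureSketch`) and the exit-code values in `CONFIG_EXIT_CODES` are hypothetical stand-ins, not Hadoop's actual types or native executor codes. The idea is that a failed signal or kill is routed through an exit-code check, and only codes classified as configuration errors mark the node unhealthy.

```java
import java.util.Set;

/**
 * Hypothetical sketch: only failures that need admin input mark the node unhealthy.
 * All class names and exit-code values are illustrative, not Hadoop's real ones.
 */
public class SignalFailureSketch {

    /** Hypothetical stand-in for a privileged-operation failure carrying an exit code. */
    static class OperationFailedException extends Exception {
        private final int exitCode;

        OperationFailedException(String message, int exitCode) {
            super(message);
            this.exitCode = exitCode;
        }

        int getExitCode() {
            return exitCode;
        }
    }

    /** Hypothetical exception for errors that cannot be fixed without admin input. */
    static class ConfigurationException extends Exception {
        ConfigurationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Illustrative values only (assumption); real codes come from the native executor.
    static final Set<Integer> CONFIG_EXIT_CODES = Set.of(24, 25);

    private boolean healthy = true;

    /** Rethrows as ConfigurationException only for config-related exit codes. */
    static void checkExitCode(OperationFailedException e) throws ConfigurationException {
        if (CONFIG_EXIT_CODES.contains(e.getExitCode())) {
            throw new ConfigurationException(
                "Unrecoverable executor error, exit code " + e.getExitCode(), e);
        }
    }

    /** Called when signalling (e.g. killing) a container fails. */
    void onSignalFailure(OperationFailedException e) {
        try {
            checkExitCode(e);
            // Recoverable failure, e.g. from a user-requested kill: log and keep going.
            System.out.println("Ignoring recoverable failure: " + e.getMessage());
        } catch (ConfigurationException ce) {
            // Only errors that need admin input mark the node unhealthy.
            healthy = false;
        }
    }

    boolean isHealthy() {
        return healthy;
    }

    public static void main(String[] args) {
        SignalFailureSketch node = new SignalFailureSketch();
        node.onSignalFailure(new OperationFailedException("kill failed", 1));
        System.out.println("healthy after kill failure: " + node.isHealthy());
        node.onSignalFailure(new OperationFailedException("bad config", 24));
        System.out.println("healthy after config failure: " + node.isHealthy());
    }
}
```

Running `main` should print that the node is still healthy after the kill failure and unhealthy after the configuration failure. Keeping the classification in one method means container launch and container kill can share the same notion of "unrecoverable without admin input".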