Ming Ma created HADOOP-11000:
--------------------------------

             Summary: HAServiceProtocol's health state is incorrectly transitioned to SERVICE_NOT_RESPONDING
                 Key: HADOOP-11000
                 URL: https://issues.apache.org/jira/browse/HADOOP-11000
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Ming Ma
When HAServiceProtocol.monitorHealth throws a HealthCheckFailedException, the exception the client actually receives over protocol buffer RPC is a RemoteException that wraps the real exception. A RemoteException is not a HealthCheckFailedException, so the catch (HealthCheckFailedException e) branch in HealthMonitor.doHealthChecks is skipped and the generic catch (Throwable t) branch runs instead. As a result, the state is incorrectly transitioned to SERVICE_NOT_RESPONDING rather than SERVICE_UNHEALTHY.

{noformat}
HealthMonitor.java doHealthChecks

      try {
        status = proxy.getServiceStatus();
        proxy.monitorHealth();
        healthy = true;
      } catch (HealthCheckFailedException e) {
        .....
        enterState(State.SERVICE_UNHEALTHY);
      } catch (Throwable t) {
        .....
        enterState(State.SERVICE_NOT_RESPONDING);
        .....
      }
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
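The failure mode can be reproduced in a minimal, self-contained sketch. Everything in it (HealthStateSketch, classifyNaive, classify, the simplified State enum, and the nested exception classes) is a hypothetical stand-in written for illustration, not Hadoop's real types, and it is not necessarily how this issue was resolved. It assumes the client sees only the wrapped exception's class name, as a RemoteException carries it. The naive classifier mirrors the catch order above and misreports a wrapped health-check failure as SERVICE_NOT_RESPONDING. Checking the wrapped class name first recovers SERVICE_UNHEALTHY. In real Hadoop code, one option would be to call RemoteException.unwrapRemoteException(HealthCheckFailedException.class) before the type check.

```java
import java.io.IOException;

// Sketch of the HADOOP-11000 failure mode. All types are simplified,
// hypothetical stand-ins, not Hadoop's real classes.
class HealthStateSketch {

    // Simplified subset of the health monitor states named in the issue.
    enum State { SERVICE_HEALTHY, SERVICE_UNHEALTHY, SERVICE_NOT_RESPONDING }

    // Stand-in for the exception a live but unhealthy service throws.
    static class HealthCheckFailedException extends IOException {
        HealthCheckFailedException(String msg) { super(msg); }
    }

    // Stand-in for org.apache.hadoop.ipc.RemoteException: the client only
    // sees the server-side exception's class name and message.
    static class RemoteException extends IOException {
        private final String className;

        RemoteException(String className, String msg) {
            super(msg);
            this.className = className;
        }

        String getClassName() { return className; }
    }

    // Mirrors the catch order in doHealthChecks: only a locally typed
    // HealthCheckFailedException counts as "unhealthy".
    static State classifyNaive(Throwable t) {
        if (t instanceof HealthCheckFailedException) {
            return State.SERVICE_UNHEALTHY;
        }
        return State.SERVICE_NOT_RESPONDING;
    }

    // Looks inside a RemoteException first: if it wraps a
    // HealthCheckFailedException, the service did answer and reported
    // itself unhealthy, so it is not "not responding".
    static State classify(Throwable t) {
        if (t instanceof RemoteException
                && HealthCheckFailedException.class.getName()
                        .equals(((RemoteException) t).getClassName())) {
            return State.SERVICE_UNHEALTHY;
        }
        return classifyNaive(t);
    }

    public static void main(String[] args) {
        Throwable wrapped = new RemoteException(
                HealthCheckFailedException.class.getName(), "disk full");
        System.out.println(classifyNaive(wrapped)); // SERVICE_NOT_RESPONDING
        System.out.println(classify(wrapped));      // SERVICE_UNHEALTHY
    }
}
```

Any other failure, such as a plain IOException from a timeout or a RemoteException wrapping a different class, still falls through to SERVICE_NOT_RESPONDING. That keeps the distinction between "answered, but unhealthy" and "did not answer".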