We have a DNS installation with HA logic that may fail for, say, 10 seconds.
In such a case we experience the following:

* DNS goes down
* The Master gets this: "Received report from unknown server -- telling it to MSG_CALL_SERVER_STARTUP" (probably the IP is "unknown")
* The Regionservers do as directed; the ZooKeeper logs state that the /hbase/rs/ nodes are updated
* DNS comes back up

Now either no master is selected or the wrong one is, and no region can be served anymore. Also, no further MSG_CALL_SERVER_STARTUP messages appear, which could otherwise reanimate the cluster...

We use host names in the regionservers file. What could we change to be more robust against such a problem?

Thx,
Al