We have a DNS installation with HA logic that may fail for, say, 10
seconds.

In such a case we experience the following:

* DNS goes down
* The Master logs: "Received report from unknown server -- telling
it to MSG_CALL_SERVER_STARTUP" (probably because the server's IP cannot
be resolved, so it appears "unknown")
* The RegionServers do as directed; the ZooKeeper logs show that the
/hbase/rs/ nodes are updated
* DNS goes up

After that, either no master or the wrong master is selected, and no
region can be served anymore. Also, no further MSG_CALL_SERVER_STARTUP
messages appear that could revive the cluster.

We use host names in the regionservers file.

What could we change to make the cluster more robust against this kind
of failure?

Thx,

   Al
