On March 27, 2020 5:06:16 PM GMT+02:00, "Wood, Randall" wrote:
>I have a three-node oVirt cluster where one node has stale data for the
>hosted engine, but the other two nodes do not:
>
>Output of `hosted-engine --vm-status` on a good node:
>
>```
>
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>
>
>--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--
>
>conf_on_shared_storage : True
>Status up-to-date : True
>Hostname : ovirt2.low.mdds.tcs-sec.com
>Host ID : 1
>Engine status : {"health": "good", "vm": "up", "detail": "Up"}
>Score : 3400
>stopped : False
>Local maintenance : False
>crc32 : f91f57e4
>local_conf_timestamp : 9915242
>Host timestamp : 9915241
>Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=9915241 (Fri Mar 27 14:38:14 2020)
> host-id=1
> score=3400
> vm_conf_refresh_time=9915242 (Fri Mar 27 14:38:14 2020)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
>--== Host ovirt1.low.mdds.tcs-sec.com (id: 2) status ==--
>
>conf_on_shared_storage : True
>Status up-to-date : True
>Hostname : ovirt1.low.mdds.tcs-sec.com
>Host ID : 2
>Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>Score : 3400
>stopped : False
>Local maintenance : False
>crc32 : 48f9c0fc
>local_conf_timestamp : 9218845
>Host timestamp : 9218845
>Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=9218845 (Fri Mar 27 14:38:22 2020)
> host-id=2
> score=3400
> vm_conf_refresh_time=9218845 (Fri Mar 27 14:38:22 2020)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
>--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
>
>conf_on_shared_storage : True
>Status up-to-date : False
>Hostname : ovirt3.low.mdds.tcs-sec.com
>Host ID : 3
>Engine status : unknown stale-data
>Score : 3400
>stopped : False
>Local maintenance : False
>crc32 : 620c8566
>local_conf_timestamp : 1208310
>Host timestamp : 1208310
>Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=1208310 (Mon Dec 16 21:14:24 2019)
> host-id=3
> score=3400
> vm_conf_refresh_time=1208310 (Mon Dec 16 21:14:24 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>```
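When comparing several hosts, it can help to flag stale entries programmatically rather than by eye. The following is a minimal sketch in Python, assuming the plain-text layout of `hosted-engine --vm-status` shown above (a `--== Host <name> (id: <n>) status ==--` header followed by `key : value` lines); that layout is not a documented machine-readable format and may differ between ovirt-hosted-engine-ha versions. The helper names `parse_vm_status` and `stale_hosts` are hypothetical.

```python
import re
from typing import Dict, List

# Matches section headers such as:
#   --== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
HEADER = re.compile(r"--== Host (\S+) \(id: (\d+)\) status ==--")
# Matches summary lines such as "Status up-to-date : False".
# Lines like "timestamp=1208310 (...)" are skipped because "=" ends the key.
FIELD = re.compile(r"^\s*([A-Za-z][\w -]*?)\s*:\s*(.*)$")


def parse_vm_status(text: str) -> Dict[str, Dict[str, str]]:
    """Return {hostname: {field: value}} for each host section (hypothetical helper)."""
    hosts: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = header.group(1)
            hosts[current] = {"Host ID": header.group(2)}
            continue
        if current is None:
            continue  # ignore banner lines before the first host section
        field = FIELD.match(line)
        if field:
            hosts[current][field.group(1)] = field.group(2).strip()
    return hosts


def stale_hosts(text: str) -> List[str]:
    """List hosts whose metadata is not up to date or reported as stale-data."""
    return [
        name
        for name, fields in parse_vm_status(text).items()
        if fields.get("Status up-to-date") == "False"
        or "stale-data" in fields.get("Engine status", "")
    ]


# Trimmed sample based on the output above.
SAMPLE = """\
--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--
Status up-to-date : True
Engine status : {"health": "good", "vm": "up", "detail": "Up"}

--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
Status up-to-date : False
Engine status : unknown stale-data
"""

if __name__ == "__main__":
    print(stale_hosts(SAMPLE))  # ['ovirt3.low.mdds.tcs-sec.com']
```

This only reports what the shared metadata already says; it does not explain why ovirt3 stopped updating its entry in December 2019.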
>
>I tried the steps in https://access.redhat.com/discussions/3511881, but
>`hosted-engine --vm-status` on the node with stale data shows:
>
>```
>The hosted engine configuration has not been retrieved from shared
>storage. Please ensure that ovirt-ha-agent is running and the storage
>server is reachable.
>```
>
>On the stale node, ovirt-ha-agent and ovirt-ha-broker are continually
>restarting. Since the agent seems to depend on the broker, here is a
>snippet from the broker log, which repeats roughly every 3 seconds:
>
>```
>MainThread::INFO::2020-03-27 15:01:06,584::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>MainThread::INFO::2020-03-27 15:01:06,584::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>MainThread::INFO::2020-03-27