[ovirt-users] Re: Engine status : unknown stale-data on single node

2020-04-01 Thread Randall Wood
> As far as I know, you need to first stop ovirt-ha-broker.service and
> ovirt-ha-agent.service (on all nodes) before reinitializing ...
> I'm glad that everything is back online.

Having that spelled out clarifies how I should have done this. Thanks for that
tip (it's now in my notes).


[ovirt-users] Re: Engine status : unknown stale-data on single node

2020-03-30 Thread Strahil Nikolov
On March 31, 2020 12:25:34 AM GMT+03:00, Randall Wood wrote:
>Thank you. The links were the same (present and pointing to the same
>location) on all three nodes (I had already looked into the
>possibility of a split brain, following some other suggestions in older
>emails on this list).
>
>Sometime late Friday I found a message that suggested I should place
>the cluster into global maintenance mode and run `hosted-engine
>--reinitialize-lockspace`. After doing that it seems to have recovered
>(I presume that command removed/regenerated/fixed the link).

Hey Randall,

As far as I know, you need to first stop ovirt-ha-broker.service and
ovirt-ha-agent.service (on all nodes) before reinitializing ...
I'm glad that everything is back online.
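
Something like this on every host, around the reinitialize (just a sketch
from memory, so double-check the unit names on your version):

```
# On every HA host: stop the agent first, as it depends on the broker
systemctl stop ovirt-ha-agent ovirt-ha-broker

# ... run 'hosted-engine --reinitialize-lockspace' on one host ...

# Afterwards bring the services back, broker first
systemctl start ovirt-ha-broker ovirt-ha-agent
```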

Best Regards,
Strahil Nikolov


[ovirt-users] Re: Engine status : unknown stale-data on single node

2020-03-30 Thread Randall Wood
Thank you. The links were the same (present and pointing to the same location)
on all three nodes (I had already looked into the possibility of a split
brain, following some other suggestions in older emails on this list).

Sometime late Friday I found a message that suggested I should place the 
cluster into global maintenance mode and run `hosted-engine 
--reinitialize-lockspace`. After doing that it seems to have recovered (I 
presume that command removed/regenerated/fixed the link).
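
For the archives, this is roughly what I ran (from memory, so verify the
flags against `hosted-engine --help`), all from a single node:

```
# Put the whole cluster into global maintenance
hosted-engine --set-maintenance --mode=global

# Rewrite the hosted-engine sanlock lockspace
hosted-engine --reinitialize-lockspace

# Confirm every host reports fresh data again
hosted-engine --vm-status
```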


[ovirt-users] Re: Engine status : unknown stale-data on single node

2020-03-29 Thread Strahil Nikolov
On March 27, 2020 5:06:16 PM GMT+02:00, "Wood, Randall" wrote:
>I have a three-node oVirt cluster where one node has stale-data for the
>hosted engine, but the other two nodes do not:
>
>Output of `hosted-engine --vm-status` on a good node:
>
>```
>
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>
>
>--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : ovirt2.low.mdds.tcs-sec.com
>Host ID                            : 1
>Engine status                      : {"health": "good", "vm": "up",
>"detail": "Up"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : f91f57e4
>local_conf_timestamp               : 9915242
>Host timestamp                     : 9915241
>Extra metadata (valid at timestamp):
>   metadata_parse_version=1
>   metadata_feature_version=1
>   timestamp=9915241 (Fri Mar 27 14:38:14 2020)
>   host-id=1
>   score=3400
>   vm_conf_refresh_time=9915242 (Fri Mar 27 14:38:14 2020)
>   conf_on_shared_storage=True
>   maintenance=False
>   state=GlobalMaintenance
>   stopped=False
>
>
>--== Host ovirt1.low.mdds.tcs-sec.com (id: 2) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : ovirt1.low.mdds.tcs-sec.com
>Host ID                            : 2
>Engine status                      : {"reason": "vm not running on this
>host", "health": "bad", "vm": "down", "detail": "unknown"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : 48f9c0fc
>local_conf_timestamp               : 9218845
>Host timestamp                     : 9218845
>Extra metadata (valid at timestamp):
>   metadata_parse_version=1
>   metadata_feature_version=1
>   timestamp=9218845 (Fri Mar 27 14:38:22 2020)
>   host-id=2
>   score=3400
>   vm_conf_refresh_time=9218845 (Fri Mar 27 14:38:22 2020)
>   conf_on_shared_storage=True
>   maintenance=False
>   state=GlobalMaintenance
>   stopped=False
>
>
>--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : False
>Hostname                           : ovirt3.low.mdds.tcs-sec.com
>Host ID                            : 3
>Engine status                      : unknown stale-data
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : 620c8566
>local_conf_timestamp               : 1208310
>Host timestamp                     : 1208310
>Extra metadata (valid at timestamp):
>   metadata_parse_version=1
>   metadata_feature_version=1
>   timestamp=1208310 (Mon Dec 16 21:14:24 2019)
>   host-id=3
>   score=3400
>   vm_conf_refresh_time=1208310 (Mon Dec 16 21:14:24 2019)
>   conf_on_shared_storage=True
>   maintenance=False
>   state=GlobalMaintenance
>   stopped=False
>
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>```
>
>I tried the steps in https://access.redhat.com/discussions/3511881, but
>`hosted-engine --vm-status` on the node with stale data shows:
>
>```
>The hosted engine configuration has not been retrieved from shared
>storage. Please ensure that ovirt-ha-agent is running and the storage
>server is reachable.
>```
>
>On the stale node, ovirt-ha-agent and ovirt-ha-broker are continually
>restarting. Since the agent seems to depend on the broker, the broker
>log includes this snippet, repeating roughly every 3 seconds:
>
>```
>MainThread::INFO::2020-03-27
>15:01:06,584::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>ovirt-hosted-engine-ha broker 2.3.6 started
>MainThread::INFO::2020-03-27
>15:01:06,584::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Searching for submonitors in
>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>MainThread::INFO::2020-03-27
>15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor engine-health
>MainThread::INFO::2020-03-27
>15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor storage-domain
>MainThread::INFO::2020-03-27
>15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor network
>MainThread::INFO::2020-03-27
>15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor cpu-load-no-engine
>MainThread::INFO::2020-03-27