[ovirt-users] Re: HostedEngine VM Paused after power failure
Check libvirt logs on both hosts for clues. Something is wrong with the disks.

Also check the status of the following services:
- sanlock
- vdsmd
- supervdsm

Verify that all VGs are activated properly (especially if you use gluster).

Best Regards,
Strahil Nikolov

On Wed, Feb 10, 2021 at 4:02, Ian Easter wrote:

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z5OIFLQGNMF56GGJPGKLRFHIFUMHSZX5/
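The service check suggested in this reply could be scripted roughly as follows. This is only a sketch and not part of the original message: it assumes a systemd host, it assumes the units are named `sanlock`, `vdsmd` and `supervdsmd` (the reply writes "supervdsm", so confirm the unit name on your host), and the helper names `service_states` and `not_active` are made up for illustration.

```python
import subprocess

# Unit names are assumptions; confirm them with `systemctl list-units` on the host.
SERVICES = ("sanlock", "vdsmd", "supervdsmd")


def service_states(services=SERVICES, run=subprocess.run):
    """Return {unit: state} by asking `systemctl is-active` about each unit."""
    states = {}
    for unit in services:
        # `systemctl is-active` prints a single word such as "active",
        # "inactive" or "failed" on stdout.
        result = run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            check=False,
        )
        states[unit] = result.stdout.strip() or "unknown"
    return states


def not_active(states):
    """List the units whose reported state is anything other than "active"."""
    return sorted(unit for unit, state in states.items() if state != "active")
```

Calling `not_active(service_states())` on each host would give the list of services that still need attention. The `run` parameter exists only so the function can be exercised without systemd.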
[ovirt-users] Re: HostedEngine VM Paused after power failure
Robert,

I understand the sentiment about the difficulty here. The recovery feels brutal, but the monolithic nature and the dense ecosystem are understandable for the purpose it serves.

I am able to mount the raw disk image for the HostedEngine VM cleanly without any errors and it seems to check out, so I don't believe there is any corruption. Everything looks to operate as expected, and then it just seems to snag somewhere during startup. I suppose I'm just trying to trace down the hiccup to clear it out of the way and let the VM boot up. My knowledge is a bit limited when it comes to digging in and troubleshooting the components here.

Additional snippet:

MainThread::INFO::2021-02-09 21:00:07,357::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed: (code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})
MainThread::INFO::2021-02-09 21:00:07,357::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 21:00:07,389::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 21:00:07,406::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 21:00:17,427::states::740::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over..

*Thank you,*
*Ian Easter*

On Tue, Feb 9, 2021 at 6:31 PM Robert Tongue wrote:

> I've seen this happen with the VM disk itself becoming corrupt. If you
> try to read the contents of the file, and it gives you "Input/Output
> Error", then it is not good news. I've been testing oVirt recently, and
> these issues alone are preventing me from using it full time. I cannot
> help further, unfortunately, as I have no idea how to fix it. So best I
> can say is, hopefully someone else chimes in and helps both of us.
>
> -phunyguy
[ovirt-users] Re: HostedEngine VM Paused after power failure
I've seen this happen with the VM disk itself becoming corrupt. If you try to read the contents of the file, and it gives you "Input/Output Error", then it is not good news. I've been testing oVirt recently, and these issues alone are preventing me from using it full time. I cannot help further, unfortunately, as I have no idea how to fix it. So best I can say is, hopefully someone else chimes in and helps both of us.

-phunyguy
[ovirt-users] Re: HostedEngine VM Paused after power failure
Attempting to resume or start the VM doesn't yield any results.

Here is the status of the VM:

Host ID: 1
Host timestamp : 115601
Score : 3400
Engine status : {"vm": "up", "health": "bad", "detail": "Paused", "reason": "bad vm status"}
Hostname :
Local maintenance : False
stopped: False
crc32 : 68efbf40
conf_on_shared_storage : True
local_conf_timestamp : 115601
Status up-to-date : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=115601 (Tue Feb 9 18:25:48 2021)
    host-id=1
    score=3400
    vm_conf_refresh_time=115601 (Tue Feb 9 18:25:48 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

Here is a chunk in agent.log that is a bit perplexing. I'm not too sure what it means that the VM doesn't exist. Storage is correctly mounted, everything looks fully operational. I can see the HostedEngine disk available to the Host.

MainThread::INFO::2021-02-09 18:08:13,843::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)
MainThread::INFO::2021-02-09 18:08:23,864::states::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2021-02-09 18:08:23,894::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
MainThread::INFO::2021-02-09 18:08:23,983::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400)
MainThread::INFO::2021-02-09 18:08:24,000::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: VM in WaitForLaunch
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed: (code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 18:08:24,552::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 18:08:24,565::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 18:08:34,585::states::736::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2021-02-09 18:08:34,590::state_decorators::99::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Tue Feb 9 18:18:34 2021 while transitioning ->
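For anyone wanting to follow the same symptoms in their own output, here is a small sketch of how the status line and the agent.log excerpt in this message could be read programmatically. It is not an oVirt tool: the function names are made up for illustration, and it assumes the "Engine status" value is valid JSON and that the agent logs its state as "Current state <Name> (score: N)", as both appear in the text quoted in this message.

```python
import json
import re

# Matches the "Current state <Name> (score: N)" entries written by the
# agent's _monitoring_loop, as seen in the log excerpt in this message.
STATE_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")


def parse_engine_status(line):
    """Turn an 'Engine status : {...}' line into a dict."""
    # Everything after the first colon is the JSON payload.
    _, _, payload = line.partition(":")
    return json.loads(payload.strip())


def state_sequence(log_text):
    """Return (state, score) pairs in the order they appear in the log text."""
    return [(m.group(1), int(m.group(2))) for m in STATE_RE.finditer(log_text)]


def is_paused(status):
    """True when the status dict reports the engine VM as paused."""
    return status.get("detail") == "Paused"
```

Fed the status line from this message, `parse_engine_status` gives a dict whose `detail` is `Paused` and whose `health` is `bad`. Fed the log excerpt, `state_sequence` shows the agent moving through EngineDown, EngineStart and EngineStarting, all with score 3400.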
[ovirt-users] Re: HostedEngine VM Paused after power failure
If the engine VM is in a paused state, SSH into the host where it is paused and try:

virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume HostedEngine

On Tue, Feb 9, 2021 at 2:15 PM Ian Easter wrote:

> Hello Users,
>
> We have an oVirt (4.4) environment that had 2 hosts in the cluster. We
> suffered from a power failure that caused the servers to be offline for
> some time. Once restored, one of the hosts from the cluster lost its OS
> raid and is not accessible.
>
> The other server has the HostedEngine vm on it but in a paused state. I
> have tried to manually start the vm with the hosted-engine CLI tool but it
> indicates that HostedEngine is running on another host.
>
> Is there any manual intervention I can accomplish here to start the
> HostedEngine on the second, active host server?
>
> *Thank you,*
> *Ian Easter*
> *DevOps Engineer*
> *TelVue Support*
> https://www.telvue.com/support/
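Before resuming, it can help to ask libvirt why the domain is paused. The sketch below wraps the command from this reply and adds a `virsh domstate --reason` call; it is an illustration under assumptions, not an oVirt utility. The connection URI is copied from the reply, the domain name `HostedEngine` is taken from the same command, and the helper names are made up.

```python
import subprocess

# Connection URI copied from the virsh command in this reply.
URI = "qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf"


def virsh_argv(*args, uri=URI):
    """Build a virsh command line that connects through the given URI."""
    return ["virsh", "-c", uri, *args]


def pause_reason(domain="HostedEngine", run=subprocess.run):
    """Return the text printed by `virsh domstate --reason` for the domain."""
    result = run(
        virsh_argv("domstate", "--reason", domain),
        capture_output=True,
        text=True,
        check=False,
    )
    # virsh prints the state followed by a reason in parentheses.
    return result.stdout.strip()


def resume(domain="HostedEngine", run=subprocess.run):
    """Run `virsh resume` and report whether virsh exited with status 0."""
    result = run(
        virsh_argv("resume", domain),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

If the reported reason points at storage, a resume is unlikely to stick until the storage problem is dealt with, which fits the experience described elsewhere in this thread.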