[ovirt-users] Re: HostedEngine VM Paused after power failure

2021-02-09 Thread Strahil Nikolov via Users
Check libvirt logs on both hosts for clues. Something is wrong with the disks.
Also check the status of the following services:
- sanlock
- vdsmd
- supervdsm
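The service check can be scripted as a minimal sketch, assuming Python 3.7+ on the host and the systemd unit names `sanlock`, `vdsmd` and `supervdsmd`. The last name is an assumption: the message says "supervdsm", but on oVirt hosts the unit is normally called supervdsmd. `service_states` and `run_systemctl` are hypothetical helpers, not part of oVirt.

```python
import subprocess
from typing import Callable, Dict, List

# Assumption: the message says "supervdsm"; the systemd unit is usually supervdsmd.
SERVICES = ["sanlock", "vdsmd", "supervdsmd"]

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def run_systemctl(args: List[str]) -> subprocess.CompletedProcess:
    """Run systemctl without raising: is-active exits non-zero for inactive units."""
    return subprocess.run(["systemctl", *args], capture_output=True, text=True)


def service_states(services: List[str], runner: Runner = run_systemctl) -> Dict[str, str]:
    """Map each unit to the state printed by `systemctl is-active` (e.g. active, failed)."""
    states = {}
    for name in services:
        result = runner(["is-active", name])
        # is-active prints the state on stdout; fall back if nothing was printed.
        states[name] = result.stdout.strip() or "unknown"
    return states
```

On each host, `print(service_states(SERVICES))` shows the state of all three units at once; the runner parameter only exists so the logic can be exercised without systemd.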
Verify that all VGs are activated properly (especially if you use gluster).
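For the VG check, one way to spot logical volumes that are not active is to read the fifth character of LVM's `lv_attr` field, which is `a` for an active volume. This is a sketch under assumptions: it runs `lvs --noheadings -o vg_name,lv_name,lv_attr` as root, and the helper names are hypothetical. Which LVs must be active depends on the setup (with gluster, the brick LVs), so treat the result as a list to review rather than a verdict.

```python
import subprocess
from typing import List, Tuple


def read_lvs() -> str:
    """Return one line per logical volume: VG name, LV name, lv_attr (run as root)."""
    return subprocess.run(
        ["lvs", "--noheadings", "-o", "vg_name,lv_name,lv_attr"],
        capture_output=True, text=True, check=True,
    ).stdout


def inactive_lvs(lvs_output: str) -> List[Tuple[str, str]]:
    """Return (vg, lv) pairs whose lv_attr state character (5th) is not 'a'."""
    inactive = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank or unexpected line
        vg, lv, attr = fields
        if len(attr) < 5 or attr[4] != "a":
            inactive.append((vg, lv))
    return inactive
```

Typical use on a host is `print(inactive_lvs(read_lvs()))`; parsing is kept separate from the command so it can be checked against saved output.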
Best Regards,
Strahil Nikolov
 
 
  On Wed, Feb 10, 2021 at 4:02, Ian Easter wrote:   
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/R7UKX52RCOTB2QVRKGLCWAAZJXA3IBBK/
  


[ovirt-users] Re: HostedEngine VM Paused after power failure

2021-02-09 Thread Ian Easter
Robert,

I understand the sentiment about the difficulty here.  The recovery feels
brutal, but the monolithic nature and dense ecosystem are understandable
given the purpose it serves.

I am able to mount the raw disk image for the HostedEngine VM cleanly,
without any errors, and it seems to check out, so I don't believe there is
any corruption.

Everything appears to operate as expected, and then it just seems to snag
somewhere during startup.  I'm trying to trace the hiccup so I can clear it
out of the way and let the VM boot.  My knowledge is a bit limited when it
comes to digging in and troubleshooting these components.

Additional snippet:
MainThread::INFO::2021-02-09 21:00:07,357::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})

MainThread::INFO::2021-02-09 21:00:07,357::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 21:00:07,389::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 21:00:07,406::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 21:00:17,427::states::740::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over..
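To trace where startup snags, it can help to reduce agent.log to its state changes. This is a sketch under assumptions: it expects each record to start on a single line in the `Thread::LEVEL::date time::module::line::logger::(function) message` layout seen in these snippets (as in the log file itself, since mail clients may wrap it), and `state_timeline` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# One agent.log record header, e.g.
# MainThread::INFO::2021-02-09 21:00:07,406::hosted_engine::517::<logger>::(_monitoring_loop) <message>
RECORD = re.compile(
    r"^(?P<thread>\w+)::(?P<level>\w+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[\w.]+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\)\s*(?P<msg>.*)$"
)
STATE = re.compile(r"Current state (?P<state>\w+) \(score: (?P<score>\d+)\)")


def state_timeline(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, state, score) for each 'Current state' record."""
    timeline = []
    for line in lines:
        record = RECORD.match(line.rstrip("\n"))
        if record is None:
            continue  # continuation lines, e.g. multi-line stderr, have no header
        found = STATE.search(record.group("msg"))
        if found:
            timeline.append((record.group("ts"), found.group("state"), int(found.group("score"))))
    return timeline
```

For example, `with open("/var/log/ovirt-hosted-engine-ha/agent.log") as log: print(state_timeline(log))`, using the usual location of the HA agent log.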


*Thank you,*
*Ian Easter*





On Tue, Feb 9, 2021 at 6:31 PM Robert Tongue  wrote:


[ovirt-users] Re: HostedEngine VM Paused after power failure

2021-02-09 Thread Robert Tongue
I've seen this happen with the VM disk itself becoming corrupt.  If you try to 
read the contents of the file, and it gives you "Input/Output Error", then it 
is not good news.  I've been testing oVirt recently, and these issues alone are 
preventing me from using it full time.  I cannot help further, unfortunately, 
as I have no idea how to fix it.  So best I can say is, hopefully someone else 
chimes in and helps both of us.
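The read test described above can be done by reading the image end to end and noting where the first I/O error occurs. A minimal sketch, assuming the path to the disk image is known (it is not given in this thread); `first_unreadable_offset` is a hypothetical helper. Reading a large image takes a while and adds load on the storage.

```python
import errno
from typing import Optional

CHUNK_SIZE = 4 * 1024 * 1024  # read 4 MiB per call


def first_unreadable_offset(path: str, chunk_size: int = CHUNK_SIZE) -> Optional[int]:
    """Read `path` end to end; return the byte offset of the first EIO, or None."""
    offset = 0
    # Unbuffered binary mode: each read() call is a single read system call.
    with open(path, "rb", buffering=0) as image:
        while True:
            try:
                data = image.read(chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return offset  # error somewhere in [offset, offset + chunk_size)
                raise  # other errors (permissions, etc.) are not what we test for
            if not data:
                return None  # reached EOF without an I/O error
            offset += len(data)
```

A `None` result only means this host could read every byte once; it says nothing about the consistency of the filesystem inside the image.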

-phunyguy

From: ieas...@telvue.com 
Sent: Tuesday, February 9, 2021 6:25 PM
To: users@ovirt.org 
Subject: [ovirt-users] Re: HostedEngine VM Paused after power failure


[ovirt-users] Re: HostedEngine VM Paused after power failure

2021-02-09 Thread ieaster
Attempting to resume or start the VM doesn't yield any results.

Here is the status of the VM:
Host ID                : 1
Host timestamp         : 115601
Score                  : 3400
Engine status          : {"vm": "up", "health": "bad", "detail": "Paused", "reason": "bad vm status"}
Hostname               :
Local maintenance      : False
stopped                : False
crc32                  : 68efbf40
conf_on_shared_storage : True
local_conf_timestamp   : 115601
Status up-to-date      : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=115601 (Tue Feb  9 18:25:48 2021)
    host-id=1
    score=3400
    vm_conf_refresh_time=115601 (Tue Feb  9 18:25:48 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False
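The `Engine status` field in this output is JSON, so the paused condition can be checked programmatically. A sketch, assuming the text is captured from `hosted-engine --vm-status` in the `key : value` layout shown; `engine_status` and `is_paused` are hypothetical names.

```python
import json
from typing import Optional


def engine_status(vm_status_text: str) -> Optional[dict]:
    """Return the parsed JSON from the 'Engine status' line, or None if absent."""
    for line in vm_status_text.splitlines():
        # Split on the first colon only; the JSON value contains colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Engine status":
            return json.loads(value.strip())
    return None


def is_paused(status: dict) -> bool:
    """True when the VM is reported up but with detail 'Paused'."""
    return status.get("vm") == "up" and status.get("detail") == "Paused"
```

Feeding it the status block above would yield `health` "bad" and `reason` "bad vm status", matching what the agent reports.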


Here is a chunk of agent.log that is a bit perplexing.  I'm not sure what it
means that the VM doesn't exist: storage is correctly mounted, everything
looks fully operational, and I can see the HostedEngine disk from the host.

MainThread::INFO::2021-02-09 18:08:13,843::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)
MainThread::INFO::2021-02-09 18:08:23,864::states::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2021-02-09 18:08:23,894::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
MainThread::INFO::2021-02-09 18:08:23,983::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400)
MainThread::INFO::2021-02-09 18:08:24,000::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: VM in WaitForLaunch

MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})

MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 18:08:24,552::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 18:08:24,565::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 18:08:34,585::states::736::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2021-02-09 18:08:34,590::state_decorators::99::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Tue Feb  9 18:18:34 2021 while transitioning  ->
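The `VM.getStats` error spans two lines, which makes it easy to miss with a line-oriented search. This sketch (hypothetical helpers, assuming every record begins with `Thread::LEVEL::YYYY-MM-DD`) folds continuation lines into their record so the whole error can be searched. It only locates the message; the thread does not establish whether the error is a harmless race right after `--vm-start` or the actual cause.

```python
import re
from typing import Iterable, List

# A new agent.log record begins with "<Thread>::<LEVEL>::<YYYY-MM-DD> ".
RECORD_START = re.compile(r"^\w+::[A-Z]+::\d{4}-\d{2}-\d{2} ")


def group_records(lines: Iterable[str]) -> List[str]:
    """Join continuation lines (such as a multi-line stderr) onto their record."""
    records: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if RECORD_START.match(line):
            records.append(line)
        elif line.strip():
            if records:
                records[-1] += " " + line.strip()
            else:
                records.append(line.strip())  # text before the first header
    return records


def find_records(lines: Iterable[str], needle: str = "does not exist") -> List[str]:
    """Return complete records whose text contains `needle`."""
    return [record for record in group_records(lines) if needle in record]
```

Blank lines are dropped rather than kept as empty records, so each result is one self-contained log entry.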


[ovirt-users] Re: HostedEngine VM Paused after power failure

2021-02-09 Thread Edward Berger
If the engine VM is in a paused state, ssh into the host where it is paused
and try:

virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' resume HostedEngine
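The resume command can be wrapped so the VM is resumed only when libvirt reports it as paused, and so the `?` in the connection URI never passes through a shell. A sketch assuming `virsh` is on the host's PATH; `virsh_argv`, `run_virsh` and `resume_if_paused` are hypothetical names. `virsh domstate --reason` also prints why the domain paused; if the reason is an I/O error, resuming may not help, which would point back at the storage.

```python
import subprocess
from typing import Callable, List

# Connection URI and domain name as given in the reply above.
URI = "qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf"
DOMAIN = "HostedEngine"

Runner = Callable[[List[str]], str]


def virsh_argv(*args: str, uri: str = URI) -> List[str]:
    """Build a virsh argument vector; no shell is involved, so '?' stays literal."""
    return ["virsh", "-c", uri, *args]


def run_virsh(argv: List[str]) -> str:
    """Run virsh and return its trimmed stdout, raising on a non-zero exit."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout.strip()


def resume_if_paused(domain: str = DOMAIN, run: Runner = run_virsh) -> str:
    """Resume `domain` if `virsh domstate --reason` says it is paused.

    Returns the state string observed before any resume attempt.
    """
    state = run(virsh_argv("domstate", "--reason", domain))
    if state.startswith("paused"):
        run(virsh_argv("resume", domain))
    return state
```

Printing the returned state before and after a call makes it visible whether the VM stayed resumed or paused again.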


On Tue, Feb 9, 2021 at 2:15 PM Ian Easter  wrote:

> Hello Users,
>
> We have an oVirt (4.4) environment that had 2 hosts in the cluster.  We
> suffered a power failure that kept the servers offline for some time.
> Once power was restored, one of the hosts in the cluster had lost its OS
> RAID and is not accessible.
>
> The other server has the HostedEngine VM on it, but in a paused state.  I
> have tried to start the VM manually with the hosted-engine CLI tool, but it
> indicates that HostedEngine is running on another host.
>
> Is there any manual intervention I can perform here to start the
> HostedEngine on the second, active host server?
>
> *Thank you,*
> *Ian Easter*
> *DevOps Engineer*
> *TelVue Support*
> https://www.telvue.com/support/
>