On Wed, Feb 22, 2017 at 3:22 PM, Michal Skrivanek <michal.skriva...@redhat.com> wrote:
>
> On 22 Feb 2017, at 13:53, Simone Tiraboschi <stira...@redhat.com> wrote:
>
> On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stira...@redhat.com> wrote:
>
>> When ovirt-ha-agent checks the status of the engine VM we get:
>>
>> 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>>     ret = func(*args, **kwargs)
>>   File "/usr/share/vdsm/API.py", line 335, in getStats
>>     vm = self.vm
>>   File "/usr/share/vdsm/API.py", line 130, in vm
>>     raise exception.NoSuchVM(vmId=self._UUID)
>> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>>
>> While in ovirt-ha-agent logs we have:
>>
>> MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
>>
>> ...
>>
>> MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>
>> Probably it's a bug or a regression somewhere on master.
>
> On ovirt-ha-broker side the detection is based on a strict string match on
> the error message that is expected to be exactly 'Virtual machine does not
> exist' to set down status otherwise we set unknown status as in this case:
> https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54
>
> Adding Francesco here to understand if something has recently changed
> there on vdsm side.
>
> That’s not a very robust code handling.
> Yes, the text changed, the vm id was added.
> And yes, it may change again any time I guess
>

I agree, we are going to move to code check: https://gerrit.ovirt.org/#/c/72891

>> On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbona...@redhat.com> wrote:
>>
>>> Adding Lev
>>>
>>> On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sab...@redhat.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> On the HC setup, the HE VM is not restarted.
>>>> The agent.log has
>>>> MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {}
>>>> MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True}
>>>> ...
>>>> MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>>> MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0'
>>>> MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
>>>>
>>>> and the vdsm.log
>>>>
>>>> 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269)
>>>> 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
>>>>
>>>> 2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>>>>     ret = func(*args, **kwargs)
>>>>   File "/usr/share/vdsm/API.py", line 335, in getStats
>>>>     vm = self.vm
>>>>   File "/usr/share/vdsm/API.py", line 130, in vm
>>>>     raise exception.NoSuchVM(vmId=self._UUID)
>>>> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>>>>
>>>> What should I be looking for to identify the issue?
>>>>
>>>> The logs are at
>>>> http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bootstrap.py/lago-hc-basic-suite-master-host0
>>>>
>>>> thanks
>>>>
>>>> sahina
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel@ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>
>>> --
>>> Sandro Bonazzola
>>> Better technology. Faster innovation. Powered by community collaboration.
>>> See how it works at redhat.com
>
> _______________________________________________
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
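
To illustrate the difference between matching on the message text and checking the status code, here is a minimal Python sketch. It is not the actual ovirt-hosted-engine-ha or vdsm code: the function names, the response shape ({'status': {'code': ..., 'message': ...}}) and the NO_SUCH_VM_CODE value are assumptions made for illustration only.

```python
# Hypothetical sketch, not the real broker code.
# Assumption: a getStats-style response is a dict shaped like
#   {'status': {'code': <int>, 'message': <str>}}
# Assumption: this numeric code means "no such VM"; check it against vdsm.
NO_SUCH_VM_CODE = 1

EXPECTED_MESSAGE = "Virtual machine does not exist"


def vm_missing_by_message(response):
    """Fragile check: exact match on the human-readable message."""
    message = response.get("status", {}).get("message")
    return message == EXPECTED_MESSAGE


def vm_missing_by_code(response):
    """Sturdier check: compare the status code and ignore the message text."""
    code = response.get("status", {}).get("code")
    return code == NO_SUCH_VM_CODE


# Message as it appears in the vdsm.log above, with the vm id appended.
response = {
    "status": {
        "code": NO_SUCH_VM_CODE,
        "message": "Virtual machine does not exist: "
                   "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}",
    }
}

# The exact string match no longer recognises the error, so a monitor
# built on it would fall back to an "unknown" status.
print(vm_missing_by_message(response))
# The code comparison is unaffected by the extra text in the message.
print(vm_missing_by_code(response))
```

With the code comparison, a change in the wording of the message (such as the added vm id) would not turn a "down" detection into "unknown".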