On 11/01/2014 07:43 AM, Jaicel wrote:
Hi,

My engine runs on Host 1. The current status and agent logs are below.

Host 1

Hi,
it seems you ran into [1]. You can either zero out the metadata file or apply the patch from [1] manually.

--Jirka

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925


MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675}
MainThread::INFO::2014-10-31 
16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
 Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 
'maintenance': False, 'cpu-load': None, 'gateway': True}
MainThread::INFO::2014-10-31 
16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745740.32 type=state_transition 
detail=StartState-ReinitializeFSM hostname='ovirt1'
MainThread::INFO::2014-10-31 
16:55:40,392::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (StartState-ReinitializeFSM) 
sent? ignored
MainThread::INFO::2014-10-31 
16:55:40,675::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state ReinitializeFSM (score: 0)
MainThread::INFO::2014-10-31 
16:55:50,710::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Trying: notify time=1414745750.71 type=state_transition 
detail=ReinitializeFSM-EngineUp hostname='ovirt1'
MainThread::INFO::2014-10-31 
16:55:50,710::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (ReinitializeFSM-EngineUp) sent? 
ignored
MainThread::INFO::2014-10-31 
16:55:51,001::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineUp (score: 2400)
MainThread::CRITICAL::2014-10-31 16:56:01,033::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 700, in collect_stats
    stats = self.process_remote_metadata(host_id, remote_data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 747, in process_remote_metadata
    md['engine-status'] = engine_status(md["engine-status"])
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 79, in engine_status
    in json.loads(status).iteritems()])
AttributeError: 'NoneType' object has no attribute 'iteritems'
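
The last frame of the Host 1 traceback calls .iteritems() on the result of json.loads(status), and the AttributeError says that result was None. One plausible reading, not confirmed anywhere in this thread, is that the stored engine-status decoded to the JSON literal null. The sketch below only illustrates that failure mode and a defensive way around it. It is written for Python 3 (so it uses .items() rather than the Python 2 .iteritems() seen in the log), and parse_engine_status is a hypothetical helper, not the patch from [1].

```python
import json


def parse_engine_status(status):
    """Decode a JSON engine-status string into a dict of str -> str.

    Hypothetical helper: returns an empty dict when the payload does
    not decode to a JSON object (for example the literal "null").
    """
    decoded = json.loads(status)
    if not isinstance(decoded, dict):
        return {}
    return dict((str(k), str(v)) for k, v in decoded.items())


if __name__ == "__main__":
    # "null" is valid JSON and decodes to None, which has no .items().
    print(json.loads("null") is None)
    print(parse_engine_status("null"))
    print(parse_engine_status('{"health": "good", "vm": "up", "detail": "up"}'))
```

Whether an empty dict is the right fallback depends on how the agent scores hosts, so treat the return value as a placeholder.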
[root@ovirt1 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : 192.168.12.11
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1414745750
Extra metadata (valid at timestamp):
         metadata_parse_version=1
         metadata_feature_version=1
         timestamp=1414745750 (Fri Oct 31 16:55:50 2014)
         host-id=1
         score=2400
         maintenance=False
         state=EngineUp


--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : 192.168.12.12
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1414745821
Extra metadata (valid at timestamp):
         metadata_parse_version=1
         metadata_feature_version=1
         timestamp=1414745821 (Fri Oct 31 16:57:01 2014)
         host-id=2
         score=2400
         maintenance=False
         state=EngineStart
[root@ovirt1 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked

Host2

MainThread::INFO::2014-10-31 16:55:59,642::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
MainThread::INFO::2014-10-31 16:55:59,678::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.12
MainThread::INFO::2014-10-31 16:55:59,918::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-10-31 16:55:59,919::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353488
MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25354128
MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353552
MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608389584
MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608447760
MainThread::INFO::2014-10-31 16:55:59,939::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2014-10-31 16:55:59,983::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
MainThread::INFO::2014-10-31 16:56:00,001::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2014-10-31 16:56:00,001::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.11 (id 1): {'live-data': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1414745750 (Fri Oct 31 16:55:50 2014)\nhost-id=1\nscore=2400\nmaintenance=False\nstate=EngineUp\n', 'hostname': '192.168.12.11', 'host-id': 1, 'engine-status': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'score': 2400, 'maintenance': False, 'host-ts': 1414745750}
MainThread::INFO::2014-10-31 16:56:00,001::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
MainThread::INFO::2014-10-31 
16:56:00,002::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Trying: notify time=1414745760.0 type=state_transition 
detail=StartState-ReinitializeFSM hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:56:00,045::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (StartState-ReinitializeFSM) 
sent? ignored
MainThread::INFO::2014-10-31 
16:56:00,325::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
:(start_monitoring) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2014-10-31 
16:56:10,352::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745770.35 type=state_transition 
detail=ReinitializeFSM-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:56:10,353::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (ReinitializeFSM-EngineDown) 
sent? ignored
MainThread::INFO::2014-10-31 
16:56:10,638::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 
16:56:20,663::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 The engine is not running, but we do not have enough data to decide which 
hosts are alive
MainThread::INFO::2014-10-31 
16:56:20,663::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745780.66 type=state_transition 
detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:56:20,664::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (EngineDown-EngineDown) sent? 
ignored
MainThread::INFO::2014-10-31 
16:56:20,943::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 
16:56:30,968::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 The engine is not running, but we do not have enough data to decide which 
hosts are alive
MainThread::INFO::2014-10-31 
16:56:30,969::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745790.97 type=state_transition 
detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:56:30,969::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (EngineDown-EngineDown) sent? 
ignored
MainThread::INFO::2014-10-31 
16:56:31,248::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 
16:56:41,274::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 The engine is not running, but we do not have enough data to decide which 
hosts are alive
MainThread::INFO::2014-10-31 
16:56:41,275::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745801.28 type=state_transition 
detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:56:41,276::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (EngineDown-EngineDown) sent? 
ignored
MainThread::INFO::2014-10-31 
16:56:41,555::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 
16:56:51,583::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 The engine is not running, but we do not have enough data to decide which 
hosts are alive
MainThread::INFO::2014-10-31 
16:56:51,584::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745811.58 type=state_transition 
detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:56:51,584::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (EngineDown-EngineDown) sent? 
ignored
MainThread::INFO::2014-10-31 
16:56:51,864::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 
16:57:01,897::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 Engine down and local host has best score (2400), attempting to start engine VM
MainThread::INFO::2014-10-31 
16:57:01,898::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1414745821.9 type=state_transition 
detail=EngineDown-EngineStart hostname='ovirt2'
MainThread::INFO::2014-10-31 
16:57:01,906::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (EngineDown-EngineStart) sent? 
ignored
MainThread::INFO::2014-10-31 
16:57:02,189::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineStart (score: 2400)
MainThread::CRITICAL::2014-10-31 16:57:02,207::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>

[root@ovirt2 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
[root@ovirt2 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked


Thanks,
Jaicel

----- Original Message -----
From: "Jiri Moskovcak" <jmosk...@redhat.com>
To: "Jaicel" <jai...@asti.dost.gov.ph>
Cc: "Niels de Vos" <nde...@redhat.com>, "Vijay Bellur" <vbel...@redhat.com>, us...@ovirt.org, 
"Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, October 31, 2014 11:05:32 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 10:26 AM, Jaicel wrote:
I've increased the limit and then restarted the agent and broker. The status normalized, but it has now
gone back to the "False" state, with both hosts still scoring 2400. The agent logs remain the
same, and the status is still "ovirt-ha-agent dead but subsys locked". The ha-broker logs are below.

Thread-138::INFO::2014-10-31 
17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
Thread-138::INFO::2014-10-31 
17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
Thread-139::INFO::2014-10-31 
17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
Thread-139::INFO::2014-10-31 
17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
Thread-140::INFO::2014-10-31 
17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
Thread-140::INFO::2014-10-31 
17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
Thread-141::INFO::2014-10-31 
17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
Thread-141::INFO::2014-10-31 
17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
Thread-142::INFO::2014-10-31 
17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
Thread-142::INFO::2014-10-31 
17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed

Thanks,
Jaicel

OK, the broker now seems to run fine, so I need the recent agent.log
to debug it further.

--Jirka


----- Original Message -----
From: "Jiri Moskovcak" <jmosk...@redhat.com>
To: "Jaicel R. Sabonsolin" <jai...@asti.dost.gov.ph>, "Niels de Vos" 
<nde...@redhat.com>
Cc: "Vijay Bellur" <vbel...@redhat.com>, us...@ovirt.org, "Gluster Devel" 
<gluster-devel@gluster.org>
Sent: Friday, October 31, 2014 4:32:02 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried running
tcpdump on the oVirt hosts and Gluster nodes, but only packets exchanged with my
monitoring VM (Zabbix) appeared.

agent.log
       new_data = self.refresh(self._state.data)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
 line 77, in refresh
       stats.update(self.hosted_engine.collect_stats())
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
 line 662, in collect_stats
       constants.SERVICE_TYPE)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 171, in get_stats_from_storage
       result = self._checked_communicate(request)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 199, in _checked_communicate
       .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>

broker.log
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", 
line 165, in handle
       response = "success " + self._dispatch(data)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", 
line 261, in _dispatch
       .get_all_stats_for_service_type(**options)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
 line 41, in get_all_stats_for_service_type
       d = self.get_raw_stats_for_service_type(storage_dir, service_type)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
 line 74, in get_raw_stats_for_service_type
       f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: 
'/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

- ah, there we go ^^^^^^ You might need to raise the limit on
open files as described in [1], or find out which application keeps so many files open.


--Jirka

[1]
http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
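
Before raising the limit, it can help to see how close a process actually is to it. The following is a minimal sketch, assuming a Linux host with /proc mounted; nofile_limits and count_open_fds are hypothetical helper names, not part of ovirt-hosted-engine-ha. It reads the per-process limit with the standard resource module and counts descriptors by listing /proc/<pid>/fd.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count descriptors currently open in a process (Linux only).

    Reading another user's /proc/<pid>/fd usually requires root.
    """
    return len(os.listdir("/proc/{0}/fd".format(pid)))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit: {0}, hard limit: {1}".format(soft, hard))
    print("descriptors open in this process: {0}".format(count_open_fds()))
```

Passing the broker's PID to count_open_fds a few minutes apart would show whether its descriptor count keeps growing. Note that the limits printed here are those of the Python process running the script, not necessarily those of the broker service.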

Thread-38160::INFO::2014-10-31 
10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
Thread-38161::INFO::2014-10-31 
10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
Thread-38161::ERROR::2014-10-31 
10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Error handling request, data: 'get-stats 
storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent
 service_type=hosted-engine'
Traceback (most recent call last):
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", 
line 165, in handle
       response = "success " + self._dispatch(data)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", 
line 261, in _dispatch
       .get_all_stats_for_service_type(**options)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
 line 41, in get_all_stats_for_service_type
       d = self.get_raw_stats_for_service_type(storage_dir, service_type)
     File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
 line 74, in get_raw_stats_for_service_type
       f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: 
'/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 
10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
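
The broker traceback fails at os.open() with Errno 24, which means the process had already used up its descriptor quota by that point. This thread does not establish where the descriptors were leaked. As a general illustration only, the sketch below shows a read that always releases its descriptor, even when the read raises. read_metadata is a hypothetical helper, and it omits the direct_flag seen in the traceback to keep the example simple.

```python
import os


def read_metadata(path, size=4096):
    """Read up to `size` bytes from `path`, always closing the descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        # Runs on success and on error, so repeated calls do not
        # accumulate open descriptors.
        os.close(fd)
```

Calling such a helper in a polling loop keeps the descriptor count flat, whereas an os.open() without a matching os.close() would eventually hit the limit shown in the log.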

Thanks,
Jaicel

----- Original Message -----
From: "Niels de Vos" <nde...@redhat.com>
To: "Vijay Bellur" <vbel...@redhat.com>
Cc: "Jiri Moskovcak" <jmosk...@redhat.com>, "Jaicel R. Sabonsolin" <jai...@asti.dost.gov.ph>, 
us...@ovirt.org, "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
Hi Guys,

I need help with my oVirt Hosted-Engine HA setup. I am running 2
oVirt hosts and 2 Gluster nodes with replicated volumes. I already have
VMs running on my hosts, and they migrate normally when I, for example,
power off the host that they are running on. The problem is that the
engine can't migrate when I switch off the host that hosts the engine.

      oVirt        3.4.3-1.el6
      KVM         0.12.1.2 - 2.415.el6_5.10
      LIBVIRT   libvirt-0.10.2-29.el6_5.9
      VDSM      vdsm-4.14.17-0.el6


Right now, I get this result from hosted-engine --vm-status:

  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>


Restarting ha-broker and ha-agent normalizes the status, but eventually
it becomes "False" again and then returns to the result above. I hope you
guys can help me with this.


Hi Jaicel,
please attach agent.log and broker.log from the host where you are trying to
run hosted-engine --vm-status. I have a feeling that you ran into a
known problem on gluster, a stalled file descriptor. In that case the
only known solution at this time is to restart the broker and agent, as you
have already found out.


Adding Niels and gluster-devel to troubleshoot from Gluster NFS perspective.

I'd welcome any details on this "stalled file descriptor" problem. Is
there a bug filed with some details like logs, sysrq-t and maybe even
tcpdumps? If there is an easy way to reproduce this behaviour, I can
surely look into it and hopefully come up with some advice or a fix.

Thanks,
Niels


_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
