Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-11-10 Thread Jaicel
Hi Jirka, 

The patch works. It stabilized the status of my two hosts. The engine migration 
during failover also works fine. Thanks, guys! 

Jaicel 


From: Jiri Moskovcak jmosk...@redhat.com 
To: Jaicel jai...@asti.dost.gov.ph 
Cc: Niels de Vos nde...@redhat.com, Vijay Bellur vbel...@redhat.com, 
us...@ovirt.org, Gluster Devel gluster-devel@gluster.org 
Sent: Monday, November 3, 2014 3:33:16 PM 
Subject: Re: [ovirt-users] Hosted-Engine HA problem 

On 11/01/2014 07:43 AM, Jaicel wrote: 
 Hi, 
 
 My engine runs on Host1. The current status and agent logs are below. 
 
 Host 1 

Hi, 
It seems you ran into [1]. You can either zero out the metadata 
file or apply the patch from [1] manually. 

--Jirka 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925 
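
For readers who pick the first option, here is a minimal sketch of zeroing out a file in place. It assumes that "zero out" means overwriting the existing metadata file with NUL bytes while keeping its size, and that the HA agent and broker are stopped on every host beforehand; neither assumption is confirmed in this thread. The function name and chunk size are illustrative only.

```python
import os


def zero_out(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file with NUL, keeping its size.

    Hypothetical helper: the thread does not say how the file should be zeroed.
    """
    size = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    # "r+b" opens for update without truncating, so the file keeps its length.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the zeros to storage before returning
    return size
```

The path to pass would be the hosted-engine.metadata file shown in the logs of this thread. Taking a copy of the file first keeps the step reversible.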

 
 MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
 MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
 MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
 MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
 MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
 MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
 MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
 MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
 MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
 MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
 MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
 MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
 MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
 MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
 MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
 MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
 MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675}
 MainThread::INFO::2014-10-31 16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
 MainThread::INFO::2014-10-31 16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time

Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-11-01 Thread Jaicel
, in _checked_communicate
.format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
[root@ovirt2 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked


Thanks,
Jaicel

- Original Message -
From: Jiri Moskovcak jmosk...@redhat.com
To: Jaicel jai...@asti.dost.gov.ph
Cc: Niels de Vos nde...@redhat.com, Vijay Bellur vbel...@redhat.com, 
us...@ovirt.org, Gluster Devel gluster-devel@gluster.org
Sent: Friday, October 31, 2014 11:05:32 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 10:26 AM, Jaicel wrote:
 I've increased the limit and then restarted the agent and broker. The status 
 normalized at first, but it has now gone back to the False state, although both 
 hosts still have a score of 2400. The agent logs remain the same, and the service 
 status still reports "ovirt-ha-agent dead but subsys locked". The ha-broker logs 
 are below.

 Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
 Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
 Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
 Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
 Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
 Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
 Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
 Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
 Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
 Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

 Thanks,
 Jaicel

OK, now it seems that the broker runs fine, so I need the recent agent.log 
to debug this further.

--Jirka


 - Original Message -
 From: Jiri Moskovcak jmosk...@redhat.com
 To: Jaicel R. Sabonsolin jai...@asti.dost.gov.ph, Niels de Vos 
 nde...@redhat.com
 Cc: Vijay Bellur vbel...@redhat.com, us...@ovirt.org, Gluster Devel 
 gluster-devel@gluster.org
 Sent: Friday, October 31, 2014 4:32:02 PM
 Subject: Re: [ovirt-users] Hosted-Engine HA problem

 On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
 Hi guys,

 These logs appear on both hosts, just like the result of --vm-status. I tried 
 running tcpdump on the oVirt hosts and Gluster nodes, but only packets exchanged 
 with my monitoring VM (Zabbix) appeared.

 agent.log
     new_data = self.refresh(self._state.data)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
     stats.update(self.hosted_engine.collect_stats())
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
     constants.SERVICE_TYPE)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
     result = self._checked_communicate(request)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
     .format(message or response))
 RequestError: Request failed: <type 'exceptions.OSError'>

 broker.log
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
     response = "success " + self._dispatch(data)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
     .get_all_stats_for_service_type(**options)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
     f = os.open(path, direct_flag | os.O_RDONLY)
 OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

 - Ah, there we go ^^ You might need to raise the limit of allowed
 open files as described here [1], or find out which app keeps so many files open.


 --Jirka

 [1]
 http://www.cyberciti.biz/faq/linux-increase-the-maximum-number
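
Before raising the limit, it can help to see how close a process actually is to it. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/limits and /proc/<pid>/fd are readable (usually as root for another user's process). The function names are illustrative and not part of any oVirt tool.

```python
import os


def parse_nofile_limit(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    Each value is an int, or None when the kernel reports 'unlimited'.
    Returns None if the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    return None


def count_open_fds(pid):
    """Count descriptors currently open in a process (Linux /proc only)."""
    return len(os.listdir("/proc/%s/fd" % pid))


def fd_report(pid):
    """Summarise descriptor usage against the limits of one process."""
    with open("/proc/%s/limits" % pid) as f:
        limits = parse_nofile_limit(f.read())
    if limits is None:
        raise ValueError("no 'Max open files' line found for pid %s" % pid)
    return {"open": count_open_fds(pid), "soft": limits[0], "hard": limits[1]}
```

Calling fd_report with the PID of the broker process a few times in a row shows whether the "open" count keeps climbing toward the soft limit, which would point to a descriptor leak rather than a limit that is simply too low.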

Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-10-31 Thread Jaicel R. Sabonsolin
Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried 
running tcpdump on the oVirt hosts and Gluster nodes, but only packets exchanged 
with my monitoring VM (Zabbix) appeared.

agent.log
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>

broker.log
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel
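
The broker traceback in this message fails inside os.open, which returns a raw descriptor that stays open until os.close is called on it. The thread does not show the rest of storage_broker.py, so the following is not the project's code and does not claim to identify the actual bug. It is a generic sketch of reading a file through os.open so that the descriptor is released on every path, including when the read raises. The function name is hypothetical, and the direct_flag seen in the traceback is left out for simplicity.

```python
import os


def read_raw(path, size):
    """Read up to `size` bytes via a raw descriptor and always close it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        # Runs on success and on error, so this function never leaves
        # the descriptor open behind it.
        os.close(fd)
```

With this pattern a long-running service can poll the same file every few seconds without its open-descriptor count growing over time.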

- Original Message -
From: Niels de Vos nde...@redhat.com
To: Vijay Bellur vbel...@redhat.com
Cc: Jiri Moskovcak jmosk...@redhat.com, Jaicel R. Sabonsolin 
jai...@asti.dost.gov.ph, us...@ovirt.org, Gluster Devel 
gluster-devel@gluster.org
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
 On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
 On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
 Hi Guys,
 
 I need help with my oVirt Hosted-Engine HA setup. I am running 2
 oVirt hosts and 2 Gluster nodes with replicated volumes. I already have
 VMs running on my hosts, and they migrate normally when I, for example,
 power off the host they are running on. The problem is that the
 engine can't migrate when I switch off the host that hosts the engine.
 
 oVirt     3.4.3-1.el6
 KVM       0.12.1.2 - 2.415.el6_5.10
 LIBVIRT   libvirt-0.10.2-29.el6_5.9
 VDSM      vdsm-4.14.17-0.el6
 
 
 Right now, I get this result from hosted-engine --vm-status:
 
   File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
     "__main__", fname, loader, pkg_name)
   File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
     exec code in run_globals
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
     if not status_checker.print_status():
File
 
 /usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup