Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
Hi Jirka,

the patch works. it stabilized the status of my two hosts. the engine migration during failover also works fine. thanks guys!

Jaicel

From: Jiri Moskovcak jmosk...@redhat.com
To: Jaicel jai...@asti.dost.gov.ph
Cc: Niels de Vos nde...@redhat.com, Vijay Bellur vbel...@redhat.com, us...@ovirt.org, Gluster Devel gluster-devel@gluster.org
Sent: Monday, November 3, 2014 3:33:16 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 11/01/2014 07:43 AM, Jaicel wrote:

Hi, my engine runs on Host1. current status and agent logs below.

Host 1

Hi, it seems like you ran into [1], you can either zero-out the metadata file or apply the patch from [1] manually.

--Jirka

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925

MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675}
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
MainThread::INFO::2014-10-31 16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time
Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: type 'exceptions.OSError'

[root@ovirt2 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked

Thanks,
Jaicel

- Original Message -
From: Jiri Moskovcak jmosk...@redhat.com
To: Jaicel jai...@asti.dost.gov.ph
Cc: Niels de Vos nde...@redhat.com, Vijay Bellur vbel...@redhat.com, us...@ovirt.org, Gluster Devel gluster-devel@gluster.org
Sent: Friday, October 31, 2014 11:05:32 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 10:26 AM, Jaicel wrote:

i've increased the limit and then restarted agent and broker. status normalize, but then right now it went to False state again but still both having 2400 score. agent logs remains the same, with ovirt-ha-agent dead but subsys locked status. ha-broker logs below

Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel

ok, now it seems that broker runs fine, so I need the recent agent.log to debug it more.

--Jirka

- Original Message -
From: Jiri Moskovcak jmosk...@redhat.com
To: Jaicel R. Sabonsolin jai...@asti.dost.gov.ph, Niels de Vos nde...@redhat.com
Cc: Vijay Bellur vbel...@redhat.com, us...@ovirt.org, Gluster Devel gluster-devel@gluster.org
Sent: Friday, October 31, 2014 4:32:02 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:

Hi guys,

these logs appear on both hosts just like the result of --vm-status. tried to tcpdump on ovirt hosts and gluster nodes but only packets exchange with my monitoring VM(zabbix) appeared.

agent.log

    new_data = self.refresh(self._state.data)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py, line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py, line 662, in collect_stats
    constants.SERVICE_TYPE)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: type 'exceptions.OSError'

broker.log

  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 165, in handle
    response = success + self._dispatch(data)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

- ah, there we go ^^ you might need to tweak the limit of allowed open files as described here [1] or find the app keeps so many files open

--Jirka

[1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number
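
For anyone hitting the same [Errno 24], a rough way to check both things Jirka mentions (the open-files limit, and how many descriptors a process actually holds) could look like the sketch below. It assumes a Linux host with /proc mounted. The function names are made up for this example and are not part of ovirt-hosted-engine-ha, and finding the broker's PID is left to you. Inspecting another user's process generally needs root.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limit of the *current* process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid):
    """Count the file descriptors a process holds, via /proc/<pid>/fd (Linux only)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def proc_nofile_limits(pid):
    """Read the 'Max open files' row of /proc/<pid>/limits.

    Returns (soft, hard) as strings, because a value may be 'unlimited'.
    """
    with open("/proc/%d/limits" % pid) as limits_file:
        for line in limits_file:
            if line.startswith("Max open files"):
                fields = line.split()
                # Row layout: Max open files <soft> <hard> files
                return fields[3], fields[4]
    raise RuntimeError("no 'Max open files' row for pid %d" % pid)


if __name__ == "__main__":
    # Demo on this script's own process; pass the PID you care about instead.
    pid = os.getpid()
    soft, hard = proc_nofile_limits(pid)
    print("pid %d: %d descriptors open, soft limit %s, hard limit %s"
          % (pid, count_open_fds(pid), soft, hard))
```

If the descriptor count keeps climbing toward the soft limit between runs, raising the limit only postpones the error and the process is probably leaking descriptors.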
Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
Hi guys,

these logs appear on both hosts just like the result of --vm-status. tried to tcpdump on ovirt hosts and gluster nodes but only packets exchange with my monitoring VM(zabbix) appeared.

agent.log

    new_data = self.refresh(self._state.data)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py, line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py, line 662, in collect_stats
    constants.SERVICE_TYPE)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: type 'exceptions.OSError'

broker.log

  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 165, in handle
    response = success + self._dispatch(data)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 165, in handle
    response = success + self._dispatch(data)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel

- Original Message -
From: Niels de Vos nde...@redhat.com
To: Vijay Bellur vbel...@redhat.com
Cc: Jiri Moskovcak jmosk...@redhat.com, Jaicel R. Sabonsolin jai...@asti.dost.gov.ph, us...@ovirt.org, Gluster Devel gluster-devel@gluster.org
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:

Hi Guys,

I need help with my ovirt Hosted-Engine HA setup. I am running on 2 ovirt hosts and 2 gluster nodes with replicated volumes. i already have VMs running on my hosts and they can migrate normally once i for example power off the host that they are running on. the problem is that the engine can't migrate once i switch off the host that hosts the engine.

oVirt3.4.3-1.el6
KVM 0.12.1.2 - 2.415.el6_5.10
LIBVIRT libvirt-0.10.2-29.el6_5.9
VDSM vdsm-4.14.17-0.el6

right now, i have this result from hosted-engine --vm-status.

  File /usr/lib64/python2.6/runpy.py, line 122, in _run_module_as_main
    __main__, fname, loader, pkg_name)
  File /usr/lib64/python2.6/runpy.py, line 34, in _run_code
    exec code in run_globals
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py, line 111, in module
    if not status_checker.print_status():
  File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup