Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-11-10 Thread Jaicel
Hi Jirka, 

The patch works. It stabilized the status of my two hosts, and engine migration
during failover also works fine. Thanks, guys!

Jaicel 


From: "Jiri Moskovcak"  
To: "Jaicel"  
Cc: "Niels de Vos" , "Vijay Bellur" , 
us...@ovirt.org, "Gluster Devel"  
Sent: Monday, November 3, 2014 3:33:16 PM 
Subject: Re: [ovirt-users] Hosted-Engine HA problem 

On 11/01/2014 07:43 AM, Jaicel wrote: 
> Hi, 
> 
> My engine runs on Host1. The current status and agent logs are below.
> 
> Host 1 

Hi, 
It seems like you ran into [1]. You can either zero out the metadata
file or apply the patch from [1] manually.

--Jirka 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925 
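For the first option, "zeroing out" is taken here to mean overwriting every byte of hosted-engine.metadata with zeros while keeping its size. The Python 3 sketch below does only that. It assumes ovirt-ha-agent and ovirt-ha-broker have already been stopped on all hosts so nothing rewrites the file mid-way (an assumption, not a step confirmed in this thread), and `zero_out` is a hypothetical helper name. Check the bug report [1] for the recommended procedure before touching shared storage.

```python
import os


def zero_out(path, chunk_size=1024 * 1024):
    """Overwrite every byte of `path` with zeros, keeping the file size.

    Hypothetical helper. Assumption: ovirt-ha-agent/ovirt-ha-broker are
    stopped on all hosts so nothing writes to the file while it is cleared.
    """
    size = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros reach the storage
    return size

# Example (path taken from the agent log in this thread):
# zero_out("/rhev/data-center/mnt/gluster1:_engine/"
#          "6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata")
```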

> 
> MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
> MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
> MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
> MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12',
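The `'extra'` field in the `refresh` log entry above is a newline-separated list of `key=value` pairs. A small Python 3 sketch that turns it into a dictionary makes it easier to compare the hosts' scores, states and timestamps. `parse_extra` is a hypothetical helper, and treating every line as `key=value` is an assumption based only on the sample shown here.

```python
def parse_extra(extra):
    """Split a metadata 'extra' blob of key=value lines into a dict of strings."""
    fields = {}
    for line in extra.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Sample copied from the log entry for host 192.168.12.12 above.
sample = ("metadata_parse_version=1\nmetadata_feature_version=1\n"
          "timestamp=1413882675 (Tue Oct 21 17:11:15 2014)\n"
          "host-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n")
info = parse_extra(sample)
print(info["state"], info["score"])  # EngineDown 2400
```

In this sample the timestamp (Oct 21) is ten days older than the Oct 31 log entry that reports it, which fits the `'live-data': False` flag on the same line.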

2014-10-31 Thread Jaicel
 
line 199, in _checked_communicate
.format(message or response))
RequestError: Request failed: 

[root@ovirt2 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: 
[root@ovirt2 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked


Thanks,
Jaicel
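On EL6, `dead but subsys locked` is what the SysV init `status` helper prints when it finds no running process for the service but the service's lock file under `/var/lock/subsys/` still exists, which usually means the daemon exited without going through its init script (for example, it crashed). The Python sketch below approximates that decision order for illustration; `service_status`, `pid_alive` and `read_pid` are hypothetical helpers, and the real helper's fallback of looking the process up by name (`pidof`) is omitted.

```python
import errno
import os


def pid_alive(pid):
    """True if a process with this PID exists; signal 0 only probes."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def read_pid(pidfile):
    """First integer in a pidfile, or None if it is missing or unreadable."""
    try:
        with open(pidfile) as f:
            return int(f.read().split()[0])
    except (IOError, OSError, ValueError, IndexError):
        return None


def service_status(name, pidfile=None, lock_dir="/var/lock/subsys"):
    """Approximate the EL6 init `status` messages (lookup by name omitted)."""
    if pidfile and os.path.exists(pidfile):
        pid = read_pid(pidfile)
        if pid is not None and pid > 0 and pid_alive(pid):
            return "%s (pid %d) is running..." % (name, pid)
        return "%s dead but pid file exists" % name
    # No live process found: a leftover lock file means the daemon did not
    # exit through its init script.
    if os.path.isfile(os.path.join(lock_dir, name)):
        return "%s dead but subsys locked" % name
    return "%s is stopped" % name
```

In practice this means the agent process is gone, so the next place to look is the end of agent.log for the exception that stopped it.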

- Original Message -
From: "Jiri Moskovcak" 
To: "Jaicel" 
Cc: "Niels de Vos" , "Vijay Bellur" , 
us...@ovirt.org, "Gluster Devel" 
Sent: Friday, October 31, 2014 11:05:32 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 10:26 AM, Jaicel wrote:
> I increased the limit and then restarted the agent and broker. The status
> normalized, but now it has gone back to the "False" state, although both hosts
> still have a score of 2400. The agent logs remain the same, with the
> "ovirt-ha-agent dead but subsys locked" status. The ha-broker logs are below:
>
> Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
> Jaicel

OK, it now seems that the broker runs fine, so I need the recent agent.log
to debug this further.

--Jirka
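The broker and agent log entries in this thread share one layout: `thread::LEVEL::date time,ms::module::lineno::logger::(function) message`. When going through a long agent.log, a small Python 3 parser for that layout helps pull out the `ERROR` entries. The regular expression is inferred only from the samples quoted in this thread, not from the ovirt-hosted-engine-ha source, and `parse_line`/`errors` are hypothetical helpers.

```python
import re

# Layout inferred from the log samples in this thread (assumption):
# thread::LEVEL::date time,ms::module::lineno::logger::(function) message
LINE_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::"
    r"(?P<logger>[^:]+)::\((?P<func>[^)]*)\)\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it doesn't match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def errors(lines):
    """Keep only entries logged at ERROR level."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e["level"] == "ERROR"]


LOGGER = "ovirt_hosted_engine_ha.broker.listener.ConnectionHandler"
sample = [
    "Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::"
    + LOGGER + "::(setup) Connection established",
    "Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::"
    + LOGGER + "::(handle) Error handling request, data: 'get-stats "
    "storage_dir=/rhev/data-center/mnt/gluster1:_engine/"
    "6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'",
]
for entry in errors(sample):
    print(entry["time"], entry["func"])
```

Traceback lines do not match the pattern and come back as `None`, so they are skipped by `errors`.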

>
> - Original Message -
> From: "Jiri Moskovcak" 
> To: "Jaicel R. Sabonsolin" , "Niels de Vos" 
> 
> Cc: "Vijay Bellur" , us...@ovirt.org, "Gluster Devel" 
> 
> Sent: Friday, October 31, 2014 4:32:02 PM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>
> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
>> Hi guys,
>>
>> These logs appear on both hosts, just like the result of --vm-status. I tried
>> tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with
>> my monitoring VM (Zabbix) appeared.
>>
>> agent.log
>>     new_data = self.refresh(self._state.data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>>     stats.update(self.hosted_engine.collect_stats())
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine

2014-10-31 Thread Jaicel R. Sabonsolin
Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried
tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with my
monitoring VM (Zabbix) appeared.

agent.log
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: 

broker.log
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel
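`[Errno 24] Too many open files` (EMFILE) means the broker process reached its per-process limit on open file descriptors, so even a plain `os.open()` of the metadata file fails. A common cause is a descriptor that is opened with `os.open()` but not closed on every code path. The Python 3 sketch below shows the defensive pattern of closing in a `finally` block. It is illustrative only, not the ovirt-hosted-engine-ha code or the fix from the bug referenced in this thread, and `read_all`/`open_fd_count` are hypothetical helpers. It also drops the `direct_flag` (O_DIRECT) seen in the traceback, because direct I/O needs aligned buffers that plain `os.read()` does not provide.

```python
import os


def read_all(path, chunk_size=65536):
    """Read a whole file through a raw descriptor that is always closed.

    Illustrative pattern only; O_DIRECT is deliberately left out.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        # Runs on normal return and on exceptions, so the fd never leaks.
        os.close(fd)


def open_fd_count():
    """Number of descriptors this process holds (Linux /proc only)."""
    return len(os.listdir("/proc/self/fd"))
```

Calling `read_all` in a loop and comparing `open_fd_count()` before and after is a quick way to confirm a code path does not leak descriptors; a count that keeps growing under steady load is the symptom that eventually produces Errno 24.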

- Original Message -
From: "Niels de Vos" 
To: "Vijay Bellur" 
Cc: "Jiri Moskovcak" , "Jaicel R. Sabonsolin" 
, us...@ovirt.org, "Gluster Devel" 

Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
> >On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
> >>Hi Guys,
> >>
> >>I need help with my oVirt Hosted-Engine HA setup. I am running 2
> >>oVirt hosts and 2 Gluster nodes with replicated volumes. I already have
> >>VMs running on my hosts, and they migrate normally when I, for example,
> >>power off the host they are running on. The problem is that the
> >>engine can't migrate when I switch off the host that runs the engine.
> >>
> >>oVirt     3.4.3-1.el6
> >>KVM       0.12.1.2-2.415.el6_5.10
> >>LIBVIRT   libvirt-0.10.2-29.el6_5.9
> >>VDSM      vdsm-4.14.17-0.el6
> >>
> >>
> >>Right now, I get this result from hosted-engine --vm-status:
> >>
> >>   File "/usr/lib64/python2.6/runpy.py", line 122, in _run_modu

2014-10-31 Thread Jaicel
I increased the limit and then restarted the agent and broker. The status
normalized, but now it has gone back to the "False" state, although both hosts
still have a score of 2400. The agent logs remain the same, with the
"ovirt-ha-agent dead but subsys locked" status. The ha-broker logs are below:

Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel

- Original Message -
From: "Jiri Moskovcak" 
To: "Jaicel R. Sabonsolin" , "Niels de Vos" 

Cc: "Vijay Bellur" , us...@ovirt.org, "Gluster Devel" 

Sent: Friday, October 31, 2014 4:32:02 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
> Hi guys,
>
> These logs appear on both hosts, just like the result of --vm-status. I tried
> tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with my
> monitoring VM (Zabbix) appeared.
>
> agent.log
>     new_data = self.refresh(self._state.data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>     constants.SERVICE_TYPE)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>     .format(message or response))
> RequestError: Request failed: 
>
> broker.log
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>     response = "success " + self._dispatch(data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>     .get_all_stats_for_service_type(**options)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>     f = os.open(path, direct_flag | os.O_RDONLY)
> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

Ah, there we go ^^ You might need to raise the limit on allowed open files
as described in [1], or find out why the app keeps so many files open.


--Jirka

[1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
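To see which open-files limit a running process actually has, and how close it is to it, a Python 3 sketch along these lines can help. It relies on Linux `/proc` (`/proc/<pid>/fd` and `/proc/<pid>/limits`), reading another user's process generally requires root, and `fd_usage`/`raise_soft_nofile` are hypothetical helpers. Resource limits are fixed when a process starts, so a running daemon keeps its old limit until it is restarted.

```python
import os
import resource


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux /proc only).

    Limits are returned as the strings in /proc/<pid>/limits (may be "unlimited").
    """
    open_fds = len(os.listdir("/proc/%s/fd" % pid))
    soft = hard = None
    with open("/proc/%s/limits" % pid) as f:
        for line in f:
            if line.startswith("Max open files"):
                # e.g. "Max open files   1024   4096   files"
                parts = line.split()
                soft, hard = parts[3], parts[4]
    return open_fds, soft, hard


def raise_soft_nofile(target):
    """Raise this process's soft RLIMIT_NOFILE toward `target`, never lowering it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

For example, `fd_usage(1234)`, with 1234 standing in for the broker's PID, shows how many descriptors the broker holds against its limit. `raise_soft_nofile` only affects the calling process and children it starts afterwards.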

> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling reques