On 31/10/2014 10:26, Jaicel wrote:
> I've increased the limit and then restarted the agent and broker. The status 
> normalized, but it has now gone to the "False" state again, although both hosts 
> still have a score of 2400. The agent logs remain the same, with the status 
> "ovirt-ha-agent dead but subsys locked". The ha-broker logs are below:
> 
> Thread-138::INFO::2014-10-31 
> 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>  Connection established
> Thread-138::INFO::2014-10-31 
> 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>  Connection closed
> Thread-139::INFO::2014-10-31 
> 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>  Connection established
> Thread-139::INFO::2014-10-31 
> 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>  Connection closed
> Thread-140::INFO::2014-10-31 
> 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>  Connection established
> Thread-140::INFO::2014-10-31 
> 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>  Connection closed
> Thread-141::INFO::2014-10-31 
> 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>  Connection established
> Thread-141::INFO::2014-10-31 
> 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>  Connection closed
> Thread-142::INFO::2014-10-31 
> 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>  Connection established
> Thread-142::INFO::2014-10-31 
> 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>  Connection closed
> 
> Thanks,
> Jaicel
> 
> ----- Original Message -----
> From: "Jiri Moskovcak" <jmosk...@redhat.com>
> To: "Jaicel R. Sabonsolin" <jai...@asti.dost.gov.ph>, "Niels de Vos" 
> <nde...@redhat.com>
> Cc: "Vijay Bellur" <vbel...@redhat.com>, us...@ovirt.org, "Gluster Devel" 
> <gluster-devel@gluster.org>
> Sent: Friday, October 31, 2014 4:32:02 PM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
> 
> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
>> Hi guys,
>>
>> These logs appear on both hosts, as does the result of --vm-status. I tried 
>> running tcpdump on the oVirt hosts and Gluster nodes, but only packets 
>> exchanged with my monitoring VM (Zabbix) appeared.
>>
>> agent.log
>>      new_data = self.refresh(self._state.data)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
>>  line 77, in refresh
>>      stats.update(self.hosted_engine.collect_stats())
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>  line 662, in collect_stats
>>      constants.SERVICE_TYPE)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
>> line 171, in get_stats_from_storage
>>      result = self._checked_communicate(request)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
>> line 199, in _checked_communicate
>>      .format(message or response))
>> RequestError: Request failed: <type 'exceptions.OSError'>
>>
>> broker.log
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>  line 165, in handle
>>      response = "success " + self._dispatch(data)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>  line 261, in _dispatch
>>      .get_all_stats_for_service_type(**options)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>  line 41, in get_all_stats_for_service_type
>>      d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>  line 74, in get_raw_stats_for_service_type
>>      f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 24] Too many open files: 
>> '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
> 
> - Ah, there we go ^^^^^^ You might need to raise the limit on allowed 
> open files as described here [1], or find which app keeps so many files open.


It would be good to understand whether this affects that host only or whether 
it is a common case, in which case we should increase the limit within setup.
I have never seen this issue before.
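
One way to check whether a given host is getting close to the limit is to 
compare the number of descriptors a process holds with its soft limit. The 
following is only a rough sketch, not taken from the hosted-engine code: it 
assumes a Linux /proc filesystem, the function names are made up for 
illustration, and you have to supply the PID of the broker process yourself.

```python
import os
import sys


def count_open_fds(pid):
    """Count the file descriptors a process holds, via /proc/<pid>/fd (Linux only)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def parse_soft_nofile_limit(limits_text):
    """Extract the soft 'Max open files' value from the text of /proc/<pid>/limits.

    Returns an int, or None if the line is missing or not numeric.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, <units>
            fields = line.split()
            if len(fields) > 3 and fields[3].isdigit():
                return int(fields[3])
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for the given PID."""
    with open("/proc/%d/limits" % pid) as f:
        soft = parse_soft_nofile_limit(f.read())
    return count_open_fds(pid), soft


if __name__ == "__main__":
    # Hypothetical usage: pass the PID to inspect; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used, soft = fd_usage(pid)
    print("pid %d: %d open fds, soft limit %s" % (pid, used, soft))
```

Running this periodically against the same PID should show whether the count 
keeps growing towards the limit (a leak) or stays roughly flat (the limit is 
simply too low for the workload). Reading another user's /proc/<pid>/fd 
normally requires root.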


> 
> 
> --Jirka
> 
> [1] 
> http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
> 
>> Thread-38160::INFO::2014-10-31 
>> 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed
>> Thread-38161::INFO::2014-10-31 
>> 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>  Connection established
>> Thread-38161::ERROR::2014-10-31 
>> 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Error handling request, data: 'get-stats 
>> storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent
>>  service_type=hosted-engine'
>> Traceback (most recent call last):
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>  line 165, in handle
>>      response = "success " + self._dispatch(data)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>  line 261, in _dispatch
>>      .get_all_stats_for_service_type(**options)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>  line 41, in get_all_stats_for_service_type
>>      d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>    File 
>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>  line 74, in get_raw_stats_for_service_type
>>      f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 24] Too many open files: 
>> '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>> Thread-38161::INFO::2014-10-31 
>> 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed
>>
>> Thanks,
>> Jaicel
>>
>> ----- Original Message -----
>> From: "Niels de Vos" <nde...@redhat.com>
>> To: "Vijay Bellur" <vbel...@redhat.com>
>> Cc: "Jiri Moskovcak" <jmosk...@redhat.com>, "Jaicel R. Sabonsolin" 
>> <jai...@asti.dost.gov.ph>, us...@ovirt.org, "Gluster Devel" 
>> <gluster-devel@gluster.org>
>> Sent: Friday, October 31, 2014 4:11:25 AM
>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>
>> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
>>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>>>> Hi Guys,
>>>>>
>>>>> I need help with my oVirt Hosted-Engine HA setup. I am running 2
>>>>> oVirt hosts and 2 Gluster nodes with replicated volumes. I already have
>>>>> VMs running on my hosts, and they migrate normally when I, for example,
>>>>> power off the host they are running on. The problem is that the
>>>>> engine can't migrate when I switch off the host that hosts the engine.
>>>>>
>>>>>     oVirt        3.4.3-1.el6
>>>>>     KVM         0.12.1.2 - 2.415.el6_5.10
>>>>>     LIBVIRT   libvirt-0.10.2-29.el6_5.9
>>>>>     VDSM      vdsm-4.14.17-0.el6
>>>>>
>>>>>
>>>>> right now, i have this result from hosted-engine --vm-status.
>>>>>
>>>>>        File "/usr/lib64/python2.6/runpy.py", line 122, in
>>>>>     _run_module_as_main
>>>>>          "__main__", fname, loader, pkg_name)
>>>>>        File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>>>>>          exec code in run_globals
>>>>>        File
>>>>>
>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py",
>>>>>
>>>>>     line 111, in <module>
>>>>>          if not status_checker.print_status():
>>>>>        File
>>>>>
>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py",
>>>>>
>>>>>     line 58, in print_status
>>>>>          all_host_stats = ha_cli.get_all_host_stats()
>>>>>        File
>>>>>
>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>>>
>>>>>     line 137, in get_all_host_stats
>>>>>          return self.get_all_stats(self.StatModes.HOST)
>>>>>        File
>>>>>
>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>>>
>>>>>     line 86, in get_all_stats
>>>>>          constants.SERVICE_TYPE)
>>>>>        File
>>>>>
>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>
>>>>>     line 171, in get_stats_from_storage
>>>>>          result = self._checked_communicate(request)
>>>>>        File
>>>>>
>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>
>>>>>     line 199, in _checked_communicate
>>>>>          .format(message or response))
>>>>>     ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed:
>>>>>     <type 'exceptions.OSError'>
>>>>>
>>>>>
>>>>> Restarting ha-broker and ha-agent normalizes the status, but eventually
>>>>> it becomes "false" and then returns to the result above. I hope you
>>>>> can help me with this.
>>>>>
>>>>
>>>> Hi Jaicel,
>>>> please attach agent.log and broker.log from the host where you are trying
>>>> to run hosted-engine --vm-status. I have a feeling that you ran into a
>>>> known problem on gluster: a stalled file descriptor. In that case the
>>>> only known solution at this time is to restart the broker & agent, as you
>>>> have already found out.
>>>>
>>>
>>> Adding Niels and gluster-devel to troubleshoot from the Gluster NFS perspective.
>>
>> I'd welcome any details on this "stalled file descriptor" problem. Is
>> there a bug filed with some details like logs, sysrq-t and maybe even
>> tcpdumps? If there is an easy way to reproduce this behaviour, I can
>> surely look into it and hopefully come up with some advice or a fix.
>>
>> Thanks,
>> Niels
>>
> _______________________________________________
> Users mailing list
> us...@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 


-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
