[Linux-HA] Antw: Re: resource unmanaged/failed

Ulrich Windl Mon, 12 Dec 2011 03:03:23 -0800

Hi!

We also saw the strange effect that ocf:heartbeat:IPaddr2 monitors timed out 
when there was significant disk write activity (we used 30s timeouts). I had 
always expected these monitors to be in the buffer cache, so they should 
essentially run without delay and quite efficiently. I never managed to 
investigate that. Probably "barrier=1" on some filesystems was the problem. At 
least "barrier=0" was about the best solution we could find for our problem.
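
To see which filesystems carry an explicit barrier option, something like the 
following sketch might help. It only parses text in the /proc/mounts format; 
the function name and the sample devices are made up for illustration, and a 
filesystem that uses its default barrier setting may not list the option at 
all, so an empty result does not prove barriers are off.

```python
def barrier_settings(mounts_text):
    """Return {mount_point: option} for mounts that name a barrier option."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete mount entry
        mount_point, options = fields[1], fields[3].split(",")
        for opt in options:
            if opt in ("barrier", "nobarrier") or opt.startswith("barrier="):
                result[mount_point] = opt
    return result


# Hypothetical sample in /proc/mounts format (not taken from a real node).
sample = (
    "/dev/sda1 / ext3 rw,relatime,barrier=0 0 0\n"
    "/dev/sdb1 /data ext4 rw,relatime 0 0\n"
)
print(barrier_settings(sample))
```

On a cluster node, the contents of /proc/mounts could be passed in instead of 
the sample string.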


Regards,
Ulrich

>>> "Aleksey V. Kashin" <aleksey.kas...@gmail.com> wrote on 12.12.2011 at 11:42
in message
<capjyfp_ds78khmn+u32bpu72tbdf-rpntarvr9crvzr51je...@mail.gmail.com>:
> 2011/12/12, Andrew Beekhof <and...@beekhof.net>:
> > On Fri, Dec 9, 2011 at 7:46 PM, Aleksey V. Kashin
> > <aleksey.kas...@gmail.com> wrote:
> >>> How much do they have now?
> >>
> >> They have 12G RAM.
> >
> > That seems respectable.
> >
> >>
> >>> How much is in use by the radius servers?
> >>
> >>              total       used       free     shared    buffers     cached
> >> Mem:         12038      11606        431          0          2       6479
> >> -/+ buffers/cache:       5124       6913
> >> Swap:         7632       3398       4233
> >
> > That doesn't really answer the question though, you really need to
> > find out where the memory is going.
> > Although 12GB is a decent amount of RAM, /if/ a single radius server
> > needs 8GB, then the machine is clearly not going to be able to handle
> > 2 of them.
> > There's not really anything Pacemaker can do about it.
> >
> 
> This server is also running Oracle RDBMS (the database for the radius server).
> It generates a large part of the load.
> 
> > About the only thing you can do is increase the operation timeouts and
> > perhaps play with the realtime and nice values of various processes.
> >
> 
> I tried increasing "timeout" (How long to wait before declaring the action has
> failed.), but this didn't work for me. Now I'm testing with
> "failure-timeout" (How many seconds to wait before acting as if the
> failure had not occurred).
> I'll also try playing with the process priority for corosync. Thanks for your 
> advice.
> 
> >> And now I'm seeing "resource unmanaged/failed" again :(
> >
> >
> >
> >>  Resource Group: raddb
> >>     raddb_ip   (ocf::heartbeat:IPaddr2):       Started radius1 (unmanaged) FAILED
> >>
> >> Failed actions:
> >>    raddb_ip_monitor_15000 (node=radius1, call=4, rc=-2, status=Timed Out): unknown exec error
> >>    raddb_ip_stop_0 (node=radius1, call=5, rc=-2, status=Timed Out): unknown exec error
> >> _______________________________________________
> >> Linux-HA mailing list
> >> Linux-HA@lists.linux-ha.org 
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha 
> >> See also: http://linux-ha.org/ReportingProblems 
> >
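
For reference, the "free" figures quoted above can be broken down with a bit 
of arithmetic. This is only a sketch: the function name is made up, and it 
assumes the old six-column "Mem:" line layout (total, used, free, shared, 
buffers, cached) with values in MB.

```python
def summarize_free(mem_line):
    """Split a 'Mem:' line from old-style `free -m` output into two totals."""
    fields = mem_line.split()
    total, used, free, shared, buffers, cached = (int(x) for x in fields[1:7])
    return {
        # Memory held by processes, not counting buffers and page cache.
        "used_by_apps": used - buffers - cached,
        # Memory that could be reclaimed for processes if needed.
        "available": free + buffers + cached,
    }


mem = "Mem:         12038      11606        431          0          2       6479"
print(summarize_free(mem))  # {'used_by_apps': 5125, 'available': 6912}
```

The results differ by 1 MB from the "-/+ buffers/cache" line (5124 / 6913), 
presumably because free rounds each value separately. So roughly 5GB of RAM 
is held by processes and about 6.5GB is page cache, while 3398 MB of swap is 
in use.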
> 

 
 

