Hi!

We also saw the strange effect that ocf:heartbeat:IPaddr2 monitors timed out whenever there was significant disk write activity (we used 30s timeouts). I had always expected these monitors to be served from the buffer cache, so they should run without delay and quite efficiently. I never managed to investigate that properly. Probably "barrier=1" on some filesystems was the problem; at least "barrier=0" was about the best solution we could find for our problem.
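One way to check whether short-lived commands really stall under write load is to time them repeatedly while the disks are busy. Below is a minimal sketch of that idea, not something we actually ran; the helper names `time_command` and `sample` are made up for illustration, and the probe command is up to you.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout=30.0):
    """Run cmd once and return the wall-clock seconds it took.

    Returns None if the command did not finish within `timeout` seconds.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


def sample(cmd, count=5, pause=1.0, timeout=30.0):
    """Time cmd `count` times, sleeping `pause` seconds between runs."""
    results = []
    for i in range(count):
        results.append(time_command(cmd, timeout))
        if i < count - 1:
            time.sleep(pause)
    return results
```

For example, calling `sample(["ip", "-o", "addr", "show"], count=60)` once on an idle node and once during a heavy write would show whether the run time of such a command grows toward the monitor timeout. That particular probe is only an assumption about what is comparable to the monitor's work.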
Regards,
Ulrich

>>> "Aleksey V. Kashin" <aleksey.kas...@gmail.com> wrote on 12.12.2011 at
>>> 11:42 in message
<capjyfp_ds78khmn+u32bpu72tbdf-rpntarvr9crvzr51je...@mail.gmail.com>:
> 2011/12/12, Andrew Beekhof <and...@beekhof.net>:
> > On Fri, Dec 9, 2011 at 7:46 PM, Aleksey V. Kashin
> > <aleksey.kas...@gmail.com> wrote:
> >>> How much do they have now?
> >>
> >> They have 12G RAM.
> >
> > That seems respectable.
> >
> >>
> >>> How much is in use by the radius servers?
> >>
> >>              total       used       free     shared    buffers     cached
> >> Mem:         12038      11606        431          0          2       6479
> >> -/+ buffers/cache:       5124       6913
> >> Swap:         7632       3398       4233
> >
> > That doesn't really answer the question though, you really need to
> > find out where the memory is going.
> > Although 12Gb is a decent amount of RAM, /If/ a single radius server
> > needs 8Gb, then the machine is clearly not going to be able to handle
> > 2 of them.
> > There's not really anything Pacemaker can do about it.
> >
>
> On this server also running Oracle RDBMS (database for radius-server).
> It's generate big part of load.
>
> > About the only thing you can do is increase the operation timeouts and
> > perhaps play with the realtime and nice values of various processes.
> >
>
> I tried increase "timeout" (How long to wait before declaring the action has
> failed.), but this doesn't work for me. Now I'm testing with
> "failure-timeout" (How many seconds to wait before acting as if the
> failure had not occurred),
> Also I'll try play with process priority for corosync. Thanks for your
> advices.
>
> >> And now I'm seeing again "resource unmanaged/failed" :(
> >
> >> Resource Group: raddb
> >>     raddb_ip (ocf::heartbeat:IPaddr2): Started radius1 (unmanaged) FAILED
> >>
> >> Failed actions:
> >>     raddb_ip_monitor_15000 (node=radius1, call=4, rc=-2, status=Timed Out): unknown exec error
> >>     raddb_ip_stop_0 (node=radius1, call=5, rc=-2, status=Timed Out): unknown exec error

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems