On 04.02.2016 15:43, Bogdan Dobrelya wrote:
> Hello.
> Regarding the original issue, the good news is that the resource-agents
> ocf-shellfuncs no longer causes fork bombs in the dummy OCF RA [0]
> now that the fix [1] is done. The bad news is that the "self-forking"
> monitors issue seems to remain for the rabbitmq OCF RA [2], and I can
> reproduce it with another custom agent [3], so I'd guess it may be valid
> for other agents as well.
>
> IIUC, the issue seems related to how lrmd forks monitor actions.
> I tried to debug both pacemaker 1.1.10 and 1.1.12 with gdb as follows:
>
> # cat ./cmds
> set follow-fork-mode child
> set detach-on-fork off
> set follow-exec-mode new
> catch fork
> catch vfork
> cont
> # gdb -x cmds /usr/lib/pacemaker/lrmd `pgrep lrmd`
>
> I can confirm it catches the forked monitors, which make nested forks
> as well. But I have *many* debug symbols missing, bt is full of question
> marks and, honestly, I'm not a gdb guru and do not know what to check
> for in the reproduced cases.
>
> So any help with troubleshooting this further is very much appreciated!

I figured out that this is expected behaviour. There are no fork bombs
left, only the usual fork and exec syscalls made each time the OCF RA
calls a shell command or the ocf_run or ocf_log functions. Those false
"self-forks" are nothing more than a transient state between the fork
and exec calls, when the command line of the child process (as shown by
ps) has yet to be updated and still reads like the parent's... So I
believe the aforementioned patch solved the problem completely.

>
> [0] https://github.com/bogdando/dummy-ocf-ra
> [1] https://github.com/ClusterLabs/resource-agents/issues/734
> [2] https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
> [3] https://git.openstack.org/cgit/openstack/fuel-library/tree/files/fuel-ha-utils/ocf/ns_vrouter
>
> On 04.01.2016 17:33, Bogdan Dobrelya wrote:
>> On 04.01.2016 17:14, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Mon, Jan 04, 2016 at 04:52:43PM +0100, Bogdan Dobrelya wrote:
>>>> On 04.01.2016 16:36, Ken Gaillot wrote:
>>>>> On 01/04/2016 09:25 AM, Bogdan Dobrelya wrote:
>>>>>> On 04.01.2016 15:50, Bogdan Dobrelya wrote:
>>> [...]
>>>>>> Also note, that lrmd spawns *many* monitors like:
>>>>>> root  6495 0.0 0.0 70268 1456 ? Ss 2015  4:56 \_ /usr/lib/pacemaker/lrmd
>>>>>> root 31815 0.0 0.0  4440  780 ? S  15:08 0:00 |   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> root 31908 0.0 0.0  4440  388 ? S  15:08 0:00 |       \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> root 31910 0.0 0.0  4440  384 ? S  15:08 0:00 |       \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> root 31915 0.0 0.0  4440  392 ? S  15:08 0:00 |       \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> ...
>>>>>
>>>>> At first glance, that looks like your monitor action is calling itself
>>>>> recursively, but I don't see how in your code.
>>>>
>>>> Yes, it should be a bug in the ocf-shellfuncs's ocf_log().
>>>
>>> If you're sure about that, please open an issue at
>>> https://github.com/ClusterLabs/resource-agents/issues
>>
>> Submitted [0]. Thank you!
>> Note, that it seems the very import action causes the issue, not the
>> ocf_run or ocf_log code itself.
>>
>> [0] https://github.com/ClusterLabs/resource-agents/issues/734
>>
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>> _______________________________________________
>>> Users mailing list: Users@clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>
>

-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando
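
P.S. For anyone who wants to see the transient state between fork and
exec for themselves, here is a minimal Python sketch. It is not taken
from lrmd or from the resource agents; the helper names cmdline and
fork_and_observe are made up for illustration, and it assumes a Linux
/proc filesystem. The child reports its own command line before it
calls exec, and at that point it is still a copy of the parent's, which
is what ps shows as a "self-fork".

```python
import os


def cmdline(pid):
    """Return a process's command line roughly as ps shows it (Linux /proc)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return f.read().replace(b"\0", b" ").decode().strip()


def fork_and_observe(argv):
    """Fork, capture the child's command line before exec, then exec argv.

    Hypothetical helper for illustration only.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: between fork and exec it is still a copy of the
            # parent, so its command line is the parent's command line.
            os.close(read_end)
            os.write(write_end, cmdline(os.getpid()).encode())
            os.close(write_end)
            # Only after exec does the command line change to argv.
            os.execvp(argv[0], argv)
        finally:
            # Reached only if something above failed, e.g. exec itself.
            os._exit(127)
    # Parent: read what the child saw, then reap it.
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        before_exec = pipe.read()
    os.waitpid(pid, 0)
    return before_exec


if __name__ == "__main__":
    parent = cmdline(os.getpid())
    child = fork_and_observe(["true"])
    print("parent:           ", parent)
    print("child before exec:", child)
    print("identical:        ", parent == child)
```

In an OCF RA the same thing happens every time the shell forks to run an
external command, so a ps snapshot taken at the right moment shows a
short-lived child with the agent's own command line.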