Hi! I thought that all the bugs had already been caught. :) But today (well, tonight already) I built the latest git Pacemaker with the upstart addition, and again caught a pending hang. Logs: http://send2me.ru/pcmk-04-Mar-2014.tar.bz2
24.02.2014, 03:44, "Andrew Beekhof" <and...@beekhof.net>:
> On 22 Feb 2014, at 7:07 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>
>> 21.02.2014, 04:00, "Andrew Beekhof" <and...@beekhof.net>:
>>> On 20 Feb 2014, at 10:04 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>> 20.02.2014, 13:57, "Andrew Beekhof" <and...@beekhof.net>:
>>>>> On 20 Feb 2014, at 5:33 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>> 20.02.2014, 01:22, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>> On 20 Feb 2014, at 4:18 am, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>> 19.02.2014, 06:47, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>> On 18 Feb 2014, at 9:29 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>> Hi, ALL and Andrew!
>>>>>>>>>>
>>>>>>>>>> Today is a good day - I killed a lot, and a lot was shot back at me.
>>>>>>>>>> In general - I am happy (almost like an elephant) :)
>>>>>>>>>> On a node, eight processes are important to me:
>>>>>>>>>> corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>>>>>> The behavior does not depend on the signal number - that's good.
>>>>>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins
>>>>>>>>>> the cluster - that's also good.
>>>>>>>>>> But the behavior differs depending on which daemon is killed.
>>>>>>>>>>
>>>>>>>>>> Four groups emerged:
>>>>>>>>>> 1. corosync, cib - STONITH works 100%.
>>>>>>>>>> Killed with any signal - STONITH is called and the node reboots.
>>>>>>>>> excellent
>>>>>>>>>> 3. stonithd, attrd, pengine - STONITH not needed.
>>>>>>>>>> These daemons simply restart; resources stay running.
>>>>>>>>> right
>>>>>>>>>> 2. lrmd, crmd - strange STONITH behavior.
>>>>>>>>>> Sometimes STONITH is called - with the corresponding reaction.
>>>>>>>>>> Sometimes the daemon restarts
>>>>>>>>> The daemon will always try to restart; the only variable is how
>>>>>>>>> long it takes the peer to notice and initiate fencing.
>>>>>>>>> If the failure happens just before they're due to receive the totem
>>>>>>>>> token, the failure will be very quickly detected and the node fenced.
>>>>>>>>> If the failure happens just after, then detection will take
>>>>>>>>> longer - giving the node longer to recover and not be fenced.
>>>>>>>>>
>>>>>>>>> So fence/not fence is normal and to be expected.
>>>>>>>>>> and resources restart with a large delay for MS:pgsql.
>>>>>>>>>> One time, after restarting crmd, pgsql did not restart.
>>>>>>>>> I would not expect pgsql to ever restart - if the RA does its job
>>>>>>>>> properly anyway.
>>>>>>>>> In the case the node is not fenced, the crmd will respawn and the
>>>>>>>>> PE will request that it re-detect the state of all resources.
>>>>>>>>>
>>>>>>>>> If the agent reports "all good", then there is nothing more to do.
>>>>>>>>> If the agent is not reporting "all good", you should really be
>>>>>>>>> asking why.
>>>>>>>>>> 4. pacemakerd - nothing happens.
>>>>>>>>> On non-systemd based machines, correct.
>>>>>>>>>
>>>>>>>>> On a systemd based machine pacemakerd is respawned and reattaches
>>>>>>>>> to the existing daemons.
>>>>>>>>> Any subsequent daemon failure will be detected and the daemon
>>>>>>>>> respawned.
>>>>>>>> And! I almost forgot about IT!
>>>>>>>> Do other (NORMAL) variants, methods, ideas exist?
>>>>>>>> Without this ... @$%#$%&$%^&$%^&##@#$$^$%& !!!!!
>>>>>>>> Otherwise - it's a full epic fail ;)
>>>>>>> -ENOPARSE
>>>>>> OK, I will set aside my personal attitude towards "systemd".
>>>>>> Let me explain.
>>>>>>
>>>>>> Somewhere at the beginning of this topic, I wrote:
>>>>>> A.G.: Who knows who runs lrmd?
>>>>>> A.B.: Pacemakerd.
>>>>>> That's one!
>>>>>>
>>>>>> Let's see the list of processes:
>>>>>> # ps -axf
>>>>>> .....
>>>>>> 6067 ?  Ssl  7:24    corosync
>>>>>> 6092 ?  S    0:25    pacemakerd
>>>>>> 6094 ?  Ss   116:13   \_ /usr/libexec/pacemaker/cib
>>>>>> 6095 ?  Ss   0:25     \_ /usr/libexec/pacemaker/stonithd
>>>>>> 6096 ?  Ss   1:27     \_ /usr/libexec/pacemaker/lrmd
>>>>>> 6097 ?  Ss   0:49     \_ /usr/libexec/pacemaker/attrd
>>>>>> 6098 ?  Ss   0:25     \_ /usr/libexec/pacemaker/pengine
>>>>>> 6099 ?  Ss   0:29     \_ /usr/libexec/pacemaker/crmd
>>>>>> .....
>>>>>> That's two!
>>>>> What's two? I don't follow.
>>>> In the sense that it creates other processes. But it does not matter.
>>>>>> And more, more...
>>>>>> Now you must understand why I want this process to always be running.
>>>>>> I don't think anyone here needs that explained!
>>>>>>
>>>>>> And now you say "pacemakerd works nicely, but only on systemd
>>>>>> distros"!!!
>>>>> No, I'm saying it works _better_ on systemd distros.
>>>>> On non-systemd distros you still need quite a few unlikely-to-happen
>>>>> failures to trigger a situation in which the node still gets fenced and
>>>>> recovered (assuming no-one saw any of the error messages and didn't run
>>>>> "service pacemaker restart" prior to the additional failures).
>>>> Can you show me the place where:
>>>> "On a systemd based machine pacemakerd is respawned and reattaches to
>>>> the existing daemons."?
>>> The code for it is in mcp/pacemaker.c, look for
>>> find_and_track_existing_processes()
>>>
>>> The ps tree will look different though
>>>
>>> 6094 ?  Ss  116:13  /usr/libexec/pacemaker/cib
>>> 6095 ?  Ss  0:25    /usr/libexec/pacemaker/stonithd
>>> 6096 ?  Ss  1:27    /usr/libexec/pacemaker/lrmd
>>> 6097 ?  Ss  0:49    /usr/libexec/pacemaker/attrd
>>> 6098 ?  Ss  0:25    /usr/libexec/pacemaker/pengine
>>> 6099 ?  Ss  0:29    /usr/libexec/pacemaker/crmd
>>> ...
>>> 6666 ?  S   0:25    pacemakerd
>>>
>>> but pacemakerd will be watching the old children and respawning them on
>>> failure, at which point you might see:
>>>
>>> 6094 ?  Ss  116:13  /usr/libexec/pacemaker/cib
>>> 6096 ?  Ss  1:27    /usr/libexec/pacemaker/lrmd
>>> 6097 ?  Ss  0:49    /usr/libexec/pacemaker/attrd
>>> 6098 ?  Ss  0:25    /usr/libexec/pacemaker/pengine
>>> 6099 ?  Ss  0:29    /usr/libexec/pacemaker/crmd
>>> ...
>>> 6666 ?  S   0:25    pacemakerd
>>> 6667 ?  Ss  0:25     \_ /usr/libexec/pacemaker/stonithd
>>>> If I respawn the pacemakerd process via upstart, will it "reattach to
>>>> the existing daemons"?
>>> If upstart is capable of detecting the pacemakerd failure and
>>> automagically respawning it, then yes - the same process will happen.
>> Some people defend you - they send me hate mail when I'm not restrained.
>
> You should see the mail I get off-list ;-)
>
>> But you're also a beetle :)
>
> I'm not 100% sure what you mean there.
>
>> Why didn't you say anything about supporting upstart in the spec?
>
> Mostly because I don't run it anywhere, so I have no idea what it does by
> default or can be configured to do.
> It's not malicious; the feature was simply written and tested in the
> context of systemd.
>
> Also, when I think upstart, I think debian based distros which don't use
> spec files ;-)

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
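For anyone wanting upstart to give pacemakerd the same protection Andrew describes for systemd, a job along the following lines should do it. This is an untested sketch: the job name, exec path, start/stop events and respawn limits are all assumptions to verify against your distribution's packaging.

```
# /etc/init/pacemaker.conf - hypothetical upstart job (sketch, untested)
description "Pacemaker cluster manager"

# Assumed ordering: start after corosync, stop with it
start on started corosync
stop on stopping corosync

# Respawn pacemakerd if it dies; give up after 5 failures in 60 seconds
respawn
respawn limit 5 60

exec /usr/sbin/pacemakerd
```

If upstart respawns pacemakerd this way, the reattach logic above means existing cib/lrmd/crmd children are picked up rather than restarted.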
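P.S. for readers of the archive: the respawn behaviour discussed in the thread - pacemakerd restarting a failed child such as stonithd - boils down to a supervision loop. The sketch below is only an illustration of that idea, not Pacemaker's actual code: "sleep 1" stands in for a real daemon, and the iteration cap is arbitrary so the demo terminates.

```sh
#!/bin/sh
# Minimal supervision-loop sketch: start a child, wait for it to exit
# (for any reason, including being killed), then respawn it.
# "sleep 1" is a stand-in for a daemon like /usr/libexec/pacemaker/stonithd.
respawns=0
max=3                          # arbitrary cap so the demo terminates
while [ "$respawns" -lt "$max" ]; do
    sleep 1 &                  # launch the (stand-in) daemon
    child=$!
    wait "$child"              # blocks until the child dies
    respawns=$((respawns + 1))
    echo "child $child exited; respawn #$respawns"
done
echo "respawned $respawns times"
```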
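find_and_track_existing_processes() in mcp/pacemaker.c conceptually does the reverse: a freshly restarted pacemakerd discovers children that are already running and watches those instead of starting new ones. A rough shell sketch of that idea follows; a background "sleep" stands in for an orphaned daemon, and the real code matches on process name and IPC endpoints, not just a pid.

```sh
#!/bin/sh
# Reattach sketch: "discover" an already-running process and track it
# until it exits, at which point a supervisor would respawn it.
sleep 1 &                                 # stand-in for an orphaned daemon
target=$!
# Scan the process table for it, as a restarted supervisor would.
found=$(ps -o pid= -p "$target" | tr -d ' ')
if [ "$found" = "$target" ]; then
    echo "tracking existing pid $found"
    wait "$found"                         # watch it until it exits
    echo "pid $found exited; would respawn it now"
fi
```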