On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
> On Tue, Oct 18, 2011 at 12:19 PM, <renayama19661...@ybb.ne.jp> wrote:
> > Hi,
> >
> > We sometimes fail in a stop of attrd.
> >
> > Step1. start a cluster in 2 nodes
> > Step2. stop the first node.(/etc/init.d/heartbeat stop.)
> > Step3. stop the second node after time passed a little.(/etc/init.d/heartbeat stop.)
> >
> > The attrd catches the TERM signal, but does not stop.
>
> There's no evidence that it actually catches it, only that it is sent.
> I've seen it before but never figured out why it occurs.

I once had it tracked down almost to where it occurs, but then got
distracted.

Yes, the signal was delivered.

I *think* it had to do with attrd doing a blocking read, or looping in
some internal message delivery function too often.

I had a quick look at the code again now, to try and remember, but I'm
not sure.

It *may* be that, because

    xmlfromIPC(IPC_Channel * ch, int timeout)

calls

    msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc);

and MSG_ALLOWINTR will cause msgfromIPC_ll() to

    IPC_INTR:
        if ( allow_intr){
            goto startwait;

Depending on the frequency of delivered signals, this may cause the
goto startwait loop to never exit, because the timeout always starts
again from the full passed-in timeout.

If only one signal is delivered, it may still take 120 seconds
(MAX_IPC_DELAY from crm.h) to be actually processed, as the signal
handler only raises a flag for the next mainloop iteration.

If a (non-fatal) signal is delivered every few seconds, then the goto
loop will never time out.

Please someone check this for plausibility ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
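
P.S. To make the timeout concern above concrete: a wait loop that retries after an interrupting signal can still be bounded if it works against a fixed deadline instead of restarting from the full timeout. The sketch below is only an illustration of that idea using plain poll(); it is not the heartbeat IPC code. The names wait_readable_deadline() and now_ms() are hypothetical, and whether msgfromIPC_ll() could be changed this way is an assumption that would need checking against the real source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std=c99/c11 */
#endif

#include <errno.h>
#include <poll.h>
#include <time.h>

/* Milliseconds on the monotonic clock, which wall-clock changes do not affect. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper, NOT part of the heartbeat/Pacemaker IPC API.
 * Waits until poll() reports an event on fd or until timeout_ms has
 * elapsed in total, however many signals interrupt the wait.
 * Returns 1 on an event, 0 on timeout, -1 on error (errno set by poll).
 */
int wait_readable_deadline(int fd, int timeout_ms)
{
    /* Fix the deadline once, before the retry loop. */
    const long long deadline = now_ms() + timeout_ms;

    for (;;) {
        long long remaining = deadline - now_ms();
        if (remaining <= 0)
            return 0;               /* total budget used up */

        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, (int)remaining);

        if (rc > 0)
            return 1;               /* something to read (or hangup/error on fd) */
        if (rc == 0)
            return 0;               /* timed out without interruption */
        if (errno != EINTR)
            return -1;              /* real error */

        /* EINTR: retry, but only with the time left until the deadline,
         * not with the full timeout_ms again. */
    }
}
```

With this shape, a signal arriving every few seconds only shortens each individual poll() call; the overall wait still ends once timeout_ms has passed, so the mainloop gets a chance to act on the flag raised by the signal handler.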