On Mon, Jan 16, 2012 at 11:42 PM, Andrew Beekhof <and...@beekhof.net> wrote: > On Mon, Jan 16, 2012 at 11:30 PM, Andrew Beekhof <and...@beekhof.net> wrote: >> On Mon, Jan 16, 2012 at 11:27 PM, Andrew Beekhof <and...@beekhof.net> wrote: >>> I know I could just apply the patch and be done, but I'd like to >>> understand this so it works for the right reason.
Ok, done: https://github.com/beekhof/pacemaker/commit/2a6b296 If I'm adding voodoo, I at least want the reason well documented so it can be removed again if the reason goes away. >>> On Mon, Jan 16, 2012 at 7:30 PM, Lars Ellenberg >>> <lars.ellenb...@linbit.com> wrote: >>>> On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof wrote: >>>>> > Now we proceed to the next mainloop poll: >>>>> > >>>>> >> poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, >>>>> >> {fd=5, events=POLLIN|POLLPRI}], 3, -1 >>>>> > >>>>> > Note the -1 (infinity timeout!) >>>>> > >>>>> > So even though the trigger was (presumably) set, >>>>> > and the ->prepare() should have returned true, >>>>> > the mainloop waits forever for "something" to happen on those file >>>>> > descriptors. >>>>> > >>>>> > >>>>> > I suggest this: >>>>> > >>>>> > crm_trigger_prepare should set *timeout = 0, if trigger is set. >>>>> > >>>>> > Also think about this race: crm_trigger_prepare was already >>>>> > called, only then the signal came in... >>>>> > >>>>> > diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c >>>>> > index 2e8b1d0..fd17b87 100644 >>>>> > --- a/lib/common/mainloop.c >>>>> > +++ b/lib/common/mainloop.c >>>>> > @@ -33,6 +33,13 @@ static gboolean >>>>> > crm_trigger_prepare(GSource * source, gint * timeout) >>>>> > { >>>>> > crm_trigger_t *trig = (crm_trigger_t *) source; >>>>> > + /* Do not delay signal processing by the mainloop poll stage */ >>>>> > + if (trig->trigger) >>>>> > + *timeout = 0; >>>>> > + /* To avoid races between signal delivery and the mainloop poll >>>>> > stage, >>>>> > + * make sure we always have a finite timeout. Unit: milliseconds. >>>>> > */ >>>>> > + else >>>>> > + *timeout = 5000; /* arbitrary */ >>>>> > >>>>> > return trig->trigger; >>>>> > } >>>>> > >>>>> > >>>>> > This scenario does not let the blocked IPC off the hook, though. >>>>> > That is still possible, both for blocking send and blocking receive, >>>>> > so that should probably be fixed as well, somehow. >>>>> > I'm not sure how likely this "stuck in blocking IPC" is, though. >>>>> >>>>> Interesting, are you sure you're in the right function though? >>>>> trigger and signal events don't have a file descriptor... wouldn't >>>>> these polls be for the IPC related sources and wouldn't they be >>>>> setting their own timeout? >>>> >>>> http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs >>>> >>>> iiuc, mainloop does something similar to (oversimplified): >>>> timeout = -1; /* infinity */ >>>> for s in all GSource >>>> tmp_timeout = -1; >>>> s->prepare(s, &tmp_timeout) >>>> if (tmp_timeout >= 0 && tmp_timeout < timeout) >>>> timeout = tmp_timeout; >>>> >>>> poll(GSource fd set, n, timeout); >>> >>> I'm looking at the glib code again now, and it still looks to me like >>> the trigger and signal sources do not appear in this fd set. >>> Their setup functions would have to have called g_source_add_poll() >>> somewhere, which they don't. >>> >>> So I'm still not seeing why its a trigger or signal sources' fault >>> that glib is doing a never ending call to poll(). >>> poll() is going to get called regardless of whether our prepare >>> function returns true or not. >>> >>> Looking closer, crm_trigger_prepare() returning TRUE results in: >>> ready_source->flags |= G_SOURCE_READY; >>> >>> which in turn causes: >>> context->timeout = 0; >>> >>> which is essentially what adding >>> if (trig->trigger) >>> *timeout = 0; >>> >>> to crm_trigger_prepare() was intended to achieve. >>> >>> Shouldn't the fd, ipc or wait sources (who do call g_source_add_poll() >>> and could therefor cause poll() to block forever) have a sane timeout >>> in their prepare functions? >>> Or is it because the signal itself is interrupting some essential part >>> of G_CH_prepare_int() and friends? >>> >>>> >>>> for s in all GSource >>>> if s->check(s) >>>> s->dispatch(s, ...) >>>> >>>> And at some stage it also orders by priority, of course. >>>> >>>> Also compare with the comment above /* Sigh... */ in glue G_SIG_prepare(). >>>> >>>> BTW, the mentioned race between signal delivery and mainloop already >>>> doing the poll stage could potentially be solved by using >>> >>> Again, since nothing related to the signal source ever appears in the >>> call to poll(), I'm not seeing where the race comes from. >>> Or am I missing something obvious? >>> >>>> cl_signal_set_interrupt(SIGTERM, 1), >>> >>> This, combined with >>> >>> /* >>> * If we don't set this on, then the mainloop poll(2) call >>> * will never be interrupted by this signal - which sort of >>> * defeats the whole purpose of a signal handler in a >>> * mainloop program >>> */ >>> cl_signal_set_interrupt(signal, TRUE); >>> >>> looks more relevant. >>> But I can't escape the feeling that calling this just masks the >>> underlying "why is there a never-ending call to poll() in the first >>> place" issue. >>> G_CH_prepare_int() and friends /should/ be setting timeouts so that >>> poll() can return and any sources created by g_idle_source_new() can >>> execute. >> >> Actually, thinking further, I'm pretty convinced that poll() with an >> infinite timeout is the default mode of operation for mainloops with >> cluster-glue's IPC and FD sources. >> And that this is not a good thing :) > > Far too late, brain shutting down. > > ...not a good thing, because it breaks the idle stuff, but most of all > because it requires /all/ external events to come out of that poll() > call. > Thats why its important that "the mainloop poll(2) call will never be > interrupted by this signal" behaviour had to be fudged around by the > call to siginterrupt(). > >>>> which would mean we can condense the prepare to >>>> if (trig->trigger) >>>> *timeout = 0; >>>> return trig->trigger; >>>> >>>> Glue (and heartbeat) code base is not that, let's say, involved, >>>> because someone had been paranoid. >>>> But because someone had been paranoid for a reason ;-) > > Not sure this is an excellent example of that to be honest. > >>>> >>>> Cheers, >>>> >>>> -- >>>> : Lars Ellenberg >>>> : LINBIT | Your Way to High Availability >>>> : DRBD/HA support and consulting http://www.linbit.com >>>> >>>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. >>>> >>>> _______________________________________________ >>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >>>> >>>> Project Home: http://www.clusterlabs.org >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>>> Bugs: http://bugs.clusterlabs.org _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org