[Mailman-Developers] Threads and robustness against runner crashes

Stephen J. Turnbull Mon, 04 Mar 2024 08:01:46 -0800

Split thread #2.

Justus Winter writes:


 > >  > Here are the things I did so far:
 > >  > 
 > >  >   - I have Mailman running with runners in threads instead of
 > >  >     processes, but that is in a proof-of-concept stage at this
 > >  >     point and needs some cleaning up
 > >
 > > After working with Mailman 3 and Postfix, I've become fond of
 > > the HUPD (HUPD of Uncontrolled Proliferation of Daemons) model
 > > of application design, at least for email.
 > 
 > My prototype lets you choose, for every kind of runner, whether to
 > use the process or thread model

That's not a sales point, as far as I'm concerned.  It adds complexity
for the installer and the site manager, as well as in the code.

 > I don't quite buy (or maybe I'm not understanding the whole picture)
 > into the argument that having individual processes improves the
 > robustness of the whole system.

I'm talking about the developer/maintainer experience, not about run
time.

 > From my experience, having individual runners killed can render
 > Mailman unusable [0] (and to my then untrained eye it was
 > impossible to see that a runner was missing,

That's some combination of documentation, logging, and tooling bugs.
At the very least "mailman status" should report whether all the
runners that were started are still present (it doesn't at present).

It's really not hard to detect a crashed or stalled runner, even in a
sliced (multirunner) queue -- queuefiles start to pile up.  (By "not
hard" I mean you can use "ls" or "du", not that it should be obvious
what to do.)
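
A minimal sketch of that "ls"-style check in Python.  The function
name, the threshold, and the one-directory-per-queue layout are
assumptions made for illustration, not Mailman's actual tooling.

```python
from pathlib import Path


def queue_backlog(queue_root, threshold=100):
    """Return {queue_name: file_count} for queues over the threshold.

    Hypothetical helper: assumes one subdirectory per queue under
    queue_root, with one file per queued message.
    """
    backlog = {}
    for entry in sorted(Path(queue_root).iterdir()):
        if not entry.is_dir():
            continue
        # Count regular files only; a growing count suggests that the
        # runner serving this queue has crashed or stalled.
        count = sum(1 for f in entry.iterdir() if f.is_file())
        if count > threshold:
            backlog[entry.name] = count
    return backlog
```

Run periodically (from cron, say), a non-empty result is a hint to go
and look at the runner for that queue.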

 > if on the other hand Mailman would have been a single process, or a
 > significantly smaller number of processes, a single missing process
 > would have been more apparent),

True, but to me crashes in a monolithic program are less acceptable,
especially a threaded one, because other concurrent operations may
depend on that program staying alive.  The way exception handling is
done in Mailman 2, with a big "except Exception" around the whole
program, you mostly would not get a crash at all, just a log message
with a traceback, probably unintelligible to a non-developer of
Mailman.  Not clear that's a win over the current situation for you.
Sure, you can probably arrange for exception handling to be per-thread
in some sense, but that's going to be conceptually harder than the
"log the exception, let it crash, have the master restart it and pray"
approach we use in the multiprocess model.
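
The "let it crash, have the master restart it" idea can be sketched as
below.  This is an illustration only: supervise() and max_restarts are
hypothetical names, and it is not Mailman's master code.

```python
import logging
import subprocess

log = logging.getLogger("master")


def supervise(argv, max_restarts=3):
    """Run argv as a child process; restart it when it exits non-zero.

    Returns the number of restarts performed.  Hypothetical sketch.
    """
    restarts = 0
    while True:
        returncode = subprocess.call(argv)
        if returncode == 0:
            # Clean exit: nothing to restart.
            return restarts
        log.error("runner %r exited with status %d", argv, returncode)
        if restarts >= max_restarts:
            log.error("giving up on %r after %d restarts", argv, restarts)
            return restarts
        restarts += 1
```

The master needs to know nothing about what went wrong inside the
child; the exit status is the whole interface, which is what keeps the
multiprocess version conceptually simple.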

 > and when a runner has picked up a mail from a queue, and then
 > crashes, that mail is lost forever (i.e. runner operations are not
 > atomic).

Please report such incidents in as much detail as you can.  The whole
point of "store and forward" is to prevent that.  Runners should not
alter the queuefile until they're done.  If they crash in the middle,
they should leave the queuefile they received and maybe a work file.
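
A minimal sketch of that discipline, assuming one file per queued
message.  process_queue() and handler are hypothetical names, not
Mailman's runner API.

```python
from pathlib import Path


def process_queue(queue_dir, handler):
    """Call handler(data) for each queue file, oldest name first.

    A file is removed only after handler returns normally, so a crash
    inside handler leaves the message on disk for a later retry.
    Returns the names of the files that were fully processed.
    """
    done = []
    for path in sorted(Path(queue_dir).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        handler(data)   # if this raises, the queue file is untouched
        path.unlink()   # reached only once the work is complete
        done.append(path.name)
    return done
```

The cost is that a message may be processed twice if the crash comes
after the work is done but before the unlink, so this gives
at-least-once handling, not exactly-once.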

_______________________________________________
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9
