J.A. Terranson wrote:
>
>On Sat, 20 Dec 2008, Mark Sapiro wrote:
>
>> mailmanctl stop should have stopped the last instance started, but yes,
>> it isn't going to stop everything in this situation.
>
>Would a killall type of functionality be contraindicated in mailmanctl
>stop?

It would have to know what processes to kill. The masters (mailmanctl
processes) know which runners they started, but when you signal a
master with bin/mailmanctl whatever, the specific master that gets
signaled is the one whose PID is in the data/master-qrunner.pid file in
*this one's* var_prefix. It could detect in a possibly OS-dependent way
that there are other masters running, but it can't know that they
aren't from other disjoint Mailman instances on the same server, so it
doesn't know if they should be SIGTERMd or not.

When a "duplicate" mailmanctl is started in a way that overrides the
locks (or the script removes the locks first - I've seen scripts like
that), it overwrites the data/master-qrunner.pid file and the first PID
is lost. I suppose data/master-qrunner.pid could be converted to a
stack of PIDs, but that's just a way of recovering from a situation
that shouldn't occur in the first place.

Actually, I think there is an issue in that mailmanctl -s start is only
supposed to ignore the lock if it was created by a PID which is no
longer running, and I'm not sure that code is correct. I have to look
at it some more.

>> >The files referenced were nowhere to be found, so picking them apart is a
>> >non starter. Looks like a race condition: Does mailman not check to see
>> >if it's already running?
>>
>> It does unless it is forced not to. The issue is that the check is via
>> lock files and init scripts tend to force override of the checks on
>> the theory that any lock files are residue from a prior boot.
>
>We do not use -s on the init script.

How did 5 sets of mailmanctl and qrunners get started? Have you figured
out how that happened?

>> Are you saying that fixing the multiple qrunner/Mailman instance issue
>> solved the missing mail problem? I'd be very surprised if that were
>> the case.
>
>Yes. It appears to have completely resolved it.

Well, as I said I'm surprised.
I'm glad it is resolved, but I'm at a loss to explain why having
multiple runners serving the same queue entries would cause
non-delivery of a post to a subset of the list members. Apparently it
either did or there was some other issue that was fixed by stopping
everything and restarting.

>> Also, you might do
>>
>> bin/list_members --regular --nomail=enabled ccm-l | grep -i missing_adr
>
>Returns a null

Perhaps you misunderstood. 'missing_adr' was supposed to be the address
(yours) that wasn't being delivered. If it was that in the above, that
means the address is not a regular member with delivery enabled so it
shouldn't be receiving posts.

>> just to be sure.
>>
>> Then check Mailman's smtp log for an entry like
>>
>> Dec 20 08:39:58 2008 (30746) <message-id> smtp to ccm-l for nnn recips,
>> completed in t.ttt seconds
>
><system brought up from maintenance>
>
>Dec 20 05:23:34 2008 (1368) <mailman.0.1229772212.558.cc...@ccm-l.org>
>smtp to ccm-l for 1 recips, completed in 0.611 seconds

This is a Mailman generated notification of some kind.

>Dec 20 06:35:59 2008 (1368) <49550272967245c1a49355a8953d0...@pandesk>
>smtp to med-jokes for 157 recips, completed in 23.833 seconds
>
><snip>
>
><somewhere in here is where I hand killed all the processes and restarted
>mailman>
>
>smtp to med-events for 103 recips, completed in 14.170 seconds
>Dec 20 12:11:05 2008 (5688) <c39.49b55b86.367d8...@aol.com> smtp to
>med-jokes for 157 recips, completed in 11.838 seconds
>Dec 20 12:11:28 2008 (5688)
><68fd2c7c0812200550q32808396vf875b6ec66f9...@mail.gmail.com> smtp to
>med-jokes for 156 recips, completed in 22.845 seconds
>
>157==correct, but one is unreachable right now due to cable cut.

More likely, the <c39.49b55b86.367d8...@aol.com> post was sent by
Mailman to all 157 members and the
<68fd2c7c0812200550q32808396vf875b6ec66f9...@mail.gmail.com> post was a
reply that had the OP in To: or Cc: so Mailman didn't send to that
address and only sent to the other 156.

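As a rough sketch (not part of Mailman), smtp log entries in the format
quoted above could be tallied per list to see which lists have entries
at all and how many recipients each post went to. The regular
expression and the function name tally_smtp_log are assumptions based
only on the sample lines in this thread, and the sketch assumes each
entry sits on a single line in the log file itself.

```python
import re
from collections import defaultdict

# Assumed entry shape, taken from the samples quoted in this thread:
#   <date> (<pid>) <message-id> smtp to <list> for <n> recips,
#   completed in <t> seconds
ENTRY = re.compile(
    r"\((?P<pid>\d+)\)\s+<(?P<msgid>[^>]*)>\s+"
    r"smtp to (?P<list>\S+) for (?P<recips>\d+) recips, "
    r"completed in (?P<secs>[\d.]+) seconds"
)


def tally_smtp_log(lines):
    """Return {list name: [recipient count of each logged delivery]}."""
    counts = defaultdict(list)
    for line in lines:
        match = ENTRY.search(line)
        if match:  # silently skip lines that do not look like entries
            counts[match.group("list")].append(int(match.group("recips")))
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical message-ids; only the overall shape follows the thread.
    sample = [
        "Dec 20 12:11:05 2008 (5688) <post-1@example.com> smtp to "
        "med-jokes for 157 recips, completed in 11.838 seconds",
        "Dec 20 12:11:28 2008 (5688) <post-2@example.com> smtp to "
        "med-jokes for 156 recips, completed in 22.845 seconds",
    ]
    for list_name, recips in sorted(tally_smtp_log(sample).items()):
        print(list_name, recips)
```

A list that is known to have had posts but does not show up in the
result at all would be the one to look at more closely.
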
The unreachable address should be delivered by Mailman to the MTA and
only detected by the MTA when it attempts delivery. If the MTA is
actually checking whether the address is deliverable during Mailman's
SMTP to the MTA, Mailman's performance will suffer greatly. Plus, there
would be something for this address in Mailman's smtp-failure log.

>Dec 20 12:11:30 2008 (5688)
><mailman.0.1229796653.5686.med-jo...@ccm-l.org> smtp to med-jokes for 1
>recips, completed in 1.341 seconds
>Dec 20 12:11:31 2008 (5688)
><mailman.0.1229796662.6005.med-jo...@ccm-l.org> smtp to med-jokes for 1
>recips, completed in 1.532 seconds
>Dec 20 12:11:33 2008 (5688)
><mailman.0.1229796681.6055.med-jo...@ccm-l.org> smtp to med-jokes for 1
>recips, completed in 1.618 seconds
>Dec 20 12:12:03 2008 (5688)
><mailman.0.1229796721.6065.med-jo...@ccm-l.org> smtp to med-jokes for 1
>recips, completed in 0.603 seconds
>Dec 20 12:12:04 2008 (5688)
><mailman.1.1229796721.6065.med-jo...@ccm-l.org> smtp to med-jokes for 1
>recips, completed in 0.551 seconds

These 5 are all Mailman notices.

>< note that there are zero entries for CCM-L up to this point, despite
>archives to the contrary, and replies which show distribution to users
>[but not to poor old me :-(]
>

This says there was some problem with OutgoingRunner. I don't know what
it would be, but if it is operating correctly, it will write the smtp
log and the post log for every post it processes. Presumably it was
sending posts because some people were receiving them (It seems
unlikely that the list replies would all have come in response to
off-list Ccs). But if it is sending posts and not logging them, it's
messed up somehow. I suppose it could be due to a race condition
between multiple runners even though I don't understand exactly how,
but why just one list?

<log entries snipped>
>
>Etc. all seems pretty normal right now.
>
>> have to look at the MTA log to see what happened to the missing
>> recipient(s).
>
>I did that (but did not mention it, as...), but it never made it to the
>MTA for my address.
>
>WTH? I can think of no possible way for just one address of no special
>significance other that it is also listowner to archive as if delivered,
>but never to make it to the MTA or even to be logged by mailman...

I'm equally mystified.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan