It's odd, I could have sworn the slicing used to be done per recipient, not per message. I've had to check logs for a client to confirm her messages went out, and generally had to check all three machines to verify every user received the message.

The rest of the process works as I expected it to. I think where I flubbed up initially (last time I thought it wasn't working right and went through my process again) was that I had applied the patch to Switchboard.py to all four machines. It's just weird that it took so many restarts (and some reboots) before it started slicing properly again. I haven't made any config changes since I sent my initial plea for help to the list last night.

At least now it looks like I have my notes in order and I can finish getting rid of the other ubuntu machines. Thanks for the help!


On 05/24/2014 04:56 PM, Mark Sapiro wrote:
On 05/24/2014 03:05 PM, Jeff Taylor wrote:
After stopping mailman, machine #1 shows:
May 24 15:23:34 2014 (11512) Master qrunner detected subprocess exit
(pid: 11516, sig: None, sts: 15, class: IncomingRunner, slice: 1/3)

Machine #2:
May 24 15:21:56 2014 (12767) Master qrunner detected subprocess exit
(pid: 12769, sig: None, sts: 15, class: BounceRunner, slice: 2/3)

Machine #3:
May 24 15:22:16 2014 (31849) Master qrunner detected subprocess exit
(pid: 31858, sig: None, sts: 15, class: VirginRunner, slice: 3/3)

OK, that looks good.


Now for even more strangeness...  After restarting mailman I sent
another test message.  Just so you know, my test list has three email
addresses in it, so I would expect the messages to get split up
generally between the three machines (and please confirm my
understanding... if the list has three users on it, each one of the
three machines should forward one message to one user from the list?).

No. That's not the way it works. See below.


However after restarting and sending 7 more tests, it seems to bounce
between machine #1 and #2 sending the messages.  In each case, one
machine sends the message to ALL users.  After waiting about 15 minutes
I sent several more test messages.  Now it seems to be randomly picking
one of the three machines to send from, but again the copy to all users
is sent from that one machine.  I suppose that is better than it was --
at least now all three machines are being used.  Is this the way it's
supposed to be working?

I think so.

Here's the detail. First the general flow.

1) A post arrives and is queued in the in/ queue.
2) It is picked up by IncomingRunner and processed through the handler
pipeline.
3) Assuming it is not held for any reason, it will get queued in the
archive/ queue for ArchRunner and in the out/ queue for OutgoingRunner.
It will also be added to the list's digest.mbox for eventually being
sent to digest members as part of a digest which will be created and
queued in the virgin/ queue for VirginRunner which will ultimately queue
it in out/ for delivery.
4) ArchRunner will pick up the message from the archive/ queue and
archive it.
5) OutgoingRunner will pick up the message from the out/ queue and
deliver it to the recipients.

Before we look at slicing, we see that once OutgoingRunner has a
message, it will deliver it to all its recipients, so a single post
will always be delivered from the one machine whose OutgoingRunner
picked it up from the out/ queue.
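
To make that concrete, here is a toy model of the flow in Python. It is only a sketch under loud assumptions: the queues are in-memory deques rather than directories of .pck files, and the function names, machine name and addresses are made up for illustration, not taken from Mailman's code.

```python
from collections import deque

# Toy stand-ins for the in/, archive/ and out/ queue directories.
queues = {"in": deque(), "archive": deque(), "out": deque()}

def incoming_runner(queues):
    """Move each post from in/ to both archive/ and out/ (step 3)."""
    while queues["in"]:
        post = queues["in"].popleft()
        queues["archive"].append(post)
        queues["out"].append(post)

def outgoing_runner(queues, recipients, machine):
    """Deliver each post in out/ to every recipient (step 5)."""
    deliveries = []
    while queues["out"]:
        post = queues["out"].popleft()
        for rcpt in recipients:
            deliveries.append((machine, rcpt, post))
    return deliveries

# Hypothetical three-member test list and machine name.
members = ["a@example.com", "b@example.com", "c@example.com"]
queues["in"].append("test post")
incoming_runner(queues)
deliveries = outgoing_runner(queues, members, "machine-2")
```

Every entry in deliveries carries the same machine name, which matches what you saw: whichever machine picks the post up from out/ sends the copy to all users.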

Now for slicing. Whenever a message is queued, whether for the in/ queue
by mail delivery or some other queue by some handler or other process,
it gets a file name of the form tttt+hhhhhhhh.pck. The tttt part is a
time stamp so we can ensure FIFO processing. The hhhhhhhh part is a hex
digest of a sha1 hash of the message, the listname and the current time.
Slicing works by dividing that hash space into n equal slices (in your
case 3 with slice 0 being the first third, slice 1 the middle third and
slice 2 the last third).

So when a runner that is processing slice 0 say, looks at its queue, it
will only process those messages in the first third of the hash space.
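
A minimal Python sketch of that idea follows. It is an illustration of the description above, not a copy of Switchboard.py: the helper names are hypothetical, and the exact way the message, listname and time are combined before hashing is an assumption.

```python
import hashlib
import time

SHA_MAX = (1 << 160) - 1  # largest value a sha1 digest can take

def queue_file_base(msg_text, listname, now=None):
    """Build a 'tttt+hhhh...' name from the time and a sha1 hex digest."""
    if now is None:
        now = time.time()
    # Assumed recipe: hash the message, the listname and the time together.
    hashfood = (msg_text + listname + repr(now)).encode("utf-8")
    return "%r+%s" % (now, hashlib.sha1(hashfood).hexdigest())

def slice_bounds(slice_no, num_slices):
    """Inclusive (lower, upper) range of hash values for one slice."""
    lower = ((SHA_MAX + 1) * slice_no) // num_slices
    upper = ((SHA_MAX + 1) * (slice_no + 1)) // num_slices - 1
    return lower, upper

def slice_for(file_base, num_slices):
    """Return the 0-based slice whose range contains the file's digest."""
    value = int(file_base.rsplit("+", 1)[1], 16)
    for slice_no in range(num_slices):
        lower, upper = slice_bounds(slice_no, num_slices)
        if lower <= value <= upper:
            return slice_no
    raise ValueError("digest outside the sha1 range")

base = queue_file_base("Subject: test\n\nhello\n", "test")
print(base, "-> slice", slice_for(base, 3))
```

A runner configured for slice 0 of 3 would skip any file for which slice_for() returns 1 or 2, and leave it for the machine handling that slice.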

So bottom line, an incoming message will be queued in the in/ queue and
it has an equal chance of being in any slice and will be picked up by
the machine processing that slice. Then the message will be later
requeued in out/ probably with a different hash. The time is in seconds
and may or may not have changed, but the message has likely changed due
to subject prefixing, content filtering and/or header refolding. So it
will be picked up by the OutgoingRunner processing its slice, and that
one runner will deliver to all recipients.
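
Here is a small Python sketch of why the slice can change between in/ and out/, and why posts spread roughly evenly over the machines. The hashing recipe, the timestamp and the sample messages are assumptions for illustration only; slice_index() just cuts the sha1 space into equal-width pieces.

```python
import hashlib

def hex_digest(msg_text, listname, now):
    """sha1 hex digest over message, listname and time (assumed recipe)."""
    hashfood = (msg_text + listname + repr(now)).encode("utf-8")
    return hashlib.sha1(hashfood).hexdigest()

def slice_index(digest, num_slices=3):
    """Map a 160-bit hex digest onto one of num_slices equal-width slices."""
    return int(digest, 16) * num_slices // (1 << 160)

now = 1400970214  # hypothetical time stamp, in seconds
as_received = "Subject: test\n\nhello\n"
after_pipeline = "Subject: [Test] test\n\nhello\n"  # subject prefix added

# Same second, same list, but the text changed, so the digest changes.
in_digest = hex_digest(as_received, "test", now)
out_digest = hex_digest(after_pipeline, "test", now)
print(slice_index(in_digest), slice_index(out_digest))

# Many distinct posts land in the three slices in roughly equal numbers.
counts = [0, 0, 0]
for i in range(3000):
    digest = hex_digest("Subject: test %d\n\nhello\n" % i, "test", now)
    counts[slice_index(digest)] += 1
print(counts)
```

The two slice numbers printed first may or may not match for any given post. The point is only that they are chosen independently, so the machine that handled a post in in/ is not necessarily the one that sends it.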


Regarding the upgrade version, it's been too long, I'm afraid I don't
know what the old version was.  The old machines are running ubuntu
oneiric and now have mailman 2.1.14.  The newer machines have debian
wheezy and mailman 2.1.15.  The upgrades happened a few months back, but
I only noticed the issue yesterday because I am trying to get rid of the
ubuntu machines and replace them with the debian machines.  The messages
have been getting delivered, but apparently one machine was handling
everything.

I was curious because it would help me know if there had been relevant
changes, but I think it's working as it's supposed to and probably as it
was before.


------------------------------------------------------
Mailman-Users mailing list [email protected]
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/archive%40jab.org
