After doing some upgrades, I noticed yesterday that my multi-machine setup is no longer properly slicing the queue between machines. I probably missed something, but after going through all my notes on the setup I cannot figure out what the problem is. Hopefully someone else can spot the issue?

I have four mail servers. Three of them are supposed to slice the queue between them, and the fourth machine is set as a backup to process any remaining messages after 2 minutes. On the three slice machines, I have patched mailmanctl as:
----------
def start_all_runners():
    kids = {}
>>>
    for qrname, count, machine, nummachines in mm_cfg.QRUNNERS:
        for slice in range(machine, count, nummachines):
<<<
            # queue runner name, slice, numslices, restart count
            info = (qrname, slice, count, 0)
            pid = start_runner(qrname, slice, count)
            kids[pid] = info
    return kids
----------

Each of these machines has a QRUNNERS section added to mm_cfg.py which defines that machine's slice as count, machine, nummachines: 3,0,3 / 3,1,3 / 3,2,3 respectively. Each mm_cfg.py also contains the line: QRUNNER_MESSAGE_IS_OLD_DELAY = None
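
To double-check the arithmetic in the patched loop, here is a small standalone sketch (not Mailman code). The helper name slices_for is hypothetical; it only mirrors the range() call from the mailmanctl patch above with the three settings I listed.

```python
# Standalone sketch: which slices the patched loop would start on each machine.
# slices_for is a hypothetical helper, not part of Mailman.
def slices_for(count, machine, nummachines):
    # Mirrors: for slice in range(machine, count, nummachines)
    return list(range(machine, count, nummachines))

# (count, machine, nummachines) as configured on the three slice machines
settings = [(3, 0, 3), (3, 1, 3), (3, 2, 3)]
for count, machine, nummachines in settings:
    slices = slices_for(count, machine, nummachines)
    print("machine %d starts slices %s of %d" % (machine, slices, count))
```

With these settings each machine should start exactly one of the three slices, so on paper the configuration covers the whole queue with no overlap.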

On the fourth (backup) machine, I have patched Switchboard.py as:
----------
            if ext <> extension:
                continue
            when, digest = filebase.split('+')
>>>
            now = time.time()
            age = now - float(when)
            # Only process defined 'old' entries.
            if not (
                hasattr(mm_cfg, 'QRUNNER_MESSAGE_IS_OLD_DELAY') and
                mm_cfg.QRUNNER_MESSAGE_IS_OLD_DELAY and
                age > mm_cfg.QRUNNER_MESSAGE_IS_OLD_DELAY):
                continue
<<<
            # Throw out any files which don't match our bitrange. BAW: test
            # performance and end-cases of this algorithm.  MAS: both
            # comparisons need to be <= to get complete range.
----------

On this fourth machine I have added to mm_cfg.py: QRUNNER_MESSAGE_IS_OLD_DELAY = minutes(2)
This machine has NOT had the slices patch applied to mailmanctl, so there is no QRUNNERS section in its mm_cfg.py.
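
The age check from the Switchboard.py patch can be exercised on its own. This is a standalone sketch, not Mailman code: should_process is a hypothetical name, and minutes is a stand-in that I assume behaves like the Defaults.py helper (minutes converted to seconds).

```python
import time

def minutes(n):
    # Stand-in for Mailman's helper; assumed to return seconds.
    return n * 60

def should_process(when, old_delay, now=None):
    """Return True if the patched check would let this queue entry through.

    when      -- timestamp part of the queue file name (string or float)
    old_delay -- value of QRUNNER_MESSAGE_IS_OLD_DELAY, or None if unset
    """
    if now is None:
        now = time.time()
    age = now - float(when)
    # Same condition as the patch: process only if a delay is set
    # and the entry is older than that delay.
    return bool(old_delay) and age > old_delay

queued_at = "1000.0"
print(should_process(queued_at, minutes(2), now=1060.0))   # 1 minute old
print(should_process(queued_at, minutes(2), now=1121.0))   # just over 2 minutes old
print(should_process(queued_at, None, now=1121.0))         # delay unset
```

Note that, as written, the check skips every entry when the delay is None or missing, so the Switchboard.py patch only makes sense on the backup machine.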


OK, so if I only have the backup machine running, mailman will deliver my test message after 2 minutes. That part works fine. However, with the three slice machines running, the first machine (3,0,3) sends ALL of the messages out immediately. If I shut down the first machine and leave the other two running, no messages are sent out until after the 2-minute period, then the backup machine sends them. In other words, the queue is not being sliced, and only the first machine is capable of sending out list messages.
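
For anyone who wants to check which slice a given queue file ought to land in, here is a rough sketch. It ASSUMES that the stock Switchboard splits the 160-bit SHA-1 digest space into equal contiguous ranges, one per slice, which is my reading of the "bitrange" comment in the code above; I have not confirmed this against the current source. The names slice_bounds and owning_slice are hypothetical.

```python
# Assumption: each slice owns an equal, contiguous share of the 160-bit
# digest space. This is a standalone sketch, not Mailman code.
SHAMAX = 0xffffffffffffffffffffffffffffffffffffffff  # largest 160-bit value

def slice_bounds(slice_, numslices):
    # Inclusive lower/upper bounds of the digest range for one slice.
    lower = ((SHAMAX + 1) * slice_) // numslices
    upper = (((SHAMAX + 1) * (slice_ + 1)) // numslices) - 1
    return lower, upper

def owning_slice(digest_hex, numslices):
    # digest_hex is the part of the queue file name after the '+'.
    value = int(digest_hex, 16)
    for s in range(numslices):
        lower, upper = slice_bounds(s, numslices)
        if lower <= value <= upper:
            return s
    return None

# Made-up digests, just to exercise the function with three slices.
for digest in ("0" * 40, "8" + "0" * 39, "f" * 40):
    print("%s -> slice %s" % (digest, owning_slice(digest, 3)))
```

If that assumption holds, test messages whose digests fall in the ranges of slices 1 and 2 should be picked up by the second and third machines, not by the first.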

I have referred back to the original article on this subject: https://mail.python.org/pipermail/mailman-users/2008-March/060753.html and it appears I made the correct changes. Has something changed in newer versions of mailman that now prevents this technique from working the same way? Or was there something more to getting slicing to work that was not mentioned in that article?