Re: Fairness for single-recipient bulk mail

Wietse Venema Thu, 25 Jun 2020 10:16:13 -0700

Patrik Rak:
> On Wed, 24 Jun 2020, Wietse Venema wrote:
> 
> > > > ?Problem: nqmgr's fairness breaks down for single-recipient bulk mail 
> > > 
> > > You describe the solutions but maybe you could describe the problem first?
> > 
> > My text was a bit disorganized, but the gist is that nqmgr uses a
> > clever approach to interleave (bulk mail) multi-recipient deliveries
> > with (non-bulk) single-recipient deliveries. This allows the non-bulk
> > mail to trickle out *during* bulk delivery, instead of being stuck
> > *after* the bulk delivery. I am making a simplification as nqmgr
> > will also interleave workloads with other sizes, not just
> > single-recipient and multi-recipient.
> > 
> > When the bulk mail is in the form of single-recipient messages,
> > then nqmgr cannot distinguish between bulk mail and non-bulk mail,
> > and these deliveries are no longer interleaved. Deliveries to a
> > specific destination become sequential. Again, I am making
> > a simplification, this time that all email is single-recipient.
> 
> Yep, this is a nice summary of how it works. And the below is some 
> solution, but I am still not sure which problem you are trying to solve.
> 
> (I really apologize, maybe it's been part of some prior discussion and 
> thus I am missing the bit which lead to that post. Honestly, I am not 
> trying to play dumb here.)
> 
> It's obviously some congestion bothering you. So let me ask: where 
> is the bottleneck?
> 
> 1) paricular destination is congested
> 2) all delivery agents are congested
> 3) in-memory queue is congested
> 4) something else?


This came up when people asked for a way to prevent the degenerate
case that I described: non-bulk mail getting stuck behind bulk mail.

As long as the incoming queue is not entirely in FIFO order, then
the active queue will contain a mix of messages from different
senders, and the scheduler can interleave their deliveries.

> and what is the "blocker" attribute?
> 
> A) particular mail recipient is the hog
> B) destination itself is the hog (not happening AFAIK)
> C) particular mail sender is the hog
> D) particular mail is the hog
> E) single recipient mail blocking multi recipient mail (not happening AFAICT)
> F) multi recipient mail blocking single recipient mail (not happening AFAICT)
> G) something else?

An active queue with single-recipient messages, mostly but not all
bulk mail (see incoming queue assumptions above). Assuming similar
next-hop domain distributions, non-bulk mail remains stuck behind
bulk mail. If the bulk and non-bulk domain distributions are wildly
different then round-robin will take care of interleaving.

Bulk mail has multiple messages per sender. Non-bulk mail has a
small number of messages per sender. Sender is simplified, 
for example an email address without address etension.

> From the rest of the following mails I have gathered that it may be the 
> same sender blocking things (which may be either delivery agents or 
> in-memory queue, so 2C or 3C, I am not really sure) is what is bugging 
> you. But that's what confuses me even further because separate 
> per-mail-recipient-count classes don't help with that, do they? (If all 
> you get in your system is single recipient mail, this type of classes 
> doesn't change a thing.)

Yes it does help, by delivering non-bulk and bulk in parallel (and
with sender round-robin it would deliver bulkk mail from different
senders in parallel as well).  Still assuming the the active queue
will contain a mix, if there are any non-bu;lk messages.

        Wietse

> But if that's really the case (the single sender usurping everything), why 
> complicate things? Rather than dealing with that during the outbound 
> selection, I'd say cut it during the inbound phase. Don't even let it into 
> the system. Simply don't pickup mail from sender who already has too many 
> recipients in in-memory queue during the active queue scan. It prevents it 
> from clogging the queue or even the agents (depending on the limit), is 
> trivial to do, and requires no change to the scheduling algorithm itself.
> 
> > The idea then is to have three FIFO classes with delivery requests
> > that are round-robin interleaved by design: one class for senders
> > with, say, 1-20 pending delivery requests, one class for senders
> > with 20-400, and one class for senders with more than 400 pending
> > delivery requests. The scheduler has up to 20,000 delivery requests
> > in memory.
> > 
> > For a given transport and destination, if all three classes have
> > work then round-robin interleaving gives each class gets 1/3 of the
> > delivery slots, 1/2 if there are only two classes with work, all
> > slots if there is only one class witth work, and zero if there is
> > no work.
> > 
> > If implemented in oqmgr, it's brain-dead round-robin with FIFO.
> > What could possibly go wrong? :-) Well it does not interleave
> > deliveries within the same class. But, with single-recipient messages,
> > that wasn't happening anyway, so we are not making things worse.
> 
> Let's put aside that I don't know what you are solving. Regardless of that 
> what you suggest has it's own share of problem. You either have too little 
> classes, or too many (David pointed at this as well, if I understood it 
> correctly within the context I have so far). Each suck in a different way.
> 
> 1) Too little classes - in which class will 100 recipient mail fit? Will 1 
> recipient mail have to wait until several 100 recipient mails before it 
> are delivered? Or will 100 recipient mail wait until few 1000 recipient 
> mails before it are delivered? No matter how you set the thresholds, with 
> too little classes it will always be too coarse and will cause substantial 
> delays.
> 
> (For comparison, nqmgr has kind of implicit log2(n) classes, even if their 
> boundaries may be a bit more fuzzy)
> 
> 2) Too many classes - let's assume you will have log2(n) classes (similar 
> to what nqmgr implicitly has). It seems to work fine, but actually that's 
> where the round robin hits us hard - you inflate the delivery of some 
> mails beyond reasonable. Let's say you have 16 classes, all delivering 
> mail. With round robin, you are inflating delivery of mail in each class 
> by deliveries of all other classes. This means each mail can be 
> extended 16 times. As you put it some 20+ years ago when I was proposing 
> round robin for the first time - first recipients of some mails all arrive 
> today while last recipients of the same mails arrive tomorrow. It sucked 
> then and I believe it still sucks today.
> 
> (For comparison, nqmgr never inflates any mail delivery more than twice, 
> even with the most aggressive preemtpion settings enabled).
> 
> So, this is why I think what you propose is not the way to go. 
> Unfortunately, I don't think there is any sweet spot of "just the right 
> amount of classes" which would work wonderfully. I am afraid it would 
> suffer from the worst of both cases, being too coarse to delay too much of 
> shorter mails while inflating some deliveries way too much at the same time.
> 
> > > I don?t remember any of it now but maybe I wrote the most important
> > > ideas somewhere, so I can have a look.
> > 
> > I remember we discussed classes, but I don't think that I was able
> > to reduce the concept to a combination of really brain-dead simple
> > concepts.
> > 
> > I think it would work (in oqmgr), I just need to find time to do this.
> 
> Well, I don't want to discourage you, but the above is why I am rather 
> skeptic about the result.
> 
> FWIW, in the mean time, I had a look and found my notes from July 2013, 
> when we were last discussing the classes. It's quite interesting read (for 
> me, at least). It contains various notes for the arguments I prepared for 
> our future discussions, as well as the solution to the problem we were 
> solving back then.
> 
> It turns out we were indeed discussing the classes as well, but those were 
> "slow" and "fast" classes instead. My biggest gripe with nqmgr (or oqmgr, 
> for that matter) is that on a saturated system, deliveries of "slow" mail 
> to "slow" destinations eventually hog down most of the delivery agents, 
> considerably slowing any other, both "slow" and "fast", mail sitting in 
> the queue.
> 
> We were trying to come up with a solution that would solve this by using 
> the mail size as an attribute somehow, and dedicating some portions of the 
> agents to certain classes. After some serious thinking my revelation was 
> that to solve this properly, we should in fact measure the time the mail 
> spends "on the wire" (effectively measuring per-destination throughput), 
> and use this information to drive the delivery agent assignment decisions. 
> It's quite obvious in the hindsight, but that's something which we have 
> never discussed before. The solution I had back would work marvelously 
> even with a single agent, but in the end I had never pursued this further 
> to turn it into an actual implementation.
> 
> I attach the notes for your perusal, maybe you'll find some of it 
> interesting, too. It's quite brief to be generally useful as is, but if 
> something piques your interest, I can try to remember and turn it into 
> something more understandable.
> 
> BTW, one section even discusses the explicit vs implicit preemption, which 
> is basically the class based round robin being discussed above. :)
> 
> Best,
> 
> Patrik
Content-Description: 

[ Attachment, skipping... ]

Re: Fairness for single-recipient bulk mail

Reply via email to