As Postfix maintainer, my interest is to provide a system that meets
a wide range of needs. At the same time the system also has to be
implementable and maintainable. This means that some functionality
will not be implemented, no matter how desirable it might be. The
goal is to find a set of robust mechanisms that provides the most
bang for the buck. Not: to find a set of mechanisms that is perfect.
Perfection can be expensive and fragile.

In the context of this thread, the idea is that new-mail delivery
requests complete in less time on average than deferred-mail delivery
requests (for example because some deferred-mail requests time out),
and that therefore new-mail delivery performance can be improved
by separating new mail from deferred mail, and/or by somehow
prioritizing new mail over deferred mail.

Below is a list (not complete) of causes for mail delays.  I also
list non-invasive changes that extend existing functionality
incrementally, and that are unlikely to require lengthy verification
or complicate future developments.

I'm not listing solutions that change nqmgr scheduling within a
per-destination queue. Cool as it might be to also group jobs by
criteria such as sender or incoming/deferred queue, support for
multi-criteria job grouping would have a high implementation and
maintenance cost.

And I also do not list solutions that depend on the cause of delays:
slow or quick handshake failure (for example SMTP site-fail) versus
slow or quick failure after handshake (for example SMTP message-fail
or SMTP recipient-fail).  This would require invasive change: the
current infrastructure distinguishes only between "handshake failure"
(i.e.  dead site) and "failure after handshake" but does not account
for the amount of time (though the implementation and maintenance
cost would be less than multi-criteria job grouping in the nqmgr
scheduler).

No solution should be rejected because it is imperfect (i.e. there
exists a high-cost change that covers some part of the problem
space, or that covers it better than some other change).  Keep in
mind the goal, to find a set of robust mechanisms that provides the
most bang for the buck.  Not: to find a set of mechanisms that is
perfect.

Bottlenecks:

- Receiver back-pressure. Make bilateral arrangements.

- Network capacity. Get a bigger pipe or move closer.

- Delivery agents. Increase output and/or process input selectively.
Specifically, configure more delivery agents, and/or separate mail
streams so that one aggregate stream (deferred mail) can't starve
other aggregate streams.

    For example, when delivery agents are saturated with deferred
    mail, introduce "slow path" / "fast path" delivery. Add a
    trivial-rewrite "slow path" personality for deferred mail, and
    use the existing ("fast path") personality for new mail.  This
    is a low-cost change that allows deferred mail to use different
    transports (with different concurrencies, timeouts, etc.).

    This idea generalizes to other aggregates, such as messages
    from the same sender, from the same client, messages larger
    than some threshold, and so on.  For that we could let the
    administrator decide a many->2 mapping from sender, client, or
    size to slow/fast path.  Initially, the mapping could be based
    only on incoming versus deferred queue.

    Based on this mapping the queue manager can propagate back
    pressure to the incoming/deferred queue-scanning code (see
    next).

- Active queue. Process input selectively.

    For example, when the active queue becomes congested with "slow
    path" mail and the above measures are exhausted, prioritize the
    queue scan so that "slow path" mail cannot starve other mail.

- Queue manager. Process address resolution requests in parallel
(event-driven resolver client). This could increase the intrinsic
limit (2500 msg/sec in ~2007) further.

- Disk I/O. Use a solid-state disk, or use a RAID with lots of
battery-backed cache.
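
To make the "slow path" / "fast path" idea and the prioritized
queue scan more concrete, here is a minimal sketch in Python. It
is not Postfix code: Message, classify(), fill_active() and the
slow_share knob are hypothetical names made up for illustration.
It assumes the many->2 mapping is keyed first on incoming versus
deferred queue, with optional sender and size criteria, and that
the scan caps "slow path" mail at a fraction of the free active
queue slots only while "fast path" mail is waiting.

```python
from collections import deque
from dataclasses import dataclass

FAST, SLOW = "fast", "slow"


@dataclass
class Message:
    queue_id: str
    origin: str        # "incoming" or "deferred"
    sender: str = ""
    size: int = 0


def classify(msg, slow_senders=frozenset(), max_fast_size=None):
    """Hypothetical many->2 mapping from message to path.

    By default only the originating queue matters; sender and
    size criteria are optional, administrator-chosen extras.
    """
    if msg.origin == "deferred":
        return SLOW
    if msg.sender in slow_senders:
        return SLOW
    if max_fast_size is not None and msg.size > max_fast_size:
        return SLOW
    return FAST


def fill_active(fast, slow, free_slots, slow_share=0.25):
    """Pick messages for the free active queue slots.

    Slow-path mail gets at most slow_share of the slots (but at
    least one) while fast-path mail is waiting. Slots that fast
    mail cannot use are handed back to slow mail.
    """
    if free_slots <= 0:
        return []
    slow_cap = max(1, int(free_slots * slow_share))
    n_slow = min(slow_cap, len(slow))
    n_fast = min(free_slots - n_slow, len(fast))
    # Fast mail ran short: let slow mail use the leftover slots.
    n_slow = min(len(slow), free_slots - n_fast)
    picked = [fast.popleft() for _ in range(n_fast)]
    picked += [slow.popleft() for _ in range(n_slow)]
    return picked
```

With 20 messages waiting on each path and 10 free slots, this
sketch admits 8 fast and 2 slow messages; with only 3 fast
messages waiting, the remaining 7 slots go to slow mail. The
fixed share is just one possible policy. The point is that a
two-way classification is enough to keep one aggregate stream
from starving the other.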

        Wietse
