On Sun, May 12, 2013 at 02:52:05PM -0400, Wietse Venema wrote:
> Please consider not hard-coding your two-class solution to new/deferred
> mail only, but allowing one level of indirection so that we can
> insert a many-to-2 mapping from message property (now: from queue
> to delivery class; later: sender, client or size to delivery class).
>
> The idea is that some part of Postfix may clog up due to mail with
> properties other than having failed the first delivery attempt.
Since we're addressing congestion caused by slow mail, perhaps
we're going about it the wrong way. The heuristic that deferred
(or selected via some other a-priori characteristic) mail is likely
slow is a very crude approximation, and may be entirely wrong.
Instead, I think we can apply a specific remedy for the actual
congestion as it happens.
- Enhance the master status protocol to add a new state in addition
to busy and idle:
* Blocked. The delivery agent is busy, but blocked waiting to
complete connection setup (the "c" time in delays=...).
The SMTP delivery agent would enter this state at the beginning
of the delivery attempt, and exit it before calling smtp_xfer().
When another session will be attempted to deliver tempfailed
recipients, the state is re-entered at the completion of
smtp_xfer().
- Add two companion parameters:
# Typically higher than the process limit by some multiple.
#
default_blocked_process_limit = 500
# Processes blocked for more than this time yield their slot
# to new processes, dynamically inflating the process limit.
#
blocked_process_time = 5s
When a process stays blocked for more than blocked_process_time,
master(8) decrements the service busy count and increments
the service blocked count, provided the maximum blocked count
has not been reached. This allows master(8) to create more
processes to handle mail that is not slow.
When a delivery agent that has been blocked for more than
blocked_process_time completes a delivery, it does not
go back to the accept loop. Rather it exits. The process
start-up cost is amortized by the long delay.
- The master.cf maxproc column is optionally extended to allow
setting both the process limit and the blocked process limit.
# service type private unpriv chroot wakeup maxproc command + args
smtp unix - - n - 200/900 smtp
The syntax is "m[/n]", where "m" is the process limit or "-" for the
default, and "n" is the blocked process limit; omit "/n" for the default.
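To make the "m[/n]" syntax concrete, here is a rough sketch of how the
extended maxproc field could be parsed. It is Python for brevity, not
master(8) code; the function name parse_maxproc() and the choice to
reject a trailing "/" or a negative number are my assumptions, not part
of the proposal.

```python
from typing import Optional, Tuple


def parse_maxproc(field: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse a proposed "m[/n]" maxproc field (hypothetical helper).

    Returns (process_limit, blocked_limit). None means "use the default".
    """
    limit_text, sep, blocked_text = field.partition("/")

    # Assumption: "200/" is treated as an error rather than as "default".
    if sep and not blocked_text:
        raise ValueError("missing blocked process limit after '/'")

    # "-" selects the default process limit, as in today's master.cf.
    process_limit = None if limit_text == "-" else int(limit_text)

    # With no "/n" part, the blocked process limit falls back to its default.
    blocked_limit = int(blocked_text) if sep else None

    for value in (process_limit, blocked_limit):
        if value is not None and value < 0:
            raise ValueError("limits must not be negative")

    return process_limit, blocked_limit


print(parse_maxproc("200/900"))
print(parse_maxproc("-"))
```

With the example master.cf entry above, "200/900" yields a process limit
of 200 and a blocked process limit of 900, while a bare "-" leaves both
at their defaults.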
This directly addresses process starvation via slow processes, and
does not require any queue manager changes (the queue manager is the
most expensive place to support complex features).
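The busy/blocked accounting described above can be sketched as a toy
model. This is Python for illustration only, not master(8) code: the
class and method names are made up, idle processes are ignored, and I
assume that only the busy count is compared against the process limit.

```python
class ServiceCounts:
    """Toy model of per-service accounting (all names are hypothetical)."""

    def __init__(self, process_limit: int, blocked_limit: int) -> None:
        self.process_limit = process_limit  # maxproc
        self.blocked_limit = blocked_limit  # proposed blocked process limit
        self.busy = 0
        self.blocked = 0

    def can_spawn(self) -> bool:
        # Assumption: processes counted as blocked have yielded their slot,
        # so only busy processes count against the process limit.
        return self.busy < self.process_limit

    def start_delivery(self) -> bool:
        """Start one more delivery process if a slot is free."""
        if not self.can_spawn():
            return False
        self.busy += 1
        return True

    def mark_blocked(self) -> bool:
        """A process has been blocked longer than blocked_process_time."""
        if self.busy == 0 or self.blocked >= self.blocked_limit:
            return False  # maximum blocked count reached: slot stays busy
        self.busy -= 1
        self.blocked += 1
        return True

    def blocked_process_exits(self) -> None:
        """A long-blocked process completed its delivery and exited."""
        if self.blocked > 0:
            self.blocked -= 1


svc = ServiceCounts(process_limit=2, blocked_limit=1)
svc.start_delivery()
svc.start_delivery()
print(svc.can_spawn())   # both slots are busy
svc.mark_blocked()
print(svc.can_spawn())   # one slot was yielded by the blocked process
```

The point of the model is that a slow destination ties up at most
blocked_limit extra processes, while the regular process limit remains
available for mail that is not slow.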
--
Viktor.