Hi all,

Postfix 3.7.11 on Debian 12, two-node outbound setup behind a milter
pipeline. The two nodes are similarly provisioned but carry very
different outbound load. Both nodes run identical config.

Symptom (reproducible, observed multiple times today):

On the busy node (typically 60-90 messages/min outbound, queue ~10-50),
lowering default_destination_rate_delay from 1s to 500ms (valid time(5)
syntax) and reloading causes the smtp transport to stall:

- status=sent count drops to 0 within ~30s
- queue grows by 20-30 messages
- active postfix/smtp processes drop to 0
- no 421/450 4.4.x or 4.7.x rejections in the log
- no qmgr fatal/exit-status messages with the valid 500ms value
- postfix is "active" per systemctl, master+qmgr+pickup running

Reverting to 1s + reload: throughput recovers within 60-90s, queue
drains.
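
For anyone who wants to reproduce the "status=sent drops to 0"
measurement, here is a minimal sketch of a per-minute counter over mail
log lines. It is an illustration, not the exact tooling used above. The
function name sent_per_minute is hypothetical, and it assumes log lines
start with either a traditional syslog timestamp or an ISO 8601 one.

```python
import re
from collections import Counter

# Assumption: each log line starts with either a traditional syslog
# timestamp ("Jan  5 14:03:27") or an ISO 8601 one
# ("2024-01-05T14:03:27..."). Only the part up to the minute is captured.
_MINUTE = re.compile(
    r"^(?:(\w{3}\s+\d+\s\d{2}:\d{2})|(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}))"
)


def sent_per_minute(lines):
    """Count postfix/smtp 'status=sent' log lines per minute."""
    counts = Counter()
    for line in lines:
        # Only outbound smtp client deliveries that succeeded.
        if "postfix/smtp[" not in line or "status=sent" not in line:
            continue
        match = _MINUTE.match(line)
        if match:
            counts[match.group(1) or match.group(2)] += 1
    return counts
```

Feeding it the mail log from a few minutes before the reload until a
few minutes after should show the per-minute count falling to zero if
the stall is real. Minutes with no deliveries are absent from the
result rather than listed as 0.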

The same change applied on the low-volume node (queue ~0-2) does not
exhibit a visible stall. We can also reproduce qmgr fatal-exit on
either node by passing a syntactically invalid value
(default_destination_rate_delay=0.5s) - that one is documented (time(5)
wants integer + unit, 500ms is the right sub-second form). The stall
I'm asking about is with the VALID 500ms value, which is accepted by
postconf and qmgr starts clean.

Hypotheses we've ruled out:

- failed_cohort_limit asymmetry (matched default_* and smtp_* to 10)
- downstream throttling (no 4xx push-back observed)
- DNS / TLS / connectivity (1s -> 500ms is the only delta)
- syntax error (postconf accepts 500ms, qmgr starts without fatals)

What I'd like to understand:

1. Is sub-second default_destination_rate_delay safe to use under
   sustained load on a queue that already has tens-to-hundreds of
   active recipient destinations? Or is there a load-dependent
   interaction with qmgr's per-destination scheduling state that
   makes anything below 1s effectively a no-op (or worse) on a
   loaded node?
2. If sub-second is supported, is there a related parameter
   (recipient_refill_delay, queue_run_delay, transport_rate_delay,
   concurrency_negative_feedback) whose default I should be
   adjusting in tandem to make 500ms behave as expected?
3. Is there a recommended way to inspect qmgr's per-destination
   scheduler state at runtime to confirm the stall hypothesis?
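
The closest thing I know how to do for question 3 is to look at queue
contents rather than scheduler state. The sketch below tallies queued
recipients per queue and destination domain from the JSON-lines output
of "postqueue -j" (available since Postfix 3.1). It only shows what is
sitting in the active and deferred queues. It does not expose qmgr's
in-memory per-destination state, so treat it as a rough proxy. The
function name queued_by_domain is hypothetical.

```python
import json
from collections import Counter


def queued_by_domain(json_lines):
    """Tally queued recipients per (queue_name, recipient domain).

    Expects an iterable of lines as printed by `postqueue -j`,
    one JSON object per queued message.
    """
    tally = Counter()
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        entry = json.loads(raw)
        queue = entry.get("queue_name", "unknown")
        for rcpt in entry.get("recipients", []):
            # Everything after the last "@" is taken as the domain.
            domain = rcpt.get("address", "").rpartition("@")[2].lower()
            tally[(queue, domain)] += 1
    return tally
```

Run against the output of "postqueue -j" before and after the reload,
a growing count for one (queue, domain) pair would point at a single
destination, while growth across all domains would point at the
transport as a whole.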

postconf -n attached below. Happy to share the full diagnostic
appendix (commands tried, before/after metrics, alternate-config
attempts) off-list or in a follow-up if helpful.

Thanks,
JF

mail_version = 3.7.11
postconf -n:
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
anvil_rate_time_unit = 60s
append_dot_mydomain = no
biff = no
bounce_queue_lifetime = 1h
compatibility_level = 3.6
default_destination_concurrency_failed_cohort_limit = 10
default_destination_concurrency_limit = 5
default_destination_rate_delay = 1s
default_destination_recipient_limit = 50
inet_interfaces = all
inet_protocols = ipv4
initial_destination_concurrency = 2
maximal_backoff_time = 21600s
maximal_queue_lifetime = 4d
message_size_limit = 26214400
milter_connect_timeout = 30s
milter_content_timeout = 300s
milter_default_action = accept
milter_protocol = 6
minimal_backoff_time = 900s
notify_classes = resource, software
queue_run_delay = 300s
smtp_destination_concurrency_failed_cohort_limit = 10
smtp_destination_concurrency_negative_feedback = 1
smtp_destination_concurrency_positive_feedback = 1
smtp_tls_CApath = /etc/ssl/certs
smtp_tls_loglevel = 1
smtp_tls_security_level = may
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
smtpd_banner = mail.lastspam.com ESMTP
smtpd_client_connection_rate_limit = 200
smtpd_client_message_rate_limit = 500
smtpd_client_recipient_rate_limit = 200
smtpd_helo_required = yes
smtpd_helo_restrictions = permit_mynetworks,
    reject_invalid_helo_hostname,
    reject_non_fqdn_helo_hostname
smtpd_milters = inet:127.0.0.1:11332, inet:127.0.0.1:8899
smtpd_recipient_restrictions = permit_mynetworks,
    reject_non_fqdn_recipient,
    reject_unknown_recipient_domain, reject_unauth_destination
smtpd_relay_restrictions = permit_mynetworks,
    reject_unauth_destination
smtpd_sender_restrictions = reject_non_fqdn_sender,
    reject_unknown_sender_domain
smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
smtpd_tls_loglevel = 1
smtpd_tls_security_level = may
