Hi all,

  Postfix 3.7.11 on Debian 12, two-node outbound setup behind a milter
  pipeline. The two nodes are similarly provisioned but carry very different
  outbound load. Both nodes run identical config.

  Symptom (reproducible, observed multiple times today):

  On the busy node (typically 60-90 messages/min outbound, queue ~10-50),
  lowering default_destination_rate_delay from 1s to 500ms (valid
  time(5) syntax) and reloading causes the smtp transport to stall:

    - status=sent count drops to 0 within ~30s
    - queue grows by 20-30 messages
    - active postfix/smtp processes drop to 0
    - no 421/450 4.4.x or 4.7.x rejections in the log
    - no qmgr fatal/exit-status messages with the valid 500ms value
    - postfix is "active" per systemctl, master+qmgr+pickup running
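
  For anyone who wants to compare numbers, the status=sent drop can be
  measured with something like the sketch below. It assumes the log lines
  come from postfix/smtp and that a fixed-length timestamp prefix identifies
  the minute (16 characters for ISO timestamps, 12 for classic syslog); the
  function name and the ts_len parameter are illustrative only.

```python
import re
from collections import Counter

# Matches the status= field that postfix/smtp logs for each delivery attempt.
STATUS_RE = re.compile(r"postfix/smtp\[\d+\]: .*\bstatus=(\w+)")


def sent_per_minute(lines, ts_len=16):
    """Count status=sent log lines per minute.

    ts_len is the number of leading characters that identify the minute:
    16 for ISO timestamps ("2024-05-01T12:00"), 12 for classic syslog
    ("May  1 12:00"). Adjust it to the local log format.
    """
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(1) == "sent":
            counts[line[:ts_len]] += 1
    return counts
```

  Feeding it the mail log (or journalctl output for the postfix unit) gives
  one counter entry per minute, so a stall shows up as minutes that are
  missing from the result.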

  Reverting to 1s + reload: throughput recovers within 60-90s, queue
  drains.

  The same change applied on the low-volume node (queue ~0-2) does not
  exhibit a visible stall. We can also reproduce a qmgr fatal exit on
  either node by setting a syntactically invalid value
  (default_destination_rate_delay = 0.5s). That one is documented (time(5)
  wants integer + unit, 500ms is the right sub-second form). The stall I'm
  asking about is with the VALID 500ms value, which is accepted by postconf
  and with which qmgr starts clean.

  Hypotheses we've ruled out:
    - concurrency_failed_cohort_limit asymmetry (matched
      default_destination_* and smtp_destination_* to 10)
    - downstream throttling (no 4xx push-back observed)
    - DNS / TLS / connectivity (1s -> 500ms is the only delta)
    - syntax error (postconf accepts 500ms, qmgr starts without fatals)

  What I'd like to understand:

    1. Is sub-second default_destination_rate_delay safe to use under
       sustained load on a queue that already has tens-to-hundreds of
       active recipient destinations? Or is there a load-dependent
       interaction with qmgr's per-destination scheduling state that
       makes anything below 1s effectively a no-op (or worse) on a
       loaded node?

    2. If sub-second is supported, is there a related parameter
       (default_recipient_refill_delay, queue_run_delay,
       default_transport_rate_delay,
       default_destination_concurrency_negative_feedback) whose default I
       should be adjusting in tandem to make 500ms behave as expected?

    3. Is there a recommended way to inspect qmgr's per-destination
       scheduler state at runtime to confirm the stall hypothesis?
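
  As a starting point, a per-destination tally of the queue can be built
  from the JSON output of postqueue -j. The sketch below is only an
  approximation: it shows queue contents, not qmgr's in-memory scheduler
  state, and it uses the recipient domain as the destination, which is
  wrong wherever transport maps or a relayhost change the next hop. The
  function name is illustrative.

```python
import json
from collections import Counter


def tally_by_destination(json_lines):
    """Count queued recipients per (queue_name, recipient domain).

    Expects `postqueue -j` output: one JSON object per line, each with a
    "queue_name" field and a "recipients" list of {"address": ...} entries.
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        msg = json.loads(line)
        queue = msg.get("queue_name", "?")
        for rcpt in msg.get("recipients", []):
            # Assumption: recipient domain stands in for the next-hop destination.
            domain = rcpt.get("address", "").rpartition("@")[2].lower()
            counts[(queue, domain)] += 1
    return counts
```

  Run against the lines of postqueue -j output before and after the reload,
  this shows whether the backlog piles up in the active queue for a handful
  of domains or is spread evenly across destinations.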

  postconf -n attached below. Happy to share the full diagnostic
  appendix (commands tried, before/after metrics, alternate-config
  attempts) off-list or in a follow-up if helpful.

  Thanks,
  JF  

  mail_version = 3.7.11

  postconf -n:
  alias_database = hash:/etc/aliases
  alias_maps = hash:/etc/aliases
  anvil_rate_time_unit = 60s
  append_dot_mydomain = no
  biff = no
  bounce_queue_lifetime = 1h
  compatibility_level = 3.6
  default_destination_concurrency_failed_cohort_limit = 10
  default_destination_concurrency_limit = 5
  default_destination_rate_delay = 1s
  default_destination_recipient_limit = 50
  inet_interfaces = all
  inet_protocols = ipv4
  initial_destination_concurrency = 2
  maximal_backoff_time = 21600s
  maximal_queue_lifetime = 4d
  message_size_limit = 26214400
  milter_connect_timeout = 30s
  milter_content_timeout = 300s
  milter_default_action = accept
  milter_protocol = 6
  minimal_backoff_time = 900s
  notify_classes = resource, software
  queue_run_delay = 300s
  smtp_destination_concurrency_failed_cohort_limit = 10
  smtp_destination_concurrency_negative_feedback = 1
  smtp_destination_concurrency_positive_feedback = 1
  smtp_tls_CApath = /etc/ssl/certs
  smtp_tls_loglevel = 1
  smtp_tls_security_level = may
  smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
  smtpd_banner = mail.lastspam.com ESMTP
  smtpd_client_connection_rate_limit = 200
  smtpd_client_message_rate_limit = 500
  smtpd_client_recipient_rate_limit = 200
  smtpd_helo_required = yes
  smtpd_helo_restrictions = permit_mynetworks, reject_invalid_helo_hostname,
      reject_non_fqdn_helo_hostname
  smtpd_milters = inet:127.0.0.1:11332, inet:127.0.0.1:8899
  smtpd_recipient_restrictions = permit_mynetworks, reject_non_fqdn_recipient,
      reject_unknown_recipient_domain, reject_unauth_destination
  smtpd_relay_restrictions = permit_mynetworks, reject_unauth_destination
  smtpd_sender_restrictions = reject_non_fqdn_sender, reject_unknown_sender_domain
  smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
  smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
  smtpd_tls_loglevel = 1
  smtpd_tls_security_level = may
