Re: [exim] stuck exim processes

Jeremy Harris via Exim-users Wed, 16 Feb 2022 05:17:28 -0800

On 16/02/2022 08:08, Martin Waschbüsch via Exim-users wrote:

07:45:34.638 57451 sync_responses expect rcpt for recipi...@recipientdomain.tld
07:45:34.638 57451   SMTP<< 550 5.1.1 <xn--recipient-...@recipientdomain.tld>: 
Recipient address rejected: User unknown


far end rejected RCPT.  We had pipelined MAIL, RCPT, BDAT, on a TLS conn.

07:45:34.638 57451 look for one response for BDAT
07:45:34.638 57451   SMTP<< 554 5.5.1 Error: no valid recipients


follow-on reject for the BDAT, reasonable given the RCPT situation

07:45:34.638 57451 ok=0 send_quit=1 send_rset=1 continue_more=0 yield=0 
first_address is NULL


we think we're done with this message, and with the connection

07:45:34.638 57451   SMTP+> QUIT


set up the QUIT command

07:45:34.638 57451 cmd buf flush 6 bytes (more expected)
07:45:34.638 57451 tls_write(0x8011111a0, 6, more)
07:45:34.638 57451 tls_write((nil), 0)
07:45:34.638 57451 SSL_write(0x801600000, 0x801050768, 6)
07:45:34.638 57451 outbytes=6 error=0
07:45:34.638 57451   SMTP(TLS shutdown)>>


pipeline it with our side's TLS close notification

07:45:34.638 57451 SSL3 alert write:warning:close notify
07:45:34.638 57451   SMTP(shutdown)>>


Tell the socket we'll send no more data; this results in a TCP FIN send
(leaving the TCP conn in "half-closed" state; the peer could still
send data - and in fact ought to: an SMTP response to the QUIT within
the TLS session, followed by a TLS shutdown)

07:45:34.643 57451 Calling SSL_read(0x801600000, 0x8011101a0, 4096)
07:45:34.643 57451 read response data: size=15
07:45:34.643 57451   SMTP<< 221 2.0.0 Bye


Peer sent the response for QUIT, as expected

07:45:34.643 57451 Calling SSL_read(0x801600000, 0x8011101a0, 4096)
07:45:35.700 57451 tls_close(): shutting down TLS (with response-wait)
07:45:35.700 57451 tls_write((nil), 0)


waiting for the TLS close...

07:46:33.400 57448 polling subprocess pipes


(note the PID difference) This is the parent of the transport process,
waiting for status from there

07:47:33.420 57448 polling subprocess pipes
07:48:33.450 57448 polling subprocess pipes
07:49:33.520 57448 polling subprocess pipes
07:50:33.530 57448 polling subprocess pipes
07:51:33.550 57448 polling subprocess pipes
07:52:33.650 57448 polling subprocess pipes
07:53:33.730 57448 polling subprocess pipes
07:54:33.750 57448 polling subprocess pipes
07:55:33.830 57448 polling subprocess pipes
07:56:33.930 57448 polling subprocess pipes
07:57:33.950 57448 polling subprocess pipes


mmm, 10+ minutes elapsed - and 57448 has neither timed out nor
noticed that 57451 died.  Or sent status, but then it should
have said *somthing* more.  I wonder if it crashed (as seen in
one previous event) and 57448 didn't see it?

Aha.  there was a recent for affecting FreeBSD which might
be relevant:  https://bugs.exim.org/show_bug.cgi?id=2831

From
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259822
I suspect that The FreeBSD project has not picked up that
fix (and possibly also other more-recent ones).

The workaround for that particular issue was to periodically
restart the daemon.  A SIGHUP suffices.
Preferable would be to incorporate fixes in the Exim you are running.


Of course, that bug might not be the one you're hitting,
but it's a possible.
--
Cheers,
  Jeremy

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/

Re: [exim] stuck exim processes

Reply via email to