Re: [Mailman-Users] Stuck OutgoingRunner
On 02/04/18 12:13, Mark Sapiro wrote:
> The status of 'S' for OutgoingRunner is "interruptible sleep". This
> means it has either called time.sleep for QRUNNER_SLEEP_TIME (default =
> 1 second), which is unlikely as it should wake up, or it's waiting for a
> response from something, most likely a response from the MTA.

As far as I can tell from the code, if OutgoingRunner catches SIGINT while waiting for a response from the MTA, the SIGINT handler in qrunner sets a flag to exit from the loop, and the socket module raises socket.error with EINTR. But the SMTP module then retries the read from the socket and keeps waiting until it receives a response or the connection is closed (by the MTA or by an error). Thus it can never reach the code that exits if the connection is kept alive and the MTA sends no data.

--
Yasuhito FUTATSUKI
--
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: https://mail.python.org/mailman/options/mailman-users/archive%40jab.org
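To make the retry behaviour described in this message concrete, here is a minimal Python model of it. This is a sketch under stated assumptions, not the actual smtplib or socket code: FakeSocket, read_retrying and read_checking_flag are hypothetical names invented for illustration, and the fake socket eventually answers only so that the example terminates. In the real hang the MTA never answers, so the first loop would block again after every signal.

```python
import errno


class FakeSocket(object):
    """Hypothetical stand-in for the MTA connection: the first
    `interruptions` calls to recv() fail with EINTR, as if a signal
    arrived while the read was blocked."""

    def __init__(self, interruptions):
        self.interruptions = interruptions
        self.calls = 0

    def recv(self, bufsize):
        self.calls += 1
        if self.calls <= self.interruptions:
            raise OSError(errno.EINTR, "Interrupted system call")
        return b"250 OK\r\n"


def read_retrying(sock):
    """Retry on EINTR unconditionally; a stop flag set by a signal
    handler is never consulted, so the read simply starts over."""
    while True:
        try:
            return sock.recv(1024)
        except OSError as e:
            if e.errno == errno.EINTR:
                continue
            raise


def read_checking_flag(sock, stop_requested):
    """Same loop, but give up (return None) when an EINTR arrives
    and the stop flag has been set."""
    while True:
        try:
            return sock.recv(1024)
        except OSError as e:
            if e.errno == errno.EINTR:
                if stop_requested():
                    return None
                continue
            raise
```

With the first function a signal only restarts the read; with the second, the same signal lets the caller get back to its own loop and exit.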
Re: [Mailman-Users] Stuck OutgoingRunner
On 02/03/2018 01:03 AM, Sebastian Hagedorn wrote:
>>
>> Did you look at the out queue, and if so was there a .bak file there.
>> This would be the entry currently being processed.
>
> I looked at the out queue, and there was no .bak file.

Interesting. That says that OutgoingRunner is not currently delivering a message, but that is inconsistent with this:

>> Also, the TCP connection to the MTA being ESTABLISHED says the
>> OutgoingRunner has called SMTPDirect.process() and it in turn is
>> somewhere in its delivery loop of sending SMTP transactions.
>>
>> Are there any clues in the MTA logs?
>
> I just found this in Mailman's smtp-failures log:
>
> Feb 01 14:28:49 2018 (1674) Low level smtp error: [Errno 111] Connection
> refused, msgid:
> Feb 01 14:28:49 2018 (1674) delivery to x...@uni-koeln.de failed with
> code -1: [Errno 111] Connection refused
>
> I can't prove it, but this time stamp seems to coincide with the moment
> the OutgoingRunner got stuck, based on the age of the queue files. The
> receiving SMTP server was under heavy load at that moment, so it is
> possible that it might have refused the connection.

Normally, that won't cause a problem like this. This occurs at a fairly low level in SMTPDirect.py when Mailman is initiating a transaction with the MTA to send to one or more recipients. The recipients will be marked as "refused retryably" and OutgoingRunner will queue the message for those recipients in the retry queue to be retried.

You can set SMTPLIB_DEBUG_LEVEL = 1 in mm_cfg.py to log copious smtplib debugging info to Mailman's error log. Then the log will show the last thing that was done before the hang.

> If this should happen again, what should we look for? Would a gdb
> backtrace be helpful?

It might be if you can find just where in the code it's hung.

Also, I didn't look carefully before, but in your OP, you show

> mailman 1663 0.0 0.0 233860 2204 ? Ss Jan16 0:00
> /usr/bin/python2.7 /usr/lib/mailman/bin/mailmanctl -s -q start
> mailman 1677 0.1 0.9 295064 73284 ? S Jan16 35:35
> /usr/bin/python2.7 /usr/lib/mailman/bin/qrunner --runner=OutgoingRunner:3:4
> -s

The status of 'S' for OutgoingRunner is "interruptible sleep". This means it has either called time.sleep for QRUNNER_SLEEP_TIME (default = 1 second), which is unlikely as it should wake up, or it's waiting for a response from something, most likely a response from the MTA.

--
Mark Sapiro                          The highway is for gamblers,
San Francisco Bay Area, California   better use your sense - B. Dylan
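One general way to keep a process from waiting indefinitely on a silent peer is a socket timeout. The sketch below shows only that general technique and makes no claim about how Mailman's SMTPDirect configures its connection: wait_for_reply is a hypothetical helper, and socket.socketpair() stands in for the runner-to-MTA connection so that the example runs without a mail server.

```python
import socket


def wait_for_reply(sock, timeout_seconds):
    """Return whatever the peer sends, or None if nothing arrives
    within timeout_seconds."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


# A connected pair of local sockets plays the roles of runner and MTA.
runner_side, mta_side = socket.socketpair()
try:
    # The "MTA" stays silent: the wait ends after the timeout.
    silent = wait_for_reply(runner_side, 0.2)

    # The "MTA" answers: the reply is returned right away.
    mta_side.sendall(b"250 OK\r\n")
    answered = wait_for_reply(runner_side, 0.2)
finally:
    runner_side.close()
    mta_side.close()
```

For real SMTP sessions, smtplib.SMTP accepts a timeout argument that serves the same purpose.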
Re: [Mailman-Users] Stuck OutgoingRunner
Thanks for your reply!

> On 02/02/2018 02:26 AM, Sebastian Hagedorn wrote:
>> [root@mailman3/usr/lib/mailman/bin]$ lsof -p 1677
>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
>> python2.7 1677 mailman cwd DIR 253,0 4096 173998 /usr/lib/mailman
>> python2.7 1677 mailman rtd DIR 253,0 4096 2 /
>> ...
>> python2.7 1677 mailman 10u IPv6 46441320 0t0 TCP
>> mailman3.rrz.uni-koeln.de:55764->smtp-out.rrz.uni-koeln.de:smtp (ESTABLISHED)
>>
>> In both instances the OutgoingRunner was stuck on an SMTP connection. I
>> had to use "kill -9" to get rid of it. Any ideas what might be causing
>> that?
>
> I think I've seen this once or maybe twice, I don't recall details. I
> wasn't able to determine a cause. I haven't seen it in years.
>
> Did you look at the out queue, and if so was there a .bak file there.
> This would be the entry currently being processed.

I looked at the out queue, and there was no .bak file.

> Also, the TCP connection to the MTA being ESTABLISHED says the
> OutgoingRunner has called SMTPDirect.process() and it in turn is
> somewhere in its delivery loop of sending SMTP transactions.
>
> Are there any clues in the MTA logs?

I just found this in Mailman's smtp-failures log:

Feb 01 14:28:49 2018 (1674) Low level smtp error: [Errno 111] Connection refused, msgid:
Feb 01 14:28:49 2018 (1674) delivery to x...@uni-koeln.de failed with code -1: [Errno 111] Connection refused

I can't prove it, but this time stamp seems to coincide with the moment the OutgoingRunner got stuck, based on the age of the queue files. The receiving SMTP server was under heavy load at that moment, so it is possible that it might have refused the connection. The message was delivered successfully after I killed the stuck runner and restarted the service. I wasn't able to find anything pertinent on the receiving server.

If this should happen again, what should we look for? Would a gdb backtrace be helpful?

--
Sebastian Hagedorn - Weyertal 121, Zimmer 2.02
Regionales Rechenzentrum (RRZK)
Universität zu Köln / Cologne University - Tel.
+49-221-470-89578
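On the question of what to look for next time: besides a gdb backtrace, a Python process can be made to print its own stack when it receives a signal. The sketch below is hypothetical and not something Mailman's qrunner provides; make_stack_dumper and install are invented names, and SIGUSR1 is only an assumption, so pick a signal the process does not already handle. It relies on Python-level signal handlers still being run while the process waits, which matches the SIGINT behaviour described earlier in this thread.

```python
import signal
import sys
import traceback


def make_stack_dumper(stream):
    """Build a signal handler that writes the interrupted Python
    stack to `stream`."""
    def dump_stack(signum, frame):
        stream.write("Signal %d received; current stack:\n" % signum)
        traceback.print_stack(frame, file=stream)
        stream.flush()
    return dump_stack


def install(stream=sys.stderr, signum=None):
    """Register the dumper. Unix only; SIGUSR1 is an assumed default."""
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(signum, make_stack_dumper(stream))
```

With such a handler installed at startup, sending the chosen signal to the stuck process (for example with kill -USR1 and its PID) would write the current call stack to the chosen stream, showing which function the runner is sitting in.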