Package: spamassassin Version: 3.3.1-1 Severity: normal Dear SpamAssassin Maintainers,
on a machine running Spamd for a mail server with approx. 1500 active users the spamd master process seems to become unresponsive once every few weeks. Restarting spamassassin seems to suffice to fix this problem. "kill -9" was not necessary. Nevertheless it causes all mail delivery to stall due to spamc no more being able to get a result from spamd. (spamc and spamd run on differnt machines, both being Xen DomUs.) An additional symptom of this issue seems the presence of spamd zombie children looking like this: <username> 6465 0.0 0.0 0 0 ? ZN 20:16 0:00 [spamd] Those zombie spamd children seem to start approx. half a day before spamd actually stops to work completely, at least in today's case: That zombie process above was started at 20:16. Our monitoring system noticed it as zombie from 20:22 to 08;14 on the following day. (Checking happens every 5 minutes.) The amount of procmail processes (which call spamc) running in parallel reached the warning level of 20 at 07:33, and critical level of 100 at 08:13. The last sucessfully delivered (test) mail reached its destination at 07:25. (Happens every 5 minutes, too.) Around 08:15 we restarted spamassassin and everything was fine again. Looking at the spamd logs revealed that spamd seem to have worked fine until it got killhupped at 07:27: # fgrep spamd mail.log […] Mar 13 07:27:07 … spamd[27526]: spamd: connection from … […] at port 39595 Mar 13 07:27:07 … spamd[27526]: spamd: setuid to … succeeded Mar 13 07:27:07 … spamd[27526]: spamd: processing message <…> for …:… Mar 13 07:27:10 … spamd[27526]: spamd: clean message (-1.2/5.0) for …:… in 3.3 seconds, 66598 bytes. Mar 13 07:27:10 … spamd[27526]: spamd: result: . -1 - BAYES_00,HTML_MESSAGE,RCVD_IN_DNSWL_MED scantime=3.3,size=66598,user=…,uid=…,required_score=5.0,rhost=…,raddr=…,rport=39595,mid=<…>,bayes=0.000000,autolearn=ham Mar 13 07:27:11 … spamd[6165]: prefork: child states: KII Mar 13 07:27:14 … spamd[6165]: spamd: server hit by SIGHUP, restarting Mar 13 07:27:14 … spamd[6165]: spamd: child [31179] killed successfully: interrupted, signal 2 (0002) Mar 13 07:27:14 … spamd[6165]: spamd: child [27526] killed successfully: interrupted, signal 2 (0002) Mar 13 07:28:17 … spamd[6165]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Mar 13 07:28:17 … spamd[6165]: prefork: select returned error on server filehandle: Mar 13 08:14:12 … spamd[6165]: spamd: server killed by SIGTERM, shutting down Mar 13 08:14:12 … spamd[1972]: logger: removing stderr method Mar 13 08:14:20 … spamd[1975]: spamd: server started on port 783/tcp (running version 3.3.1) Mar 13 08:14:20 … spamd[1975]: spamd: server pid: 1975 Mar 13 08:14:20 … spamd[1975]: spamd: server successfully spawned child process, pid 1981 Mar 13 08:14:20 … spamd[1975]: spamd: server successfully spawned child process, pid 1982 Mar 13 08:14:20 … spamd[1975]: prefork: child states: BI Mar 13 08:14:20 … spamd[1981]: spamd: connection from … […] at port 46441 Mar 13 08:14:20 … spamd[1982]: spamd: connection from … […] at port 46442 […] cron.daily runs at 06:25 localtime (syslog was rotated around 06:27), so it sounds a little bit strange that /etc/cron.daily/spamassassin only finished at 07:27, but I currently have no other explanation for the above seen "kill -HUP" than the "invoke-rc.d spamassassin reload" from /etc/cron.daily/spamassassin. I suspect that the following two lines, the last ones before the SA restart, are related to the non-responsiveness of spamd: Mar 13 07:28:17 … spamd[6165]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Mar 13 07:28:17 … spamd[6165]: prefork: select returned error on server filehandle: (And there's indeed no further output from spamd in the log behind the latter line.) Possibly related to http://bugs.debian.org/611040 or http://bugs.debian.org/346237 (which both have no information about symptoms, just more or less the error messages) I found this error message also at the following days in the log: # zfgrep -2h 'prefork: select returned error on server filehandle:' mail.log* | fgrep spamd Dec 22 08:25:06 … spamd[22411]: prefork: sysread(7) failed after 300 secs at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 654. Dec 22 08:25:06 … spamd[22400]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Dec 22 08:25:06 … spamd[22400]: prefork: select returned error on server filehandle: Dec 22 08:25:06 … spamd[22400]: prefork: child states: II Dec 22 08:25:06 … spamd[22400]: prefork: child states: II Jan 9 11:41:16 … spamd[25239]: prefork: sysread(7) failed after 300 secs at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 654. Jan 9 11:41:16 … spamd[6091]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Jan 9 11:41:16 … spamd[6091]: prefork: select returned error on server filehandle: Jan 9 11:41:16 … spamd[6091]: prefork: child states: II Jan 9 11:41:16 … spamd[6091]: prefork: child states: II Jan 19 14:25:31 … spamd[14526]: prefork: sysread(8) failed after 300 secs at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 654. Jan 19 14:25:31 … spamd[31699]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Jan 19 14:25:31 … spamd[31699]: prefork: select returned error on server filehandle: Jan 19 14:25:31 … spamd[31699]: prefork: child states: II Jan 19 14:25:31 … spamd[31699]: prefork: child states: II Jan 27 06:38:09 … spamd[19644]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Jan 27 06:38:09 … spamd[19644]: prefork: select returned error on server filehandle: Jan 27 06:38:09 … spamd[26848]: spamd: connection from … […] at port 49023 Jan 27 06:38:09 … spamd[787]: spamd: connection from … […] at port 49083 Feb 15 06:46:03 … spamd[10898]: prefork: sysread(8) failed after 300 secs at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 654. Feb 15 06:46:03 … spamd[28626]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Feb 15 06:46:03 … spamd[28626]: prefork: select returned error on server filehandle: Feb 15 06:46:03 … spamd[28626]: prefork: child states: II Feb 15 06:46:03 … spamd[28626]: prefork: child states: II Mar 2 01:37:27 … spamd[378]: prefork: sysread(10) failed after 300 secs at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 654. Mar 2 01:37:27 … spamd[637]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Mar 2 01:37:27 … spamd[637]: prefork: select returned error on server filehandle: Mar 2 01:37:27 … spamd[377]: prefork: sysread(9) failed after 300 secs at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 654. Mar 2 01:37:28 … spamd[637]: prefork: child states: II Mar 13 07:28:17 … spamd[6165]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 332. Mar 13 07:28:17 … spamd[6165]: prefork: select returned error on server filehandle: While it seems that some of these spamd processes continued to work, the dates seem to relate to cases where no mail went through for more than 45 minutes (and then the monitoring alarm went off): Sat Mar 02 00:50:09 2013 Fri Feb 15 06:00:09 2013 (We don't have a monitoring history for those kind of issues before 29th of January as we only started them after the analysis of the spamd non-responsiveness incident on 27th of January.) Nevertheless these alarms predate the according error messages by 45 minutes instead of the other way round as I would have expected, so I'm not really sure what's cause and what's effect. So far for the information about the beginning of spamd no more properly responding to requests. Now about the zombie processes that seem to be related as at least in the cases on 29th of January and today zombie spamd children were around while spamd wasn't working properly anymore and needed manual intervention: These are the syslog entries of the process which became a zombie later: # fgrep '… spamd[6465]:' mail.log Mar 12 20:16:40 … spamd[6465]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Mar 12 20:16:40 … spamd[6465]: check: (run_generic) exceeded time limit, skipping further tests Mar 12 20:16:40 … spamd[6465]: spamd: clean message (-3.5/5.0) for …:… in 302.5 seconds, 66181 bytes. Mar 12 20:16:40 … spamd[6465]: spamd: result: . -3 - BAYES_00,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_HELO_PASS,TIME_LIMIT_EXCEEDED scantime=302.5,size=66181,user=…,uid=…,required_score=5.0,rhost=…,raddr=…,rport=53506,mid=<…>,bayes=0.000000,autolearn=unavailable Mar 12 20:16:41 … spamd[6465]: spamd: accidental fork: 6465 != 5813 at /usr/sbin/spamd line 1682. Sounds at least similar to https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4846 All occurrences of "spamd: accidental fork" in the past 3 months: # zfgrep -h … mail.log* | egrep 'spamd.* fork' Jan 26 12:41:04 … spamd[5384]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Jan 26 12:41:05 … spamd[5389]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Jan 26 12:41:05 … spamd[5384]: spamd: accidental fork: 5384 != 4883 at /usr/sbin/spamd line 1682. Jan 26 12:41:06 … spamd[4883]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Jan 26 12:41:06 … spamd[5389]: spamd: accidental fork: 5389 != 4885 at /usr/sbin/spamd line 1682. Jan 26 12:41:07 … spamd[4885]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Jan 26 12:41:07 … spamd[5394]: spamd: accidental fork: 5394 != 4887 at /usr/sbin/spamd line 1682. Jan 26 12:41:08 … spamd[5406]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Jan 26 12:41:09 … spamd[5406]: spamd: accidental fork: 5406 != 4895 at /usr/sbin/spamd line 1682. Jan 26 12:41:37 … spamd[5451]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283, <CONF> line 34. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454, <CONF> line 34. Jan 26 12:41:38 … spamd[5451]: spamd: accidental fork: 5451 != 4865 at /usr/sbin/spamd line 1682, <CONF> line 34. Jan 26 12:41:39 … spamd[4865]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283, <CONF> line 34. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454, <CONF> line 34. Mar 12 20:16:40 … spamd[6465]: pyzor: check failed: util: cannot fork: child processing timeout at /usr/sbin/spamd line 1283. at /usr/share/perl5/Mail/SpamAssassin/Util.pm line 1454. Mar 12 20:16:41 … spamd[6465]: spamd: accidental fork: 6465 != 5813 at /usr/sbin/spamd line 1682. Mar 12 20:16:41 … spamd[6467]: spamd: accidental fork: 6467 != 5818 at /usr/sbin/spamd line 1682. Mar 12 20:16:43 … spamd[6469]: spamd: accidental fork: 6469 != 5826 at /usr/sbin/spamd line 1682. The dates are only dates on which we also saw zombies and after which we has non-responsiveness issues a few hours later, too,. (I don't think that's a coincidence. :) Feel free to clone and retitle this bug report accordingly in case there are several independent (and not yet reported) issues in here. P.S.: I don't think this is related to http://bugs.debian.org/501964 as we don't suspend and resume our DomUs. -- System Information: Debian Release: 6.0.7 APT prefers stable APT policy: (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/1 CPU core) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/bash Versions of packages spamassassin depends on: pn libarchive-tar-perl <none> (no description available) ii libdigest-sha1-perl 2.13-1 NIST SHA-1 message digest algorith ii libhtml-parser-perl 3.66-1 collection of modules that parse H ii libnet-dns-perl 0.66-2 Perform DNS queries from a Perl sc ii libnetaddr-ip-perl 4.028+dfsg-1 IP address manipulation module ii libsocket6-perl 0.23-1 Perl extensions for IPv6 ii libsys-hostname-long-p 1.4-2 Figure out the long (fully-qualifi ii libwww-perl 5.836-1 Perl HTTP/WWW client/server librar ii perl 5.10.1-17squeeze6 Larry Wall's Practical Extraction ii perl-modules [libio-zl 5.10.1-17squeeze6 Core Perl modules Versions of packages spamassassin recommends: ii gcc 4:4.4.5-1 The GNU C compiler ii gnupg 1.4.10-4+squeeze1 GNU privacy guard - a free PGP rep ii libc6-dev 2.11.3-4 Embedded GNU C Library: Developmen ii libio-socket-inet6-per 2.65-1.1 Object interface for AF_INET6 doma ii libmail-spf-perl 2.007-1 Perl implementation of Sender Poli ii make 3.81-8 An utility for Directing compilati ii perl [libsys-syslog-pe 5.10.1-17squeeze6 Larry Wall's Practical Extraction ii re2c 0.13.5-1 tool for generating fast C-based r ii spamc 3.3.1-1 Client for SpamAssassin spam filte Versions of packages spamassassin suggests: ii libdbi-perl 1.612-1 Perl Database Interface (DBI) ii libio-socket-ssl-perl 1.33-1+squeeze1 Perl module implementing object or ii libmail-dkim-perl 0.38-1 cryptographically identify the sen ii libnet-ident-perl 1.23-1 Perl module to lookup the username ii perl [libcompress-zlib 5.10.1-17squeeze6 Larry Wall's Practical Extraction ii pyzor 1:0.5.0-2 spam-catcher using a collaborative ii razor 1:2.85-3 spam-catcher using a collaborative -- Configuration Files: /etc/default/spamassassin changed: ENABLED=1 OPTIONS="-m50 -i -A … --max-conn-per-child=200" PIDFILE="/var/run/spamd.pid" NICE="--nicelevel 2" CRON=1 /etc/spamassassin/local.cf changed: trusted_networks … internal_networks … report_safe 0 report_contact postmaster@… add_header ham Report _REPORT_ pyzor_timeout 30 use_pyzor 1 pyzor_path /usr/bin/pyzor bayes_sql_override_username default bayes_store_module Mail::SpamAssassin::BayesStore::SQL bayes_sql_dsn DBI:mysql:…:… bayes_sql_username … bayes_sql_password … score RCVD_IN_MAPS_RBL 0 score RCVD_IN_MAPS_DUL 0 score RCVD_IN_MAPS_RSS 0 score RCVD_IN_MAPS_NML 0 score HELO_LOCALHOST 0 score RCVD_NUMERIC_HELO 1.3 score MIME_QP_LONG_LINE 0.7 score FROM_LOCAL_NOVOWEL 2.3 score FROM_NO_LOWER 0 score RCVD_IN_SORBS_DUL 1.0 score RCVD_IN_SORBS_DUL 0.5 score HELO_DYNAMIC_IPADDR2 0.5 score FH_HELO_EQ_D_D_D_D 0.2 score URIBL_GREY 1.0 score URIBL_BLACK 2.0 score URIBL_PH_SURBL 0.2 score URIBL_AB_SURBL 0.2 score URIBL_WS_SURBL 0.2 score URIBL_JP_SURBL 0.2 score URIBL_OB_SURBL 0.2 score URIBL_SC_SURBL 0.2 score URIBL_PH_SURBL 0.2 score RCVD_IN_DNSWL_MED -1.0 score EXTRA_MPART_TYPE 0.8 score MISSING_SUBJECT 0.45 score MIME_BASE64_TEXT 0.9 score MIME_HTML_ONLY 0.75 score RCVD_IN_HEISE_RBL 1.8 score DATE_IN_PAST_03_06 0.1 score REPLYTO_WITHOUT_TO_CC 0.2 score FUZZY_OCR_WRONG_EXTENSION 0.1 score BAYES_00 0 0 -0.20 -0.25 score BAYES_05 0 0 -0.10 -0.20 score BAYES_20 0 0 -0.05 -0.1 score BAYES_40 0 0 -0.01 -0.05 [local rules I do not want to disclose publicly…] [whitelist I do not want to disclose publicly…] [blacklist I do not want to disclose publicly either…] /etc/spamassassin/v310.pre changed: loadplugin Mail::SpamAssassin::Plugin::Pyzor loadplugin Mail::SpamAssassin::Plugin::Razor2 loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject loadplugin Mail::SpamAssassin::Plugin::MIMEHeader loadplugin Mail::SpamAssassin::Plugin::ReplaceTags -- no debconf information -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org