Re: SA gone mad, times out and stucks
Andreas Pettersson wrote:
> Jürgen Herz wrote:
>
>> What I still get and don't understand is
>> warn: bayes: cannot open bayes databases /var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists
>
> Make sure the file permissions haven't changed when you ran the manual
> expire.

They haven't, and as I wrote, I got that error before.

But that's the smaller problem, again. Since tonight I get the timeout again on each message when auto-expiring old tokens. :-(

What I don't get is the following: I've been running SA for four months now, but I first saw that expire timeout two weeks ago. The timeout is at 300 secs, but expiring manually takes twice as long. Shouldn't the time to expire grow linearly with the growing Bayes db? And shouldn't I have seen those timeouts much earlier, since any expiry run longer than 300 secs would have triggered them?

Also, is eleven minutes normal for such a small db? I know that machine isn't the fastest (300 MHz PPC), but shouldn't it nevertheless be sufficient?

sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 2181 0 non-token data: nspam
0.000 0 1430 0 non-token data: nham
0.000 0 176004 0 non-token data: ntokens
0.000 0 1148466909 0 non-token data: oldest atime
0.000 0 1159703877 0 non-token data: newest atime
0.000 0 1159703880 0 non-token data: last journal sync atime
0.000 0 1159577859 0 non-token data: last expiry atime
0.000 0 11059200 0 non-token data: last expire atime delta
0.000 0 5595 0 non-token data: last expire reduction count

sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 138 unique entries (138 total entries)
expired old bayes database entries in 579 seconds
174731 entries kept, 1273 deleted
token frequency: 1-occurrence tokens: 67.92%
token frequency: less than 8 occurrences: 24.63%

And since I still don't know more about them: are those many huge files like bayes_toks.expire16081 normal?

Thanks,
Jürgen
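A quick way to sanity-check the atime fields in the --dump magic output is to convert them to days. The sketch below is not from the thread. It assumes the atime fields are 10-digit Unix epoch seconds (oldest atime 1148466909, newest atime 1159703877) and that the last expire atime delta is 11059200 seconds; the variable names are made up for illustration.

```shell
#!/bin/sh
# Values read from the --dump magic output (assumed to be epoch seconds).
oldest_atime=1148466909
newest_atime=1159703877
last_expire_delta=11059200

# Time span covered by the tokens in the database, in whole days.
span_days=$(( (newest_atime - oldest_atime) / 86400 ))
# Age cutoff used by the last expiry run, in whole days.
delta_days=$(( last_expire_delta / 86400 ))

echo "token span: ${span_days} days"
echo "last expire atime delta: ${delta_days} days"
```

With these values the database spans about 130 days, which fits the four months of operation mentioned in the message, and the last expiry used a cutoff of 128 days.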
Re: SA gone mad, times out and stucks
Jürgen Herz wrote:
> Bowie Bailey wrote:
>> If your --force-expire only took 19 seconds, I would guess that you
>> are not talking to the same database. Make sure you are logged in as
>> the same user that is having the problem when you run the
>> --force-expire.
>
> Uh, that's a very good point. You may be right: --force-expire as that
> actual user took 641 secs.
> Have reenabled bayes_auto_expire now and will see.

The manual --force-expire seems to have helped. I only get one timeout per day since then, as far as I can see when multiple mails come in at the same time.

What I still get and don't understand is

warn: bayes: cannot open bayes databases /var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists

Thanks for all your help so far.

Regards,
Jürgen
Re: SA gone mad, times out and stucks
Bowie Bailey wrote:
> Jürgen Herz wrote:
>> After a forced manual Bayes expire it didn't get better. And since the
>> --force-expire run only took 19 secs, it seems unlikely the db was too
>> huge (the whole .spamassassin folder is 52 MB, where bayes_toks is 4
>> MB and the 44 bayes_toks.expire* are about 1 MB each).
>> On the other hand, after disabling auto expire completely
>> (bayes_auto_expire 0), the timeout problems are gone.
>>
>> So what could be going on here? Any other ideas where to look, how to
>> create detailed logs, and so on?
>
> If your --force-expire only took 19 seconds, I would guess that you
> are not talking to the same database. Make sure you are logged in as
> the same user that is having the problem when you run the
> --force-expire.

Uh, that's a very good point. You may be right: --force-expire as that actual user took 641 secs.
Have reenabled bayes_auto_expire now and will see.

Thank you very much,
Jürgen
Re: SA gone mad, times out and stucks
Loren Wilton wrote:
>> warn: bayes: expire_old_tokens: child processing timeout at
>> /usr/sbin/spamd line 1086.
>> (Spamd then takes very long to scan a mail:
>> info: spamd: clean message (0.0/5.0) for Debian-exim:106 in 305.0
>> seconds, 3781 bytes.)
>
> The child is trying to run a Bayes expire, apparently on a large Bayes
> database that hasn't had a successful expiry run in some time. This attempt
> to process the Bayes database is probably taking over 300 seconds, and the
> child is being timed out and killed by something. As a result of being
> killed, it never finished the Bayes expire processing. So the next child
> tries to do the same thing, gets timed out and killed, the next child tries
> to do the same thing...
>
> Run a manual Bayes expire run and it will probably clean up your problems.
> If this sort of problem starts to reoccur you might consider turning off
> bayes auto expire and setting up a cron run to do it once a day or so. (Or
> more often, depending on your mail volume.)

After a forced manual Bayes expire it didn't get better. And since the --force-expire run only took 19 secs, it seems unlikely the db was too huge (the whole .spamassassin folder is 52 MB, where bayes_toks is 4 MB and the 44 bayes_toks.expire* are about 1 MB each).
On the other hand, after disabling auto expire completely (bayes_auto_expire 0), the timeout problems are gone.

So what could be going on here? Any other ideas where to look, how to create detailed logs, and so on?

Jürgen
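The suggestion quoted above (turn off auto-expiry and expire from cron, as the user that owns the database) could be sketched roughly as follows. This is an illustration, not a tested setup: the script name expire-bayes.sh, its path, the 03:00 schedule and the SPAMD_USER variable are assumptions. Only bayes_auto_expire 0, sa-learn --force-expire and the Debian-exim user come from the thread.

```shell
#!/bin/sh
# expire-bayes.sh (hypothetical name): run the Bayes expiry outside spamd.
# Assumes "bayes_auto_expire 0" is set in the SpamAssassin configuration,
# so spamd children no longer attempt the expiry themselves.

# User whose ~/.spamassassin holds the Bayes database (taken from the log lines).
SPAMD_USER="Debian-exim"

expire_bayes() {
    # Refuse to run as anyone else, so the expiry never hits a different
    # user's (possibly empty) database by accident.
    if [ "$(id -un)" != "$SPAMD_USER" ]; then
        echo "must run as $SPAMD_USER, not $(id -un)" >&2
        return 1
    fi
    sa-learn --force-expire
}

# Only act when called as "expire-bayes.sh run"; sourcing the file
# merely defines the function.
if [ "${1:-}" = "run" ]; then
    expire_bayes
fi

# Example crontab entry for that user (nightly at 03:00, path assumed):
#   0 3 * * * /usr/local/bin/expire-bayes.sh run
```

The user check is there because a manual expiry run under a different account works on a different database and finishes quickly without helping.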
Re: SA gone mad, times out and stucks
Loren Wilton wrote:
> The child is trying to run a Bayes expire, apparently on a large Bayes
> database that hasn't had a successful expiry run in some time. This attempt
> to process the Bayes database is probably taking over 300 seconds, and the
> child is being timed out and killed by something. As a result of being
> killed, it never finished the Bayes expire processing. So the next child
> tries to do the same thing, gets timed out and killed, the next child tries
> to do the same thing...

Thank you for answering!
What you wrote sounds reasonable. And the following line from sa-learn --dump magic supports that tokens haven't been expired in the past (if I interpret that output right):

0.000 0 0 0 non-token data: last expiry atime

> Run a manual Bayes expire run and it will probably clean up your problems.

I did run sa-learn --force-expire, but it only took 19 secs. So it's hard to understand why that could not be done on demand. At least the output of sa-learn now shows the following line:

0.000 0 1159130961 0 non-token data: last expiry atime

Among other files, /var/spool/exim4/.spamassassin/ contains 40 files that match the bayes_* pattern from the warning I cited. They look like this:

1064960 2006-09-23 03:00 bayes_toks.expire12516

This is normal, I hope.

I'll monitor this for a while and report back whether or not manual expiring helped.

Bye,
Jürgen
SA gone mad, times out and stucks
Hello!

I've been running Exim together with spamc/spamd on my box for months now without problems. But a week ago a lot of spam began to show up in my Inbox, so I investigated what's wrong.

Until recently most spamd.log entries looked like this:

info: spamd: got connection over /var/run/spamd.sock
info: spamd: setuid to Debian-exim succeeded
info: spamd: processing message <[EMAIL PROTECTED]> for Debian-exim:106
info: spamd: clean message (-2.4/5.0) for Debian-exim:106 in 5.7 seconds, 4793 bytes.
info: spamd: result: . -2 - SOME_CHECKS scantime=5.7,size=4793,user=Debian-exim,uid=106,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=/var/run/spamd.sock,mid=<[EMAIL PROTECTED]>,bayes=0,autolearn=no

But now, soon after restarting SpamAssassin, Exim reports "spamd took more than 60 secs to run" (and thus the connection times out and Exim doesn't sort out spam anymore). From this point on spamd.log doesn't contain any new entries.

Before the service stops responding, the log contains warnings like

warn: bayes: cannot open bayes databases /var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists

(Often repeated two or three times for each mail.)

warn: bayes: expire_old_tokens: child processing timeout at /usr/sbin/spamd line 1086.

(Spamd then takes very long to scan a mail:
info: spamd: clean message (0.0/5.0) for Debian-exim:106 in 305.0 seconds, 3781 bytes.)

These mails are not big, nor is the machine under heavy load. Other messages (and all messages previously) of the same or bigger size take about five seconds.

Netstat reports quite a few spamd.pid and spamd.child (some with spamd.sock), though max-children is 2.

I'm using SpamAssassin 3.1.3 from Debian backports for stable (PPC). It's started with

--create-prefs --max-children 2 --syslog=/var/log/spamd.log --helper-home-dir --socketpath=/var/run/spamd.sock

Any thoughts on what's wrong?

Regards,
Jürgen