Re: SA gone mad, times out and stucks

2006-10-01 Thread Jürgen Herz
Andreas Pettersson wrote:
> Jürgen Herz wrote:
> 
>>What I still get and don't understand is
>>warn: bayes: cannot open bayes databases
>>/var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists
>>  
>>
> 
> Make sure the file permissions haven't changed when you ran the manual 
> expire.

They haven't, and as I wrote, I got that error before.

But that's the smaller problem. Since tonight I'm getting the timeout
again on each message when auto-expiring old tokens. :-(

What I don't get is the following: I've been running SA for four months
now, but I first saw that expire timeout two weeks ago.
The timeout is at 300 secs, but expiring manually takes about twice as long.
Shouldn't the time to expire grow linearly with the size of the Bayes db?
And shouldn't I have seen those timeouts much earlier, as soon as an expire
run took more than 300 secs?

Also, is eleven minutes normal for such a small db? I know the machine
isn't the fastest (300 MHz PPC), but shouldn't it nevertheless be sufficient?

sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       2181          0  non-token data: nspam
0.000          0       1430          0  non-token data: nham
0.000          0     176004          0  non-token data: ntokens
0.000          0 1148466909          0  non-token data: oldest atime
0.000          0 1159703877          0  non-token data: newest atime
0.000          0 1159703880          0  non-token data: last journal sync atime
0.000          0 1159577859          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0       5595          0  non-token data: last expire reduction count
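
The atime fields in that dump are Unix epoch seconds, which are hard to read at a glance. Below is a minimal Python sketch that turns them into UTC dates. It assumes the four-column layout of the "last expiry atime" line quoted elsewhere in this thread (probability, a zero, the value, a zero, then the name). The function names parse_magic and describe are made up for illustration, and SAMPLE holds just that one quoted line.

```python
import re
from datetime import datetime, timezone

# Assumed layout: "<prob> <0> <value> <0>  non-token data: <name>".
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$"
)


def parse_magic(text):
    """Return {name: integer value} for every 'non-token data' line."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def describe(values):
    """Render fields whose name ends in 'atime' as UTC timestamps."""
    lines = []
    for name, value in values.items():
        if name.endswith("atime") and value > 0:
            stamp = datetime.fromtimestamp(value, tz=timezone.utc)
            lines.append(f"{name}: {stamp:%Y-%m-%d %H:%M:%S} UTC")
        else:
            # Counters and the atime delta stay as plain numbers.
            lines.append(f"{name}: {value}")
    return lines


# One line as quoted in this thread after the first manual expire.
SAMPLE = "0.000 0 1159130961  0  non-token data: last expiry atime\n"

if __name__ == "__main__":
    for line in describe(parse_magic(SAMPLE)):
        print(line)
```

To use it on a live system, the text of sa-learn --dump magic (run as the user that owns the Bayes files) would be passed to parse_magic in place of SAMPLE.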

sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 138 unique entries
(138 total entries)
expired old bayes database entries in 579 seconds
174731 entries kept, 1273 deleted
token frequency: 1-occurrence tokens: 67.92%
token frequency: less than 8 occurrences: 24.63%


And since I still don't know anything more about them: are that many huge
files like bayes_toks.expire16081 normal?
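
For anyone wanting to check the same thing on their own box, here is a small Python sketch that counts the bayes_toks.expire* files and adds up their size. My understanding, which is an assumption and not confirmed anywhere in this thread, is that the DBM backend writes the surviving tokens to a temporary bayes_toks.expire<pid> file and renames it when the run finishes, so every interrupted expire would leave one behind. The helper names are hypothetical.

```python
from pathlib import Path


def leftover_expire_files(bayes_dir):
    """Return (path, size_in_bytes, mtime) for each bayes_toks.expire* file."""
    found = []
    for path in sorted(Path(bayes_dir).glob("bayes_toks.expire*")):
        info = path.stat()
        found.append((path, info.st_size, info.st_mtime))
    return found


def summarize(bayes_dir):
    """Return (number_of_files, total_size_in_bytes)."""
    files = leftover_expire_files(bayes_dir)
    total = sum(size for _, size, _ in files)
    return len(files), total


if __name__ == "__main__":
    # Directory taken from the warning quoted in this thread.
    count, total = summarize("/var/spool/exim4/.spamassassin")
    print(f"{count} leftover expire files, {total / 1_000_000:.1f} MB in total")
```

The sketch only reports; it deliberately does not delete anything, since whether a given file belongs to a run that is still in progress has to be checked first.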

Thanks,
Jürgen


Re: SA gone mad, times out and stucks

2006-09-30 Thread Jürgen Herz
Jürgen Herz wrote:
> Bowie Bailey wrote:
>> If your --force-expire only took 19 seconds, I would guess that you
>> are not talking to the same database.  Make sure you are logged in as
>> the same user that is having the problem when you run the
>> --force-expire.
> 
> Uh, that's a very good point. You may be right: --force-expire as the
> actual user took 641 secs.
> I have re-enabled bayes_auto_expire now and will see.

Manual --force-expire seems to have helped. I've only had about one timeout
per day since then, and from what I can see it happens when multiple mails
come in at the same time.

What I still get and don't understand is
warn: bayes: cannot open bayes databases
/var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists

Thanks for all your help so far.

Regards,
Jürgen


Re: SA gone mad, times out and stucks

2006-09-26 Thread Jürgen Herz
Bowie Bailey wrote:
> Jürgen Herz wrote:
>> After a forced manual Bayes expire things didn't get better. And since the
>> --force-expire run only took 19 secs it seems unlikely the db was too
>> huge (the whole .spamassassin folder is 52 MB, of which bayes_toks is 4
>> MB; the 44 bayes_toks.expire* files are about 1 MB each).
>> On the other hand, after disabling auto expire completely
>> (bayes_auto_expire 0), the timeout problems are gone.
>> 
>> So what could be going on here? Any other ideas on where to look, how to
>> create detailed logs, and so on?
> 
> If your --force-expire only took 19 seconds, I would guess that you
> are not talking to the same database.  Make sure you are logged in as
> the same user that is having the problem when you run the
> --force-expire.

Uh, that's a very good point. You may be right: --force-expire as the
actual user took 641 secs.
I have re-enabled bayes_auto_expire now and will see.

Thank you very much,
Jürgen


Re: SA gone mad, times out and stucks

2006-09-26 Thread Jürgen Herz
Loren Wilton wrote:
>> warn: bayes: expire_old_tokens: child processing timeout at
>> /usr/sbin/spamd line 1086.
>> (Spamd then takes very long to scan a mail:
>> info: spamd: clean message (0.0/5.0) for Debian-exim:106 in 305.0
>> seconds, 3781 bytes.)
> 
> The child is trying to run a Bayes expire, apparently on a large Bayes 
> database that hasn't had a successful expiry run in some time.  This attempt 
> to process the Bayes database is probably taking over 300 seconds, and the 
> child is being timed out and killed by something.  As a result of being 
> killed, it never finished the Bayes expire processing.  So the next child 
> tries to do the same thing, gets timed out and killed, the next child tries 
> to do the same thing...
> 
> Run a manual Bayes expire run and it will probably clean up your problems. 
> If this sort of problem starts to reoccur you might consider turning off 
> bayes auto expire and setting up a cron run to do it once a day or so.  (Or 
> more often, depending on your mail volume.)
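
The cron approach suggested above could be sketched in Python roughly as follows. This is only an illustration: sa-learn --force-expire is the command used in this thread, while the 300 second budget simply mirrors the timeout seen in spamd.log, and the names timed_run and run_expire are hypothetical.

```python
import subprocess
import sys
import time

# Command as used in this thread; it has to run as the user that owns the
# Bayes files, otherwise it works on a different database.
EXPIRE_COMMAND = ["sa-learn", "--force-expire"]

# Assumption: mirrors the 300 s child timeout reported in spamd.log.
BUDGET_SECONDS = 300.0


def timed_run(command):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(command, check=False)
    return result.returncode, time.monotonic() - start


def run_expire(command=EXPIRE_COMMAND, budget=BUDGET_SECONDS):
    """Run the expire command and warn on stderr if it exceeds the budget."""
    code, elapsed = timed_run(command)
    if elapsed > budget:
        print(
            f"expire took {elapsed:.0f} s, over the {budget:.0f} s budget",
            file=sys.stderr,
        )
    return code, elapsed
```

A daily cron job for the mail user would call run_expire() once, with bayes_auto_expire 0 set in the SpamAssassin configuration so that the spamd children no longer attempt the expire themselves. Logging the elapsed time makes it easy to see whether the run is creeping toward the timeout.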

After a forced manual Bayes expire things didn't get better. And since the
--force-expire run only took 19 secs it seems unlikely the db was too
huge (the whole .spamassassin folder is 52 MB, of which bayes_toks is 4 MB;
the 44 bayes_toks.expire* files are about 1 MB each).
On the other hand, after disabling auto expire completely
(bayes_auto_expire 0), the timeout problems are gone.

So what could be going on here? Any other ideas on where to look, how to
create detailed logs, and so on?

Jürgen


Re: SA gone mad, times out and stucks

2006-09-24 Thread Jürgen Herz
Loren Wilton wrote:

> The child is trying to run a Bayes expire, apparently on a large Bayes 
> database that hasn't had a successful expiry run in some time.  This attempt 
> to process the Bayes database is probably taking over 300 seconds, and the 
> child is being timed out and killed by something.  As a result of being 
> killed, it never finished the Bayes expire processing.  So the next child 
> tries to do the same thing, gets timed out and killed, the next child tries 
> to do the same thing...

Thank you for answering!
What you wrote sounds reasonable. And the following line from sa-learn
--dump magic supports the idea that tokens haven't been expired in the past
(if I interpret that output correctly).
0.000 0 0 0  non-token data: last expiry atime

> Run a manual Bayes expire run and it will probably clean up your problems.

I did run sa-learn --force-expire, but it only took 19 secs. So it's
hard to understand why that couldn't be done on demand.
At least the output of sa-learn now shows the following line
0.000 0 1159130961  0  non-token data: last expiry atime

/var/spool/exim4/.spamassassin/ contains, among other files, 40 files that
match the bayes_* pattern in the warning I cited. They look like
1064960 2006-09-23 03:00 bayes_toks.expire12516
I hope this is normal.

I'll monitor this over the coming days and report back whether or not the
manual expire helped.

Bye,
Jürgen


SA gone mad, times out and stucks

2006-09-23 Thread Jürgen Herz
Hello!

I've been running Exim together with spamc/spamd on my box for months now
without problems. But a week ago a lot of spam began to show up in my
inbox, so I investigated what was wrong. Until recently most spamd.log
entries looked like this:

info: spamd: got connection over /var/run/spamd.sock
info: spamd: setuid to Debian-exim succeeded
info: spamd: processing message <[EMAIL PROTECTED]> for Debian-exim:106
info: spamd: clean message (-2.4/5.0) for Debian-exim:106 in 5.7
seconds, 4793 bytes.
info: spamd: result: . -2 - SOME_CHECKS
scantime=5.7,size=4793,user=Debian-exim,uid=106,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=/var/run/spamd.sock,mid=<[EMAIL PROTECTED]>,bayes=0,autolearn=no


But now, soon after restarting SpamAssassin, Exim reports "spamd took
more than 60 secs to run" (so the connection times out and Exim no longer
filters out spam). From that point on, spamd.log doesn't get any new
entries.
Before the service stops responding, the log contains warnings like

warn: bayes: cannot open bayes databases
/var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists
(Often repeated two or three times for each mail.)

warn: bayes: expire_old_tokens: child processing timeout at
/usr/sbin/spamd line 1086.
(Spamd then takes very long to scan a mail:
info: spamd: clean message (0.0/5.0) for Debian-exim:106 in 305.0
seconds, 3781 bytes.)

These mails aren't big, and the machine isn't under heavy load. Other
messages of the same (or bigger) size, and all messages previously, take
about five seconds.
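
To see how often scans run long, the scan-time lines in spamd.log can be filtered with a short Python sketch like the one below. The pattern is built from the two log lines quoted in this message; the "identified spam" alternative is how I remember spamd logging spam verdicts and should be treated as an assumption. The 60 second threshold matches the Exim limit mentioned above, and slow_scans is a hypothetical helper name.

```python
import re

# Built from the "clean message (...) for ... in N seconds, M bytes" lines
# quoted in this thread. \s+ before "seconds" tolerates mail line wrapping.
SCAN_LINE = re.compile(
    r"spamd: (?P<verdict>clean message|identified spam) "
    r"\((?P<score>-?[\d.]+)/(?P<required>[\d.]+)\) "
    r"for (?P<user>\S+) in (?P<seconds>[\d.]+)\s+seconds, (?P<size>\d+) bytes"
)


def slow_scans(log_text, threshold=60.0):
    """Return (user, seconds, size_in_bytes) for scans slower than threshold."""
    slow = []
    for match in SCAN_LINE.finditer(log_text):
        seconds = float(match.group("seconds"))
        if seconds > threshold:
            slow.append(
                (match.group("user"), seconds, int(match.group("size")))
            )
    return slow


# The two scan lines quoted in this message, wrapped as in the mail.
SAMPLE = """\
info: spamd: clean message (-2.4/5.0) for Debian-exim:106 in 5.7
seconds, 4793 bytes.
info: spamd: clean message (0.0/5.0) for Debian-exim:106 in 305.0
seconds, 3781 bytes.
"""

if __name__ == "__main__":
    for user, seconds, size in slow_scans(SAMPLE):
        print(f"{user}: {seconds:.1f} s for {size} bytes")
```

Run against the real log file contents, this lists only the messages that would have hit the Exim timeout, which makes it easier to line them up with the expire_old_tokens warnings.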

Netstat reports quite a few spamd.pid and spamd.child entries (some with
spamd.sock), although max-children is 2.

I'm using SpamAssassin 3.1.3 from Debian backports for stable (PPC).
It's started with
--create-prefs --max-children 2 --syslog=/var/log/spamd.log
--helper-home-dir --socketpath=/var/run/spamd.sock


Any thoughts on what's wrong?

Regards,
Jürgen