Re: bayes sync is hogging cpu

2006-09-29 Thread Andreas Pettersson

Bret Miller wrote:

> I used to have problems with bayes locking and journaling. When it
> finally corrupted the database, I decided it was time to put it into a
> real SQL database instead of using DB_File. Haven't had a single problem
> with bayes CPU or locking since.
>
> Maybe it's time you consider using MySQL?
>
> Bret

I have now simply put an end to the misery by wiping the DB :)
And the issue is of course solved. I'll be looking into MySQL in the 
very near future, I think.


Thanks to everyone who has answered!

Best Regards,
Andreas



Re: bayes sync is hogging cpu

2006-09-26 Thread Fabien GARZIANO
 
OK, I may be saying something dumb, but have you tried clearing the bayes db
with:
sa-learn --clear --dbpath

-- Fab





Re: bayes sync is hogging cpu

2006-09-26 Thread Andreas Pettersson

Bret Miller wrote:

>>> I used to have problems with bayes locking and journaling. When it
>>> finally corrupted the database, I decided it was time to put it into a
>>> real SQL database instead of using DB_File. Haven't had a single problem
>>> with bayes CPU or locking since.
>>>
>>> Maybe it's time you consider using MySQL?
>>>
>>> Bret
>>
>> Well, if it solves the problem I'm ready to try almost anything. :)
>> The way you put your words tells me that the problem IS a corrupt database.
>> Can we be certain? And is there any way to fix it until I can get MySQL
>> up 'n running?
>
> If the database is corrupted, it should say so. In my case, it wouldn't
> expire, learn, sync, or use the db_file database because it ended up
> corrupted somehow. I could have restored it from backup, but chose to
> simply delete it and start over with SQL.
>
> ...
>
> Bret

Well, I've let sa-learn --force-expire --showdots run for 19 hours now
(even on a separate machine), at 100% CPU utilization the whole time, and
not a single dot has appeared on the screen.

If I can't figure out how to use db_recover, wiping is the next step.

Regards,
Andreas



Re: bayes sync is hogging cpu

2006-09-25 Thread Andreas Pettersson
Me again. Since I'm not getting any responses I'd better keep posting more
information, as I've done some more investigating today.

Sometimes when I run sa-learn --force-expire I get this response almost
immediately:

Bus error (core dumped)

When I run it again, the process just hogs the CPU until I interrupt it after
about 15 minutes.

I have also changed bayes_learn_to_journal back to 0 and lock_method to
flock.

Now I get these in spamd.log:
Mon Sep 25 17:05:18 2006 [8853] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call

I also lowered --max-children from 8 to 6, with this result:
Mon Sep 25 17:11:03 2006 [6702] info: prefork: server reached --max-children setting, consider raising it


Here's some top output of a typical situation:
 PID USERNAME PRI NICE   SIZE    RES STATE  TIME   WCPU    CPU COMMAND
8287 spamd    132    0 48056K 44220K RUN    8:00 88.43% 88.43% perl5.8.7
8853 spamd     20    0 40416K 38356K lockf  0:11  1.32%  1.32% perl5.8.7
9128 spamd     20    0 38592K 36544K lockf  0:03  0.63%  0.63% perl5.8.7
8879 spamd     20    0 40804K 38484K lockf  0:08  0.59%  0.59% perl5.8.7
9103 spamd     20    0 39728K 37736K lockf  0:04  0.54%  0.54% perl5.8.7

-rw-------  1 spamd  wheel        45 Sep 25 17:04 bayes.mutex
-rw-------  1 spamd  wheel    240024 Sep 25 17:15 bayes_journal
-rw-------  1 spamd  wheel   1039920 Sep 25 17:04 bayes_journal.old
-rw-r--r--  1 spamd  wheel  83787776 Sep 25 16:09 bayes_seen
-rw-------  1 spamd  wheel  85901312 Sep 25 17:04 bayes_toks

# cat bayes.mutex
8287
6708
6708
6708
6708
6708
6708
6708
6708


What is wrong?! What is making spamd go *kaboom* several times an hour?
Is it something with expiring tokens that's not working correctly?
Is it normal to have a bayes_journal.old lying around?
What more can I do to find the cause?

If the core dump (22 MB) is of any interest, I'll upload it somewhere.



Best regards,
Andreas











Re: bayes sync is hogging cpu

2006-09-25 Thread Andreas Pettersson

Here's an interesting observation.
I set bayes_auto_expire to 0 as a temporary workaround and
restarted spamd. The hogging occurs at least as often as before. Am I
looking in the wrong direction, or shouldn't this have helped?


Another observation:
# sa-learn --dump magic
bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call

0.000          0          3          0  non-token data: bayes db version
0.000          0     437041          0  non-token data: nspam
0.000          0     253396          0  non-token data: nham
0.000          0    4616765          0  non-token data: ntokens
0.000          0 1156977303          0  non-token data: oldest atime
0.000          0 1159200779          0  non-token data: newest atime
0.000          0 1159199860          0  non-token data: last journal sync atime
0.000          0 1158904222          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count


The last expiry atime converts to September 22, the same day my problems
started. But if the hogging continues even with bayes_auto_expire set to
0, then where should I be looking instead?
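
The atime values in the dump are Unix timestamps, so the conversion mentioned above can be reproduced with a few lines of Python. This is only a sketch: the values are copied from the dump output, the name to_utc is made up for illustration, and times are shown in UTC rather than the server's local time zone.

```python
from datetime import datetime, timezone

# atime values copied from the `sa-learn --dump magic` output above
MAGIC_ATIMES = {
    "oldest atime": 1156977303,
    "newest atime": 1159200779,
    "last journal sync atime": 1159199860,
    "last expiry atime": 1158904222,
}


def to_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Print each counter next to its human-readable UTC time
    for name, value in MAGIC_ATIMES.items():
        print(f"{name:24s} {to_utc(value):%Y-%m-%d %H:%M:%S} UTC")
```

The last expiry atime comes out on 2006-09-22, which matches the date of the first expire_old_tokens timeout in spamd.log.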


Regards,
Andreas












RE: bayes sync is hogging cpu

2006-09-25 Thread Bret Miller
> Me again. Since I'm not getting any responses I'd better keep posting more
> information, as I've done some more investigating today.
>
> Sometimes when I run sa-learn --force-expire I get this response almost
> immediately:
> Bus error (core dumped)
> When I run it again, the process just hogs the CPU until I interrupt it
> after about 15 minutes.


I used to have problems with bayes locking and journaling. When it
finally corrupted the database, I decided it was time to put it into a
real SQL database instead of using DB_File. Haven't had a single problem
with bayes CPU or locking since.

Maybe it's time you consider using MySQL?

Bret





Re: bayes sync is hogging cpu

2006-09-25 Thread Andreas Pettersson

Bret Miller wrote:

> I used to have problems with bayes locking and journaling. When it
> finally corrupted the database, I decided it was time to put it into a
> real SQL database instead of using DB_File. Haven't had a single problem
> with bayes CPU or locking since.
>
> Maybe it's time you consider using MySQL?
>
> Bret

Well, if it solves the problem I'm ready to try almost anything. :)
The way you put your words tells me that the problem IS a corrupt database.
Can we be certain? And is there any way to fix it until I can get MySQL
up 'n running?


Best regards,
Andreas



Re: bayes sync is hogging cpu

2006-09-25 Thread Jonas Eckerman

Andreas Pettersson wrote:

> Bus error (core dumped)

This *can* be the symptom of a hardware problem, such as bad memory or a bad
disk.

If you have a disk that's going bad, the symptoms are often corrupt files and
extremely slow writes, because the disk controller retries the write operation
(marking sections as bad) until it either succeeds or gives up.

/Jonas

--
Jonas Eckerman, FSDB  Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: bayes sync is hogging cpu

2006-09-25 Thread Andreas Pettersson

Jonas Eckerman wrote:

> Andreas Pettersson wrote:
>
>> Bus error (core dumped)
>
> This *can* be the symptom of a hardware problem, such as bad memory
> or a bad disk.
>
> If you have a disk that's going bad, the symptoms are often corrupt
> files and extremely slow writes, because the disk controller retries
> the write operation (marking sections as bad) until it either succeeds
> or gives up.
>
> /Jonas

The 'hardware' is VMware ESX 2.5.
I think bad physical hardware would show up in ESX rather than in the guest OS?
But I'm not throwing any ideas away. Let me move the bayes files to
another area on the disk and give it a try.

*momento*

Same Bus error (core dumped) as before when running a manual expire.
When I try again it hogs the CPU, and is still doing so after 5 minutes.
But this time I'll wait at least 30 minutes, just to make sure.
And just to make it clear: the spamd daemon is not running while I do the
manual expire.



Regards,
Andreas



Re: bayes sync is hogging cpu

2006-09-25 Thread David B Funk
On Mon, 25 Sep 2006, Andreas Pettersson wrote:

> Bret Miller wrote:
>
>> I used to have problems with bayes locking and journaling. When it
>> finally corrupted the database, I decided it was time to put it into a
>> real SQL database instead of using DB_File. Haven't had a single problem
>> with bayes CPU or locking since.
> [snip..]
> Well, if it solves the problem I'm ready to try almost anything. :)
> The way you put your words tells me that the problem IS a corrupt database.
> Can we be certain? And is there any way to fix it until I can get MySQL
> up 'n running?

The Perl DB_File module is built on the Berkeley DB libraries. Some versions
of those libraries have had issues, sometimes with locking.

Check whether your distro has updates to the Berkeley DB libraries and, if
so, install them. Then check for updates to the DB_File module and
reinstall it.


-- 
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


RE: bayes sync is hogging cpu

2006-09-25 Thread Bret Miller
>> I used to have problems with bayes locking and journaling. When it
>> finally corrupted the database, I decided it was time to put it into a
>> real SQL database instead of using DB_File. Haven't had a single problem
>> with bayes CPU or locking since.
>>
>> Maybe it's time you consider using MySQL?
>>
>> Bret
>
> Well, if it solves the problem I'm ready to try almost anything. :)
> The way you put your words tells me that the problem IS a corrupt database.
> Can we be certain? And is there any way to fix it until I can get MySQL
> up 'n running?

If the database is corrupted, it should say so. In my case, it wouldn't
expire, learn, sync, or use the db_file database because it ended up
corrupted somehow. I could have restored it from backup, but chose to
simply delete it and start over with SQL.

I don't know for sure that this will solve your problem. Bayes still has
to tokenize the message, so there is a certain amount of CPU-intensive
operations that must happen. Overall, it just seems a lot more stable
using a SQL database. I'm using MSSQL here because I have it and it
works. Haven't had a single bayes-related problem since switching to
SQL. Used to have them very often, sometimes daily.

Are you sure you have enough RAM to handle the number of threads you are
running?

Bret





Re: bayes sync is hogging cpu

2006-09-25 Thread Andreas Pettersson

Bret Miller wrote:

> Are you sure you have enough RAM to handle the number of threads you are
> running?

Yes, I'm pretty sure 512MB is enough.
No swapping going on, and I only scan msgs smaller than 500 KB.
Avg scan time is about 3-4 sec and I scan less than 1 a day.
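
As a rough check of the RAM question, the resident sizes from the top output earlier in the thread can be added up. This is only a sketch: the numbers are copied from that output, the helper names are made up for illustration, and summing RES overstates real usage because forked spamd children share memory pages.

```python
# (pid, SIZE in KB, RES in KB), copied from the top output earlier in the thread
TOP_ROWS = [
    (8287, 48056, 44220),
    (8853, 40416, 38356),
    (9128, 38592, 36544),
    (8879, 40804, 38484),
    (9103, 39728, 37736),
]


def total_res_mb(rows):
    """Sum the resident set sizes of the listed processes, in MB."""
    return sum(res for _pid, _size, res in rows) / 1024


def worst_case_mb(rows, children):
    """Upper bound: assume every child is as large as the biggest one seen."""
    largest_res = max(res for _pid, _size, res in rows)
    return largest_res * children / 1024


if __name__ == "__main__":
    print(f"observed: {total_res_mb(TOP_ROWS):.1f} MB")
    print(f"8 children, worst case: {worst_case_mb(TOP_ROWS, 8):.1f} MB")
```

Under these assumptions even eight children at the largest observed size stay below 512 MB, which fits the observation that no swapping is going on.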

Regards,
Andreas



Re: bayes sync is hogging cpu

2006-09-25 Thread Logan Shaw

On Mon, 25 Sep 2006, Andreas Pettersson wrote:

> Same Bus error (core dumped) as before when running a manual expire.
> When I try again it hogs the CPU, and is still doing so after 5 minutes.
> But this time I'll wait at least 30 minutes, just to make sure.
> And just to make it clear: the spamd daemon is not running while I do the
> manual expire.


One thing you could try is running db4_recover (or db_recover,
depending on how it's installed) on the Bayes database.
You would want to make sure nothing at all is using the database
while you did that.  You might also check for a leftover,
bogus lock file.  I don't know off the top of my head whether
Berkeley DB does lock files (or some other lock mechanism),
but if it does and the system had crashed at some point, it
might've left behind a bogus lock file that needs to be removed
manually.

  - Logan


Re: bayes sync is hogging cpu

2006-09-25 Thread Andreas Pettersson

Logan Shaw wrote:

> One thing you could try is running db4_recover (or db_recover,
> depending on how it's installed) on the Bayes database.

Seems like something to try. But I don't understand the utility:
usage: db_recover [-ceVv] [-h home] [-P password] [-t [[CC]YY]MMDDhhmm[.SS]]
How can I specify my bayes dbs with -h? Just giving it the path to
the files does nothing.

I'm running FreeBSD 5.4.
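
On the -h question: in the Berkeley DB utilities, -h names an environment home directory (where transaction logs live), not a database file. Files written through DB_File are normally standalone databases without such an environment, so db_recover may simply have nothing to work on. A per-file check with db_verify is one alternative. The sketch below only wraps that command; it assumes db_verify is installed under that name (packages sometimes use a versioned name), and the function names are hypothetical.

```python
import os
import shutil
import subprocess

# The two DB_File databases shown in the directory listings in this thread
BAYES_FILES = ("bayes_toks", "bayes_seen")


def verify_commands(db_dir, tool="db_verify"):
    """Build one verify invocation per Bayes database file."""
    return [[tool, os.path.join(db_dir, name)] for name in BAYES_FILES]


def run_verify(db_dir, tool="db_verify"):
    """Run the checks; return {file: exit_code}, or None if the tool is missing."""
    if shutil.which(tool) is None:
        return None
    results = {}
    for cmd in verify_commands(db_dir, tool):
        # A non-zero exit code means the tool reported a problem with the file
        results[cmd[-1]] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    # Stop spamd and any sa-learn runs first so nothing else has the files open
    print(run_verify("/usr/local/share/spamassassin/bayes"))
```

As with db_recover, nothing else should be using the database while this runs.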

Regards,
Andreas



Re: bayes sync is hogging cpu

2006-09-24 Thread Andreas Pettersson

OK, more information here.

I found this line in spamd.log from when the problem started:
Fri Sep 22 19:55:22 2006 [74581] warn: bayes: expire_old_tokens: child processing timeout at /usr/local/bin/spamd line 1082

It was followed by lots of these:
Fri Sep 22 19:55:52 2006 [74581] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: File exists

In an attempt to find out what's wrong I changed bayes_learn_to_journal to
1. It didn't help, but at least I got rid of the 'lock failed: File
exists' error messages in spamd.log, and bayes also keeps working. For the
moment I have a script that checks for the existence of bayes.lock, kills
the hogging process and removes the lock file. It runs every minute.
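
The watchdog script itself is not shown in the thread. The sketch below illustrates only the detection half of such a check, judging a lock file stale by its age. The 600-second threshold and the function names are assumptions made for illustration, and the sketch deliberately does not kill anything or remove the file.

```python
import os
import time


def lock_age_seconds(lock_path, now=None):
    """Return seconds since the lock file was last modified, or None if absent."""
    try:
        mtime = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return None
    current = time.time() if now is None else now
    return max(0.0, current - mtime)


def is_stale(lock_path, max_age=600, now=None):
    """True if the lock exists and is older than max_age seconds (assumed limit)."""
    age = lock_age_seconds(lock_path, now)
    return age is not None and age > max_age


if __name__ == "__main__":
    path = "/usr/local/share/spamassassin/bayes/bayes.lock"
    if is_stale(path):
        print(f"{path} looks stale; investigate before removing it")
```

Checking age rather than mere existence avoids interrupting a sync or expire run that is still making progress.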



I have tried changing lock_method to flock; the problem is still there (but
with a new lock file name).
I also tried sa-learn --force-expire. It took about 30 seconds to
complete. It didn't solve my problem either.



Any ideas of what might be wrong?

Regards,
Andreas



Re: bayes sync is hogging cpu (was: Some mail seems to hog spamd process)

2006-09-23 Thread Andreas Pettersson

Hi, me again ;)

I'm pretty confident that the hogging occurs when SA is trying to sync 
the bayes. The bayes_journal is cleared exactly when the hogging begins. 
And when I run sa-learn --sync I get the very same hogging effect.


The permissions seem OK, don't they?

-rw-------  1 spamd  wheel        20 Sep 23 13:28 bayes.lock
-rw-------  1 spamd  wheel      2760 Sep 23 13:28 bayes_journal
-rw-r--r--  1 spamd  wheel  83755008 Sep 23 13:28 bayes_seen
-rw-------  1 spamd  wheel  83853312 Sep 23 13:28 bayes_toks


Regards,
Andreas