Re: bayes sync is hogging cpu
Bret Miller wrote:
> I used to have problems with bayes locking and journaling. When it finally corrupted the database, I decided it was time to put it into a real SQL database instead of using DB_File. Haven't had a single problem with bayes CPU or locking since.
>
> Maybe it's time you consider using MySQL?
>
> Bret

I have now simply put an end to the misery by wiping the DB :) And the issue is, of course, solved. I'll be looking into MySQL in the very near future, I think.

Thanks to everyone who has answered!

Best Regards,
Andreas
Re: bayes sync is hogging cpu
When using Berkeley DB, the default database configuration values, especially the cache size, might lead to database corruption under high load. You may try putting the following in the DB_CONFIG file:

  set_cachesize 0 536870912 1
  set_lg_regionmax 262144
  set_lg_bsize 2097152

Note: this sets a cache size of 512MB.

For details, please refer to the Sleepycat documentation:
http://www.sleepycat.com/docs/api_c/env_set_cachesize.html

Regards,
John Mok

Andreas Pettersson wrote:
> Bret Miller wrote:
>> I used to have problems with bayes locking and journaling. When it finally corrupted the database, I decided it was time to put it into a real SQL database instead of using DB_File. Haven't had a single problem with bayes CPU or locking since.
>>
>> Maybe it's time you consider using MySQL?
>>
>> Bret
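John's numbers can be sanity-checked: Berkeley DB's set_cachesize takes a gigabyte count, a byte count and a number of cache regions, so 512MB is written as 0 gigabytes plus 536870912 bytes in one region. The Python sketch below generates the three DB_CONFIG lines from a cache size in MiB. The helper names are hypothetical (not part of SpamAssassin or Berkeley DB), and whether a particular DB_File build actually reads a DB_CONFIG file is not established in this thread.

```python
# Hypothetical helper: renders the DB_CONFIG lines suggested above.
# Berkeley DB's set_cachesize takes (gbytes, bytes, ncache); the total
# cache is gbytes * 2**30 + bytes, split across ncache regions.

GIB = 1 << 30


def set_cachesize_line(total_bytes: int, ncache: int = 1) -> str:
    """Split a byte count into set_cachesize's gbytes/bytes arguments."""
    gbytes, rest = divmod(total_bytes, GIB)
    return f"set_cachesize {gbytes} {rest} {ncache}"


def db_config(cache_mib: int,
              lg_regionmax: int = 262144,       # 256 KiB logging region
              lg_bsize: int = 2097152) -> str:  # 2 MiB in-memory log buffer
    """Return DB_CONFIG text for a cache of cache_mib MiB."""
    lines = [
        set_cachesize_line(cache_mib * 1024 * 1024),
        f"set_lg_regionmax {lg_regionmax}",
        f"set_lg_bsize {lg_bsize}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(db_config(512), end="")
    # set_cachesize 0 536870912 1
    # set_lg_regionmax 262144
    # set_lg_bsize 2097152
```

Splitting with divmod keeps the byte argument below one gigabyte, so larger caches roll over into the gbytes field instead of producing an oversized byte count.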
Re: bayes sync is hogging cpu
Bret Miller wrote:
>>> I used to have problems with bayes locking and journaling. When it finally corrupted the database, I decided it was time to put it into a real SQL database instead of using DB_File. Haven't had a single problem with bayes CPU or locking since.
>>>
>>> Maybe it's time you consider using MySQL?
>>>
>>> Bret
>>
>> Well, if it solves the problem I'm ready to try almost anything. :)
>> The way you put your words tells me that the problem IS a corrupt database. Can we be certain? And is there any way to fix it until I can get MySQL up 'n running?
>
> If the database is corrupted, it should say so. In my case, it wouldn't expire, learn, sync, or use the db_file database because it ended up corrupted somehow. I could have restored it from backup, but chose to simply delete it and start over with SQL.
> ...
> Bret

Well, I've let "sa-learn --force-expire --showdots" run for 19 hours now (even on a separate machine), at 100% CPU utilization the whole time, and not a single dot has appeared on the screen. If I can't figure out how to use db_recover, wiping is the next step.

Regards,
Andreas
Re: bayes sync is hogging cpu
Fabien GARZIANO wrote:
> Ok, I may say something dumb, but have you tried to clear the bayes db with:
> sa-learn --clear --dbpath
> --
> Fab

No, not yet, but that would be the last option if nothing else helps. I have already prepared a few hundred spams and hams for immediate training after the wipe.

Regards,
Andreas
Re: bayes sync is hogging cpu
Ok, I may say something dumb, but have you tried to clear the bayes db with:

sa-learn --clear --dbpath

--
Fab
Re: bayes sync is hogging cpu
Logan Shaw wrote:
> One thing you could try is running db4_recover (or db_recover, depending on how it's installed) on the Bayes database.

Seems like something to try. But I don't understand the utility:

usage: db_recover [-ceVv] [-h home] [-P password] [-t [[CC]YY]MMDDhhmm[.SS]]

How can I specify my bayes dbs with -h? Just feeding it the path to the files gives nothing. I'm running FreeBSD 5.4.

Regards,
Andreas
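In the usage line above, -h takes the Berkeley DB environment home directory, not an individual database file. The Python sketch below (hypothetical helper, assuming the two tool names Logan mentions) only builds the command, reducing a bayes file path to its parent directory. db_recover replays a transactional environment from its log files; if the bayes files were written through DB_File without such an environment, it may simply find nothing to recover. That last point is an assumption, not something confirmed in this thread.

```python
import os
import shutil


def recover_command(db_file: str,
                    candidates: tuple = ("db4_recover", "db_recover")) -> list:
    """Build (do not run) a db_recover invocation for the directory holding db_file.

    -h names the environment *home directory*, so the file path is reduced
    to its parent directory. The first candidate found on PATH is used;
    otherwise the last candidate is returned as a fallback name.
    """
    tool = next((name for name in candidates if shutil.which(name)), candidates[-1])
    home = os.path.dirname(os.path.abspath(db_file))
    return [tool, "-v", "-h", home]  # -v: verbose, per the usage line


if __name__ == "__main__":
    cmd = recover_command("/usr/local/share/spamassassin/bayes/bayes_toks")
    print(" ".join(cmd))
```

As Logan notes, nothing else should have the database open while recovery runs, so spamd would need to be stopped first.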
Re: bayes sync is hogging cpu
On Mon, 25 Sep 2006, Andreas Pettersson wrote:
> Same Bus error (core dumped) as before when running manual expire. When I make another try it hogs, and is still doing so after 5 minutes. But this time I'll wait at least 30 minutes, just to make sure.
> And just to make it clear: the spamd daemon is not running while I do manual expire.

One thing you could try is running db4_recover (or db_recover, depending on how it's installed) on the Bayes database. You would want to make sure nothing at all is using the database while you do that.

You might also check for a leftover, bogus lock file. I don't know off the top of my head whether Berkeley DB uses lock files (or some other lock mechanism), but if it does and the system crashed at some point, it might have left behind a bogus lock file that needs to be removed manually.

- Logan
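One way to decide whether a lock file is "bogus" is to check whether the processes it names still exist. The Python sketch below assumes the file holds whitespace-separated PIDs, like the bayes.mutex contents Andreas posts elsewhere in this thread; bayes.lock may use a different layout, in which case the parser needs adjusting. All function names are hypothetical.

```python
import os


def lock_pids(text: str) -> list:
    """Return the decimal PIDs recorded in a lock/mutex file's contents.

    Assumption: whitespace-separated PIDs, as in the bayes.mutex dump in
    this thread. Non-numeric tokens are ignored.
    """
    return [int(tok) for tok in text.split() if tok.isdigit()]


def pid_alive(pid: int) -> bool:
    """Check existence with signal 0, which delivers nothing (POSIX only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def dead_lock_holders(path: str) -> list:
    """PIDs named in the lock file that no longer exist (duplicates dropped)."""
    with open(path) as fh:
        pids = lock_pids(fh.read())
    return [pid for pid in dict.fromkeys(pids) if not pid_alive(pid)]
```

A non-empty result from, say, dead_lock_holders on the bayes.mutex path suggests a leftover lock rather than a live holder; an empty result means some named process is still running and deleting the file would be premature.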
Re: bayes sync is hogging cpu
Bret Miller wrote:
> Are you sure you have enough RAM to handle the number of threads you are running?

Yes, I'm pretty sure 512MB is enough. No swapping is going on, and I only scan messages smaller than 500 KB. Average scan time is about 3-4 sec and I scan less than 1 a day.

Regards,
Andreas
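Bret's RAM question can be roughed out from the top output Andreas posted in this thread, where five spamd children show RES values between about 36 and 44 MB. A pessimistic bound multiplies the largest child by --max-children. The Python sketch below (hypothetical function name) does that for the two --max-children values mentioned; it ignores copy-on-write pages shared between forked children, so the real figure should be lower.

```python
# RES column (KiB) for the five spamd children in the posted top output.
RES_KIB = [44220, 38356, 36544, 38484, 37736]


def worst_case_mib(max_children: int, res_kib: list) -> float:
    """Assume every child grows as large as the largest one observed."""
    return max_children * max(res_kib) / 1024


for children in (6, 8):  # the two --max-children settings tried in the thread
    print(f"--max-children {children}: <= {worst_case_mib(children, RES_KIB):.0f} MiB")
# --max-children 6: <= 259 MiB
# --max-children 8: <= 345 MiB
```

Both bounds sit below 512MB, which is consistent with the absence of swapping, though they exclude the spamd parent and everything else on the system.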
RE: bayes sync is hogging cpu
> >I used to have problems with bayes locking and journaling. When it
> >finally corrupted the database, I decided it was time to put it into a
> >real SQL database instead of using DB_File. Haven't had a single problem
> >with bayes CPU or locking since.
> >
> >Maybe it's time you consider using MySQL?
> >
> >Bret
>
> Well, if it solves the problem I'm ready to try almost anything. :)
> The way you put your words tells me that the problem IS a corrupt database.
> Can we be certain? And is there any way to fix it until I can get MySQL up 'n running?

If the database is corrupted, it should say so. In my case, it wouldn't expire, learn, sync, or use the db_file database because it ended up corrupted somehow. I could have restored it from backup, but chose to simply delete it and start over with SQL.

I don't know for sure that this will solve your problem. Bayes still has to tokenize the message, so a certain amount of CPU-intensive work must happen regardless. Overall, it just seems a lot more stable using a SQL database. I'm using MSSQL here because I have it and it works. Haven't had a single bayes-related problem since switching to SQL; I used to have them very often, sometimes daily.

Are you sure you have enough RAM to handle the number of threads you are running?

Bret
Re: bayes sync is hogging cpu
On Mon, 25 Sep 2006, Andreas Pettersson wrote:
> Bret Miller wrote:
>
> >I used to have problems with bayes locking and journaling. When it
> >finally corrupted the database, I decided it was time to put it into a
> >real SQL database instead of using DB_File. Haven't had a single problem
> >with bayes CPU or locking since.
[snip..]
> Well, if it solves the problem I'm ready to try almost anything. :)
> The way you put your words tells me that the problem IS a corrupt database.
> Can we be certain? And is there any way to fix it until I can get MySQL
> up 'n running?

The Perl DB_File module is based on the Berkeley DB libs. There have been issues with some versions of the Berkeley DB libs, sometimes with locking. Check whether there are updates to the Berkeley DB libs for your distro; if so, install them, check for updates to the DB_File module, and then reinstall DB_File.

--
Dave Funk                                  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
Re: bayes sync is hogging cpu
Jonas Eckerman wrote:
> Andreas Pettersson wrote:
>> Bus error (core dumped)
>
> This *can* be the symptom of a hardware problem, such as bad memory or a bad disk. If you have a disk that's going bad, the symptoms often are corrupt files and extremely slow writes (because the disk controller retries the write operation, marking sections as bad, until it either succeeds or gives up).
>
> /Jonas

The 'hardware' is VMware ESX 2.5. I think bad hard-hardware would show up in ESX rather than in the guest OS..? But I'm not throwing any ideas away. Let me move the bayes files to another area on the disk and have a try.

*momento*

Same Bus error (core dumped) as before when running manual expire. When I make another try it hogs, and is still doing so after 5 minutes. But this time I'll wait at least 30 minutes, just to make sure.

And just to make it clear: the spamd daemon is not running while I do manual expire.

Regards,
Andreas
Re: bayes sync is hogging cpu
Andreas Pettersson wrote:
> Bus error (core dumped)

This *can* be the symptom of a hardware problem, such as bad memory or a bad disk. If you have a disk that's going bad, the symptoms often are corrupt files and extremely slow writes (because the disk controller retries the write operation, marking sections as bad, until it either succeeds or gives up).

/Jonas
--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/
Re: bayes sync is hogging cpu
Bret Miller wrote:
> I used to have problems with bayes locking and journaling. When it finally corrupted the database, I decided it was time to put it into a real SQL database instead of using DB_File. Haven't had a single problem with bayes CPU or locking since.
>
> Maybe it's time you consider using MySQL?
>
> Bret

Well, if it solves the problem I'm ready to try almost anything. :)
The way you put your words tells me that the problem IS a corrupt database. Can we be certain? And is there any way to fix it until I can get MySQL up 'n running?

Best regards,
Andreas
RE: bayes sync is hogging cpu
> Me again. Since I'm not getting any responses I'd better keep posting
> more information, as I've done some more investigating today.
>
> Sometimes when I run sa-learn --force-expire I get this response almost immediately:
> Bus error (core dumped)
> When I run it again the process just hogs until I break it after about 15 minutes.

I used to have problems with bayes locking and journaling. When it finally corrupted the database, I decided it was time to put it into a real SQL database instead of using DB_File. Haven't had a single problem with bayes CPU or locking since.

Maybe it's time you consider using MySQL?

Bret
Re: bayes sync is hogging cpu
Here's an interesting observation. I set bayes_auto_expire to 0, as a temporary solution I thought, and restarted spamd. The hogging occurs at least as often as before. Am I looking in the wrong direction, or shouldn't this have helped?

Another observation:

# sa-learn --dump magic
bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call
0.000          0          3          0  non-token data: bayes db version
0.000          0     437041          0  non-token data: nspam
0.000          0     253396          0  non-token data: nham
0.000          0    4616765          0  non-token data: ntokens
0.000          0 1156977303          0  non-token data: oldest atime
0.000          0 1159200779          0  non-token data: newest atime
0.000          0 1159199860          0  non-token data: last journal sync atime
0.000          0 1158904222          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

"last expiry atime" converts to September 22, the same day my problems started. But if the hogging continues even with bayes_auto_expire set to 0, where should I be looking instead?

Regards,
Andreas

Andreas Pettersson wrote:
[snip..]
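The epoch conversion behind "converts to September 22" can be done for every timestamp in the dump at once. The Python sketch below (hypothetical function name) assumes the row layout shown above, with the value in the third numeric column, and reports UTC; local times differ by the machine's offset.

```python
from datetime import datetime, timezone


def magic_atimes(dump: str) -> dict:
    """Map each '... atime' label in `sa-learn --dump magic` output to a UTC datetime.

    Rows look like '0.000 0 1158904222 0 non-token data: last expiry atime';
    the third numeric column holds the value.
    """
    result = {}
    for line in dump.splitlines():
        if "non-token data:" not in line:
            continue  # skips the 'cannot open bayes databases' warning, etc.
        numbers, label = line.split("non-token data:", 1)
        label = label.strip()
        if label.endswith("atime"):  # excludes 'last expire atime delta'
            value = int(numbers.split()[2])
            result[label] = datetime.fromtimestamp(value, tz=timezone.utc)
    return result


SAMPLE = """\
bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call
0.000 0 1156977303 0 non-token data: oldest atime
0.000 0 1159200779 0 non-token data: newest atime
0.000 0 1159199860 0 non-token data: last journal sync atime
0.000 0 1158904222 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
"""

for label, when in magic_atimes(SAMPLE).items():
    print(f"{label:25} {when:%Y-%m-%d %H:%M:%S} UTC")
# oldest atime              2006-08-30 22:35:03 UTC
# newest atime              2006-09-25 16:12:59 UTC
# last journal sync atime   2006-09-25 15:57:40 UTC
# last expiry atime         2006-09-22 05:50:22 UTC
```

Read this way, the last completed expiry happened early on September 22, several hours before the "expire_old_tokens: child processing timeout" entry logged that evening.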
Re: bayes sync is hogging cpu
Me again. Since I'm not getting any responses I'd better keep posting more information, as I've done some more investigating today.

Sometimes when I run sa-learn --force-expire I get this response almost immediately:

Bus error (core dumped)

When I run it again the process just hogs until I break it after about 15 minutes.

I have also changed bayes_learn_to_journal back to 0 and lock_method to flock. Now I get these in spamd.log:

Mon Sep 25 17:05:18 2006 [8853] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call

I also lowered --max-children from 8 to 6, with this result:

Mon Sep 25 17:11:03 2006 [6702] info: prefork: server reached --max-children setting, consider raising it

Here's some top output of a typical situation:

  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
 8287 spamd     132    0 48056K 44220K RUN      8:00 88.43% 88.43% perl5.8.7
 8853 spamd      20    0 40416K 38356K lockf    0:11  1.32%  1.32% perl5.8.7
 9128 spamd      20    0 38592K 36544K lockf    0:03  0.63%  0.63% perl5.8.7
 8879 spamd      20    0 40804K 38484K lockf    0:08  0.59%  0.59% perl5.8.7
 9103 spamd      20    0 39728K 37736K lockf    0:04  0.54%  0.54% perl5.8.7

-rw-------  1 spamd  wheel        45 Sep 25 17:04 bayes.mutex
-rw-------  1 spamd  wheel    240024 Sep 25 17:15 bayes_journal
-rw-------  1 spamd  wheel   1039920 Sep 25 17:04 bayes_journal.old
-rw-r--r--  1 spamd  wheel  83787776 Sep 25 16:09 bayes_seen
-rw-------  1 spamd  wheel  85901312 Sep 25 17:04 bayes_toks

# cat bayes.mutex
8287 6708 6708 6708 6708 6708 6708 6708 6708

What is wrong?! What is making spamd go *kaboom* several times an hour? Is it something with expiring tokens that's not working correctly? Is it normal to have a bayes_journal.old lying around? What more can I do to find the cause? If the core dump (22 MB) is of any interest, I'll upload it somewhere.

Best regards,
Andreas

Andreas Pettersson wrote:
[snip..]
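The data above already hints at who holds the mutex: the first PID in bayes.mutex, 8287, is the process top shows at about 88% CPU. The Python sketch below (hypothetical helpers; the column layout is assumed from the FreeBSD top output above, with CPU as the second-to-last column) makes that cross-check mechanical. A None result only means the PID was not among the top rows captured; it says nothing about what that PID is doing.

```python
def cpu_by_pid(top_rows: list) -> dict:
    """Map PID -> CPU% from top process rows; the header row is skipped."""
    cpu = {}
    for row in top_rows:
        fields = row.split()
        if fields and fields[0].isdigit():
            cpu[int(fields[0])] = float(fields[-2].rstrip("%"))
    return cpu


def mutex_pids_vs_top(mutex_text: str, top_rows: list) -> dict:
    """For each distinct PID in the mutex file, its CPU% in top (None if absent)."""
    cpu = cpu_by_pid(top_rows)
    pids = dict.fromkeys(int(tok) for tok in mutex_text.split())
    return {pid: cpu.get(pid) for pid in pids}


TOP_ROWS = [
    "  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND",
    " 8287 spamd     132    0 48056K 44220K RUN      8:00 88.43% 88.43% perl5.8.7",
    " 8853 spamd      20    0 40416K 38356K lockf    0:11  1.32%  1.32% perl5.8.7",
]
MUTEX = "8287 6708 6708 6708 6708 6708 6708 6708 6708\n"

print(mutex_pids_vs_top(MUTEX, TOP_ROWS))  # {8287: 88.43, 6708: None}
```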
Re: bayes sync is hogging cpu
Ok, more information here. I found this line in spamd.log from when the problem started:

Fri Sep 22 19:55:22 2006 [74581] warn: bayes: expire_old_tokens: child processing timeout at /usr/local/bin/spamd line 1082

which was followed by lots of these:

Fri Sep 22 19:55:52 2006 [74581] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: File exists

In an attempt to find what's wrong I changed bayes_learn_to_journal to 1. It didn't help, but at least I got rid of the 'lock failed: File exists' error messages in spamd.log, and bayes also keeps working.

For the moment I have a script that checks for the existence of bayes.lock, kills the hogging process, and removes the lock file. It runs every minute..

I have tried changing lock_method to flock; the problem is still there (but with a new lock file name). I also tried sa-learn --force-expire. It took about 30 sec to complete. It didn't solve my problem either.

Any ideas of what might be wrong?

Regards,
Andreas
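A more cautious variant of the once-a-minute workaround is to report how old bayes.lock is before touching anything. The Python sketch below assumes the lock file's mtime reflects when the lock was taken and uses an arbitrary 60-second threshold to match the script's interval; both the threshold and the function names are hypothetical. It deliberately stops short of killing processes, since an expiry or sync on a large database may legitimately hold the lock for a while.

```python
import os
import time
from typing import Optional


def lock_age(path: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the lock file was last modified, or None if it is absent."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    return (time.time() if now is None else now) - mtime


def lock_status(path: str, max_age: float = 60.0, now: Optional[float] = None) -> str:
    """Classify the lock as 'absent', 'held' or 'stale' (older than max_age seconds)."""
    age = lock_age(path, now)
    if age is None:
        return "absent"
    return "stale" if age > max_age else "held"


if __name__ == "__main__":
    path = "/usr/local/share/spamassassin/bayes/bayes.lock"
    print(path, lock_status(path))
```

Logging the status over time would show how long each lock is actually held, which is more informative than the number of kills.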
Re: bayes sync is hogging cpu (was: Some mail seems to hog spamd process)
Hi, me again ;)

I'm pretty confident that the hogging occurs when SA is trying to sync the bayes database. The bayes_journal is cleared exactly when the hogging begins, and when I run sa-learn --sync I get the very same hogging effect.

The permissions seem OK, don't they?

-rw-------  1 spamd  wheel        20 Sep 23 13:28 bayes.lock
-rw-------  1 spamd  wheel      2760 Sep 23 13:28 bayes_journal
-rw-r--r--  1 spamd  wheel  83755008 Sep 23 13:28 bayes_seen
-rw-------  1 spamd  wheel  83853312 Sep 23 13:28 bayes_toks

Regards,
Andreas
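Eyeballing ls -l works, but the same check can be scripted: the spamd user should own each bayes file with at least owner read/write, and must be able to create lock and journal files in the directory itself. The Python sketch below assumes spamd runs as a local account named spamd and that the files share the bayes prefix shown in the listing; the function name is hypothetical.

```python
import glob
import os
import stat

FILE_BITS = stat.S_IRUSR | stat.S_IWUSR                # owner rw-
DIR_BITS = stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR  # owner rwx


def bayes_permission_problems(bayes_dir: str, prefix: str, uid: int) -> list:
    """List paths that uid does not own or whose owner bits are too narrow.

    Checks the directory itself (lock and journal files are created there)
    and every file whose name starts with prefix.
    """
    problems = []
    paths = [bayes_dir] + sorted(glob.glob(os.path.join(bayes_dir, prefix + "*")))
    for path in paths:
        st = os.stat(path)
        needed = DIR_BITS if stat.S_ISDIR(st.st_mode) else FILE_BITS
        if st.st_uid != uid or (st.st_mode & needed) != needed:
            problems.append(path)
    return problems


if __name__ == "__main__":
    import pwd  # POSIX only
    BAYES_DIR = "/usr/local/share/spamassassin/bayes"
    try:
        spamd_uid = pwd.getpwnam("spamd").pw_uid  # assumes spamd runs as 'spamd'
    except KeyError:
        spamd_uid = None
    if spamd_uid is not None and os.path.isdir(BAYES_DIR):
        print(bayes_permission_problems(BAYES_DIR, "bayes", spamd_uid))
```

Every file in the listing above is owned by spamd with owner read/write, so those entries would pass; the directory's own permissions are not shown in the thread.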