Bayes dbm sync/expire speedup suggestion
For the past several months I have been trying to find a way to make maintaining the SpamAssassin bayes database more effective on our SA servers. We have several SA servers, all running bayes globally on the server, not per user. Bayes generally does a good job, but on a fairly busy server it can be less effective depending on how you set the database up to learn, expire, etc.

So far I've followed just about every suggestion for maintaining bayes effectively. While bayes still works, it's not without some major problems, the main one being that when large syncs happen, the bayes token database can be locked out from other SA children for up to 10 minutes per sync.

Basically, we have set up our servers to learn to the journal, and we sync the journal to the main bayes database about once an hour. We've found that this process can take 8 to 10 minutes, give or take. We recently moved the bayes database onto a RAM disk to see if that would help, and while reads/seeks have sped up considerably, sync has not. Expire does not seem to be a problem.

Correct me if I'm wrong, but when you have bayes_learn_to_journal enabled and then run a sync, sa-learn basically moves bayes_journal to bayes_journal.old and then starts merging/adding tokens into bayes_toks. bayes_toks stays locked the entire time until the sync completes. For us, that means the bayes database is locked for about 10 minutes an hour. Expires do not run that long; in fact, they finish in about a minute, which is acceptable.

Would it make more sense, when you use learn_to_journal and run a sync, to make a copy of the bayes_toks database, say bayes_toks.new, and merge/add tokens from the journal into that? Then, once the sync is complete, you can lock, copy the .new over the current database, and continue. This should lock the database out from updates for only seconds (if that), rather than for the entire learn/add process.
I assume an expire could use the same logic for those of us who run expire/sync manually from cron at intervals rather than via the auto methods. Thoughts? My thought is to keep a read-only version of bayes_toks available almost the whole time, avoiding any lock contention while the database is being synced/expired.

Our current bayes config:

use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 0
bayes_learn_to_journal 1
bayes_journal_max_size 0
bayes_expiry_max_db_size 100
lock_method flock

SA 3.3.1 on FreeBSD 6.4
Perl 5.10

-- Robert Blayzor INOC, LLC rblay...@inoc.net http://www.inoc.net/~rblayzor/
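The copy-then-swap idea from the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how sa-learn works: `sync_via_copy` and `append_tokens` are hypothetical names, and a plain text file stands in for the real bayes_toks database. It also assumes nothing else writes to the live file during the merge (with learn-to-journal, new learns would land in the journal), since any such write would be lost at the swap.

```python
import os
import shutil


def sync_via_copy(db_path, journal_tokens, merge):
    """Merge journal_tokens into a private copy of db_path, then swap it in."""
    work_path = db_path + ".new"
    # Slow part: the copy and the merge touch only the private copy, so
    # readers of db_path are not blocked while tokens are being added.
    shutil.copy2(db_path, work_path)
    merge(work_path, journal_tokens)
    # Fast part: os.replace() is an atomic rename when both paths live on
    # the same filesystem, so this is the only moment the live file changes.
    os.replace(work_path, db_path)


def append_tokens(path, tokens):
    """Hypothetical stand-in for the real token merge: one token per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for token in tokens:
            handle.write(token + "\n")
```

Because the rename is atomic on a single filesystem, a reader that opens the file sees either the old database or the fully merged one, never a half-merged state.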
Re: Bayes dbm sync/expire speedup suggestion
On Nov 1, 2010, at 1:54 PM, Michael Scheidell wrote:

then you will probably always have delays.

Hence my suggestion for making copies of the database to be worked on during the sync/expire process. Then there should be virtually no delay other than the lock/copy, which should take seconds instead of several minutes.

-- Robert Blayzor INOC, LLC rblay...@inoc.net http://www.inoc.net/~rblayzor/
Re: Using Pyzor with high volume
On May 1, 2008, at 10:02 PM, Michael Hutchinson wrote:

Anyway, just thought you ought to know about the high volume thing. You might get your end running sweet and fast, but it may cause rejected lookups when you're scanning mail.

I'm pretty much putting Pyzor on the back burner for now. Even with the ReadyExec method, I don't want to call an exec over NFS constantly; it's expensive on a large scale. I could do something like create a memory disk and exec out of that, but it's just too much cobbling together. I really hoped that something could be memory resident, i.e., loaded at start time and then just work. DCC and razor2 both seem to be doing a good job now.

-- Robert Blayzor, BOFH INOC, LLC [EMAIL PROTECTED] http://www.inoc.net/~rblayzor/ Mac OS X. Because making Unix user-friendly is easier than debugging Windows.
Using Pyzor with high volume
Regarding Pyzor: I'm wondering if anyone out there is using it at any large scale. Unlike razor-agent, which appears to be a Perl module that gets loaded at startup, Pyzor worries me because SA has to exec the Python interpreter and pay that setup/teardown time for each and every message. Adding salt to the wound, our SA servers are diskless, so having to run it over NFS makes for a double whammy.

Is there a better way to implement Pyzor, or is it not even worth the trouble? TIA

-- Robert Blayzor, BOFH INOC, LLC [EMAIL PROTECTED] http://www.inoc.net/~rblayzor/ Mac OS X. Because making Unix user-friendly is easier than debugging Windows.
Re: Using Pyzor with high volume
On Apr 30, 2008, at 11:59 AM, Ben Poliakoff wrote:

Seems to be just the sort of thing to address your concern (short of a perl implementation of the pyzor client). I should note that *I* haven't used the ReadyExec stuff in my environment [1] (where executing the pyzor client hasn't been much of a resource drain), but I've thought about it.

Yeah, I did run across this, but I haven't had much experience installing or maintaining it. That's why I'm trying to weigh the value of Pyzor against complicating the installation any further. A Perl agent for Pyzor would be ideal.

[1] My environment supports about 2000 users scanning roughly 45000 - 7/day currently spread across two older linux boxes.

My setup is over 10X that, which is why this is a concern! ;-)

-- Robert Blayzor, BOFH INOC, LLC [EMAIL PROTECTED] http://www.inoc.net/~rblayzor/ Mac OS X. Because making Unix user-friendly is easier than debugging Windows.
Re: Using Pyzor with high volume
On Apr 30, 2008, at 2:04 PM, Jason J. Ellingson wrote:

Yup... I got the server portion running... The trick now is to get SpamAssassin to use readyexec /tmp/pyzor instead of just pyzor... Any suggestions? I was looking at modifying Pyzor.pm in the SpamAssassin perl directory.

My guess..

pyzor_path STRING
This option tells SpamAssassin specifically where to find the pyzor client instead of relying on SpamAssassin to find it in the current PATH. Note that if taint mode is enabled in the Perl interpreter, you should use this, as the current PATH will have been cleared.

So...

pyzor_path readyexec --stop /tmp/pyzor

May work...

Even though ReadyExec is more lightweight than actually calling python each time, I'm still hoping that a non-exec-based plugin will appear someday (again, if it's worth the trouble to write one).

-- Robert Blayzor, BOFH INOC, LLC [EMAIL PROTECTED] http://www.inoc.net/~rblayzor/ Mac OS X. Because making Unix user-friendly is easier than debugging Windows.
Re: Global Bayes
On Mar 24, 2008, at 11:08 AM, Mike Fahey wrote:

Just upgraded to 3.2.4. I am running spamassassin as a normal user, not root. I keep seeing this in the log files.

bayes: cannot open bayes databases /var/sabayes/.spamassassin/bayes_* R/W: lock failed: File exists

There are about 20 lock files in the directory. Is spamassassin not cleaning up the lock files properly or is it even working? Looks like there have been changes here since version 3.2.0

I don't know of any specific changes between versions, but whenever I've seen this happen, it was almost always due to disk space, permissions, or having auto-expire turned on. Double-check the permissions on your folders and make sure the user you run SpamAssassin as has the required privileges. Stop SA, clean up the lock files, and try restarting.

A good idea (if you're running global bayes) is to turn off auto-expire and run sa-learn --force-expire at a regular interval. We've been running this way for years and it seems to perform just fine under 3.2.4.

-- Robert Blayzor INOC [EMAIL PROTECTED] http://www.inoc.net/~rblayzor/ Mac OS X. Because making Unix user-friendly is easier than debugging Windows.
Re: SPF is hopelessly broken and must die!
Marc Perkel wrote:

SPF catches no spam - but does create false positives. It's less than useless. It's dangerous.

SPF's job is not to catch spam, period! No matter how many times you claim it's supposed to catch spam, you could never be more wrong. Its sole purpose is to allow domain owners to publish valid mail sources for their domains. That's its *only* purpose. What you decide to do with the information published in the TXT records, and how, is totally up to the receiver.

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC State-of-the-art: What we could do with enough money.
Re: SPF is hopelessly broken and must die!
Marc Perkel wrote:

From openspf.org http://old.openspf.org/aspen.html

Also from the SPF FAQ:

Sender Policy Framework (SPF) is an attempt to control forged e-mail. SPF is not directly about stopping spam – junk email. It is about giving domain owners a way to say which mail sources are legitimate for their domain and which ones aren't. While not all spam is forged, virtually all forgeries are spam. SPF is not anti-spam in the same way that flour is not food: it is part of the solution.

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC You are in a dark room with a compiler, vi, an internet connection, and a thermos of coffee. :Your Move ?
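To make "a way to say which mail sources are legitimate" concrete, here is a deliberately small Python sketch that checks a client IP against the ip4/ip6 and "all" terms of an SPF record string. It is an illustration only: `check_spf_ip` is a hypothetical helper, the record and addresses in the usage note are documentation examples, and real SPF evaluation also handles include, a, mx, redirect, macros, and DNS lookups, none of which are modeled here.

```python
import ipaddress

# Qualifier prefixes on an SPF term and the result each one produces.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf_ip(record, client_ip):
    """Evaluate only the ip4/ip6/all terms of an SPF record for client_ip."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    address = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # a term without a prefix means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if address.version == network.version and address in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and there was no "all" term
```

For example, with the record "v=spf1 ip4:192.0.2.0/24 -all", a connection from 192.0.2.10 evaluates to pass and one from 198.51.100.7 evaluates to fail. What the receiver does with that result (reject, score, ignore) is a separate policy decision, which is the point made in the thread.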
Re: SPF is hopelessly broken and must die!
Marc Perkel wrote:

SPF is not anti-spam in the same way that flour is not food: it is part of the solution.

The solution - to what? SPAM!

*Part* of the solution, not *the* solution. Big difference. Controlling forgeries is just one step: it takes one of the tools out of the tool bag. Those tool bags are pretty full, but you have to start somewhere.

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC Windows NT: Insert wallet into Drive A: and press any key to empty
Re: DCC worth it?
Jeff Moss wrote:

pain in the butt. In particular dealing with its log files. By default it creates thousands of them a day. There is a way to cut that down to hundreds a day by editing the configuration file. But you still have to run a cron job to keep them from eating your hard drive.

Not true. You can disable logging completely in your conf file. Something to the effect of leaving the following options empty...

DCCM_LOGDIR=
DCCM_LOG_AT=

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC Any sufficiently advanced bug is indistinguishable from a feature. - Kulawiec
Re: Any comments of the SpamHaus lawsuit?
Fabien GARZIANO wrote:

Ok. I don't want to dive into a useless political debate... But personally I don't trust either bush team as it appears to me to be a new kind of dictatorship. But the question is not 'should the US be trusted' but 'should something like the internet be under the control of 1 country'. My answer is no.

What the hell does the bush team have to do with the Internet? The US government has, to date, mostly been hands-off on Internet policy, while still reserving the authority to intervene at any time. Let's not forget where the Internet started... by American innovation.

Giving control of Internet policy to an international body would be a waste of time. Just look at the record of the UN... Yeah sure, let's give control of the Internet to Russia, China and Korea... and you complain about spam now??? LOL!

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC If I had it all to do over again, I'd spell creat with an e. - Kernighan
Re: Any comments of the SpamHaus lawsuit?
Chris Santerre wrote:

US products? What is that? I think the last US product I purchased was an american flag. Come to think of it, it might have been made somewhere else!

Yeah, America R&D's everything; everyone else in the world just clones it.

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC Please send all spam to my main address, [EMAIL PROTECTED]
Re: X-Spam Status
Spam Ass wrote:

The only time I have run into an email not being tagged is when the email was over a certain size. I believe the default max size is 256kb. This can be changed on a per user or global basis though.

This will also happen when you're using spamd/spamc and a timeout occurs between the client and the server. If that happens, spamc returns the original message unscanned. In a high-volume server environment you have to do a lot of timeout tweaking to ensure most of your emails are scanned relatively quickly without deadlocking the mail server or running the spamd box out of resources. ;-)

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: 0x66F90BFC @ http://pgp.mit.edu Key fingerprint = 6296 F715 038B 44C1 2720 292A 8580 500E 66F9 0BFC Any sufficiently advanced bug is indistinguishable from a feature. - Kulawiec
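The two pass-through cases described above (message too large, or the scanner timing out) follow a fail-open pattern that can be sketched in Python. This is a rough model, not spamc's actual code: `scan_or_pass_through` and the `scan` callable are hypothetical, and the 256 KB limit is simply the figure quoted in the thread, not a verified default.

```python
# Size limit as quoted in the thread; treat the exact value as an assumption.
MAX_SCAN_BYTES = 256 * 1024


def scan_or_pass_through(message, scan, max_bytes=MAX_SCAN_BYTES):
    """Return (message, status); deliver the original unscanned on failure.

    `scan` is a hypothetical callable that returns the tagged message or
    raises TimeoutError when the scanning server does not answer in time.
    """
    if len(message) > max_bytes:
        # Oversized mail is never sent to the scanner, so it gets no headers.
        return message, "unscanned: over size limit"
    try:
        return scan(message), "scanned"
    except TimeoutError:
        # Fail open: losing a scan is preferable to blocking mail delivery.
        return message, "unscanned: timeout"
```

Either way the mail still gets delivered, which is why a missing X-Spam-Status header usually points at a size limit or a timeout rather than at lost mail.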
Re: spamd --max-spare ignored
[EMAIL PROTECTED] wrote:

I'm running spamd with --max-spare, but as soon as I start it, it spawns --max-children children and keeps it there. I'm running 3.10 with these options:

/usr/bin/spamd \
    --daemonize \
    --username=spamd \
    --round-robin \
    --max-children=20 \
    --max-spare=5 \
    --socketpath=/var/run/spam/spamd.sock \
    --pidfile=/var/run/spam/spamd.pid

Are any of my settings incorrect? Or could this be a bug?

Because you have specified --round-robin. That tells spamd to use the old way of forking processes.

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: http://www.inoc.net/~dev/ Key fingerprint = 1E02 DABE F989 BC03 3DF5 0E93 8D02 9D0B CB1A A7B0 A list is only as strong as its weakest link. - Don Knuth
Re: server reached --max-clients setting
JamesDR wrote:

I got a lot of these messages in my maillog.

Oct 4 08:27:13 server spamd[482]: prefork: server reached --max-clients setting, consider raising it
Oct 4 08:27:13 server spamd[482]: prefork: server reached --max-clients setting, consider raising it
Oct 4 08:28:53 server spamd[482]: prefork: server reached --max-clients setting, consider raising it

Since you say your man page doesn't have it, I'll throw you a bone:

-m number, --max-children=number

--max-children != --max-clients. It's either a typo in the code or some stealth option somewhere. ;-)

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: http://www.inoc.net/~dev/ Key fingerprint = 1E02 DABE F989 BC03 3DF5 0E93 8D02 9D0B CB1A A7B0 Years of development: We finally got one to work.
Re: spamd children run as root (again)
Brandon Kuczenski wrote:

I've seen this question posted a couple times in the mailing list archives (from October 2004) but no resolution. The question again: I'm running SpamAssassin 3.0.2 on FreeBSD 4.10 in spamc/spamd format with the '-u spamd' flag. Problem is, all the child processes are running as root:

This has been a problem since 3.0.0 and I even submitted a patch in the PR... Dunno why this PR is being ignored by the devs...

http://bugzilla.spamassassin.org/show_bug.cgi?id=3897

-- Robert Blayzor, BOFH INOC, LLC rblayzor\@(inoc.net|gmail.com) PGP: http://www.inoc.net/~dev/ Key fingerprint = 1E02 DABE F989 BC03 3DF5 0E93 8D02 9D0B CB1A A7B0 Pinky, you've left the lens cap of your mind on again. - The Brain