Re: spamassassin 3.3.1 for Debian Lenny
* Alessio Cecchi [20100415 10:23]:
> now we are running spamassassin 3.3.0 on debian lenny, package is
> installed from backports.
>
> Nobody knows if it was packaged (.deb) version 3.3.1 for lenny?

Version 3.3.1-1 is in Debian testing (as of 2010-04-05), but hasn't made it to lenny-backports yet:

http://packages.qa.debian.org/s/spamassassin.html

We "backported" (rebuilt from the source package) 3.3.1-1 for lenny without any trouble, and are using it on our production servers. I assume it'll end up in lenny-backports soon.

Ben
--
PGP (318B6A97): 3F23 EBC8 B73E 92B7 0A67 705A 8219 DCF0 318B 6A97
Re: Using Pyzor with high volume
* Jason J. Ellingson <[EMAIL PROTECTED]> [20080430 13:21]:
> I am trying those settings, yet I get no Pyzor hits.
>
> I can manually do a "readyexec /tmp/pyzor ping" which works fine...
>
> Any other suggestions?

Try running spamassassin with debug mode on (-D) and look for pyzor-related output.

Ben
--
PGP fingerprint: A131 F813 7A0F C5B7 E74D C972 9118 A94D 6AF5 2019
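One way to follow that suggestion is to capture the debug output and keep only the lines that mention pyzor. The sketch below assumes `spamassassin` is on the PATH and that its `-D` output goes to stderr; the helper names `debug_scan` and `pyzor_lines` are made up for illustration and are not part of SpamAssassin.

```python
import subprocess


def pyzor_lines(debug_output):
    """Return the lines of debug output that mention pyzor (case-insensitive)."""
    return [line for line in debug_output.splitlines() if "pyzor" in line.lower()]


def debug_scan(message_path):
    """Run `spamassassin -D` on a saved message and return its stderr text.

    Assumption: debug messages are written to stderr, not stdout.
    """
    with open(message_path, "rb") as msg:
        result = subprocess.run(["spamassassin", "-D"], stdin=msg, capture_output=True)
    return result.stderr.decode("utf-8", errors="replace")
```

Calling `pyzor_lines(debug_scan("sample.eml"))` on a saved message (the file name is a placeholder) should show whether the plugin was loaded and which command it tried to run.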
Re: Using Pyzor with high volume
* Jason J. Ellingson <[EMAIL PROTECTED]> [20080430 11:07]:
> Yup... I got the "server" portion running... The trick now is to get
> SpamAssassin to use "readyexec /tmp/pyzor" instead of just "pyzor"...
> Any suggestions? I was looking at modifying Pyzor.pm in the
> SpamAssassin perl directory.

Something like this seems to work for me:

    use_pyzor 1
    pyzor_path /usr/local/bin/readyexec
    pyzor_options /tmp/pyzor

Ben
--
PGP fingerprint: A131 F813 7A0F C5B7 E74D C972 9118 A94D 6AF5 2019
Re: Using Pyzor with high volume
* Jason J. Ellingson <[EMAIL PROTECTED]> [20080430 10:59]:
> I decided to look into this as well.
>
> I managed to get ReadyExec installed, but am having difficulty changing
> the Pyzor.pm to find and use readyexec properly. Anyone else have luck?

This works for me:

    readyexecd.py /tmp/pyzor pyzor.client.run

This stops readyexecd.py:

    readyexec --stop /tmp/pyzor

Ben
--
PGP fingerprint: A131 F813 7A0F C5B7 E74D C972 9118 A94D 6AF5 2019
Re: Using Pyzor with high volume
* Robert Blayzor <[EMAIL PROTECTED]> [20080430 07:46]:
> In regards to Pyzor. I'm wondering if anyone out there is using this at
> any large scale. Unlike the razor-agent which appears to be a Perl module
> that gets loaded at startup, I'm concerned about SA having to exec the
> python interpreter and having that setup/teardown time for each and every
> message.
>
> Adding salt to the wound, our SA servers run on diskless servers; so having
> it have to run over NFS makes for a double whammy.
>
> Is there a better way to implement Pyzor or is it not even worth the
> trouble?

Looking at the pyzor man page I've noted that pyzor can be made to run with "ReadyExec":

    ReadyExec is a system to eliminate the high startup-cost of
    executing scripts repeatedly. If you execute pyzor a lot, you
    might be interested in installing ReadyExec and using it with
    pyzor.

Seems to be just the sort of thing to address your concern (short of a perl implementation of the pyzor client).

I should note that *I* haven't used the ReadyExec stuff in my environment [1] (where executing the pyzor client hasn't been much of a resource drain), but I've thought about it.

[1] My environment supports about 2000 users scanning roughly 45000 - 7/day currently spread across two older linux boxes.

--
PGP fingerprint: A131 F813 7A0F C5B7 E74D C972 9118 A94D 6AF5 2019
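To judge whether per-message interpreter startup is worth worrying about on a given box, it can simply be timed. The sketch below is only a rough stand-in: it measures how long it takes to start a Python interpreter that does nothing, which approximates the fixed cost paid on every pyzor invocation before any network lookup happens. The function name `mean_spawn_seconds` is made up for illustration.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(argv, runs=5):
    """Return the mean wall-clock time to start argv and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in for the per-message cost: an interpreter that exits immediately.
    # Swap in ["pyzor", "ping"] to time the real client, network round trip included.
    cost = mean_spawn_seconds([sys.executable, "-c", "pass"])
    print(f"mean interpreter startup: {cost * 1000:.1f} ms")
```

Multiplying the result by the daily message count gives a crude upper bound on the CPU time spent purely on startup; on NFS-backed diskless machines the figure will likely be higher until the interpreter and its modules are in the page cache.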
Re: Pyzor and cloudmark
* Matus UHLAR - fantomas <[EMAIL PROTECTED]> [20080313 07:59]:
> > > Is anyone using pyzor ?
> >
> > both server and client here yes
>
> looking at it now, I got no PYZOR catches last day :(

FWIW, at our site PYZOR_CHECK fires on about 65% of all of our spam. We had a total of 7523 hits for PYZOR_CHECK yesterday, just behind RAZOR2_CHECK and BAYES_99.

We use local pyzor servers as well as the "alternate" public server.

While the pyzor code base hasn't been actively updated lately I think it's still a very valuable component in a SpamAssassin installation.

Ben
--
PGP fingerprint: A131 F813 7A0F C5B7 E74D C972 9118 A94D 6AF5 2019
Re: URIDNSBL.pm improvements in 3.1?
* Stuart Johnston <[EMAIL PROTECTED]> [20050603 11:09]:
> >Is there any straightforward way to backport some of this goodness to
> >3.0.x? I don't mind running the development snapshots at home but at
> >work I have to answer to a couple thousand users...
>
> Here is the bug concerning the copy-paste urls:
>
> http://bugzilla.spamassassin.org/show_bug.cgi?id=4208
>
> I have just posted a backport patch there. I doubt that it will get
> added to 3.0.4 (if there ever is one) but you should be able to apply it
> to your local install. Although, I should point out that this backport
> has not been tested any further than 'make test'.

Thanks! This is very useful. I'm testing it now, and it's caught all of the problem messages so far.

Looking forward to the 3.1 release! :)

Ben
URIDNSBL.pm improvements in 3.1?
So I've noticed that the URIDNSBL.pm in the 3.1 snapshots seems to recognize obfuscated URIs much better than in 3.0.x.

In other words I was looking at a message that my relatively well maintained 3.0.3 installation didn't catch. Then I tried running the same message through my personal 3.1 snapshot installation. The 3.1 installation gave the message a comparatively high score (due to the domain being listed in multiple SURBLs).

The message in question contained some lines like this:

    copy-paste the u[r]l to finish.
    ez-rate*MUNGED*.info

The 3.1 code recognized the domain name readily, looked it up and found it in almost all of the SURBLs. But the 3.0.3 code didn't spot it (and the message scored on bayes alone).

Is there any straightforward way to backport some of this goodness to 3.0.x? I don't mind running the development snapshots at home but at work I have to answer to a couple thousand users...

Ben
highly available sitewide bayes, local db vs. sql
What sort of experiences have people had managing a sitewide bayes db that is used by spamassassin (spamd|amavisd) instances on multiple machines?

I've got an environment with spamassassin/amavisd-new running in parallel on a pool of two (but possibly more in the future) equally weighted machines. How have you avoided the dreaded Single Point of Failure?

I've been experimenting (on a small scale) with an SQL backed bayes db. I can readily have multiple machines talk to a single mysql instance, but then I'm stuck trying to make that mysql instance "highly available" (and I *could* do that on an existing "clustered" server).

I could also have an instance of mysql running on all of the machines, with one master mysql instance replicating to one or more mysql slave instances. I've never set up mysql replication (but it can't be much harder than OpenLDAP replication!). In such an example I'd only enable autolearning on the machine with the master mysql db.

I could also ditch the idea of using a mysql backed bayes and simply rsync the bayes db file from the master to the slaves on a regular basis (stopping and starting spamd|amavisd in the process). In such an environment I'd do training only on one "master" machine and enable autolearning only on that machine.

How are other people addressing this issue?

Ben
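The "autolearn only on the master" part of the replication idea can be sketched as a small generator for each machine's local.cf fragment. This is only an illustration under assumptions: the host names, DSN, user and password are placeholders, each machine is assumed to talk to its own local mysql instance, and the function name `render_bayes_conf` is hypothetical. The option names themselves are standard SpamAssassin settings.

```python
def render_bayes_conf(host, master, dsn, user, password):
    """Return local.cf lines for one scanner in a pool sharing an SQL Bayes store.

    Autolearning is switched on only for the host that owns the master database.
    """
    autolearn = 1 if host == master else 0
    return "\n".join([
        "use_bayes 1",
        "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
        f"bayes_sql_dsn {dsn}",
        f"bayes_sql_username {user}",
        f"bayes_sql_password {password}",
        f"bayes_auto_learn {autolearn}",
    ])


if __name__ == "__main__":
    # Placeholder pool: "mx1" holds the master mysql instance, "mx2" a replica.
    for host in ("mx1", "mx2"):
        print(f"# {host}")
        print(render_bayes_conf(host, "mx1", "DBI:mysql:bayes:localhost", "sa", "secret"))
        print()
```

The sketch leaves replication setup and manual training out entirely. Manual training (sa-learn) would also have to be confined to the master for the replicas to stay consistent.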
Re: bayes_expiry_max_db_size setting for sitewide installation?
* Ben Poliakoff <[EMAIL PROTECTED]> [20050223 11:46]:
> What sort of guidelines/rules of thumb/formulas have people used to
> determine the bayes_expiry_max_db_size setting for a sitewide bayes
> database?

Thanks Matt, Kris, and Kai,

Very useful comments all around. I now have some reasonable numbers (tokens and overall db size) to kick around. I also will reconsider using a "dbx" backed bayes.

Ben
bayes_expiry_max_db_size setting for sitewide installation?
What sort of guidelines/rules of thumb/formulas have people used to determine the bayes_expiry_max_db_size setting for a sitewide bayes database?

The Mail::SpamAssassin::Conf man page says the default is 150,000 tokens (which, it says, is equivalent to roughly 8Mb). It seems a little extreme to simply multiply that number by the number of users on the server.

    8Mb * 2000 users = ~16Gb!

I'm planning on hosting this db in mysql (an SQL based bayes seems better suited than the default "file based" option for a sitewide DB), but clearly 16Gb is just too big.

Presumably something smaller would work well enough, but how small is too small and how big is too big?

The only advice I've found in the list archives is:

http://marc.theaimsgroup.com/?l=spamassassin-users&m=109033803207027&w=2

> How big are your bayes_* files on disk? I would say personally
> that a single-user set of Bayes files shouldn't be much more than
> 8-10M total; a medium-size site Bayes should be ~40M _toks +
> whatever _seen takes up; and a large sitewide Bayes may run up to
> ~100M. I wouldn't go much higher due to the IO/memory/filesystem
> cache load.

Thanks!

Ben
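The size figures quoted above can be turned into rough token limits by scaling the documented default (150,000 tokens for roughly 8Mb). The sketch below assumes the relationship is linear, which is only an approximation: real on-disk size depends on the storage backend and its overhead. The function name `tokens_for_size` is made up for illustration.

```python
DEFAULT_TOKENS = 150_000  # documented default for bayes_expiry_max_db_size
DEFAULT_SIZE_MB = 8       # approximate size the man page gives for that default


def tokens_for_size(target_mb):
    """Scale the default token limit linearly to a target database size in MB."""
    return int(DEFAULT_TOKENS * target_mb / DEFAULT_SIZE_MB)


if __name__ == "__main__":
    # Size targets taken from the quoted list-archive advice.
    for label, mb in [("medium site", 40), ("large sitewide", 100)]:
        print(f"{label}: ~{mb}Mb -> bayes_expiry_max_db_size {tokens_for_size(mb)}")
```

Under that assumption a ~40Mb token database corresponds to about 750,000 tokens and a ~100Mb one to about 1,875,000, both far below the 300,000,000 that naive per-user multiplication (2000 users at 8Mb each) would imply.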