Re: spamassassin 3.3.1 for Debian Lenny

2010-04-15 Thread Ben Poliakoff
* Alessio Cecchi  [20100415 10:23]:
> 
> We are currently running spamassassin 3.3.0 on Debian lenny; the
> package was installed from backports.
> 
> Does anybody know whether version 3.3.1 has been packaged (.deb) for lenny?
> 

Version 3.3.1-1 is in Debian testing (as of 2010-04-05), but hasn't made
it to lenny-backports yet:

http://packages.qa.debian.org/s/spamassassin.html

We "backported" (rebuilt from the source package) 3.3.1-1 for lenny
without any trouble, and are using it on our production servers.  I
assume it'll end up in lenny-backports soon.
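
A rebuild of this kind usually looks something like the sketch below.
It is not a record of the exact steps used here. It assumes
/etc/apt/sources.list already carries a deb-src line for testing, that
fakeroot and the usual build tools are installed, and that the source
unpacks into spamassassin-3.3.1. The function name and the scratch
directory are made up for illustration, and the function only defines
the steps; nothing runs until it is called.

```shell
# Sketch only: rebuild the testing source package on a lenny host.
# Hypothetical helper; the apt-get update/build-dep steps need root.
backport_spamassassin() (
    set -e
    workdir=${1:-"$HOME/sa-backport"}   # hypothetical scratch directory
    mkdir -p "$workdir"
    cd "$workdir"
    apt-get update
    apt-get build-dep spamassassin      # install the build dependencies
    apt-get source spamassassin         # fetch and unpack the newest source
    cd spamassassin-3.3.1               # assumed unpack directory name
    dpkg-buildpackage -rfakeroot -us -uc -b   # unsigned, binary-only build
)
```

The resulting .deb files land in the scratch directory and can be
installed with dpkg -i.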

Ben

-- 

PGP (318B6A97):  3F23 EBC8 B73E 92B7 0A67  705A 8219 DCF0 318B 6A97




Re: Using Pyzor with high volume

2008-04-30 Thread Ben Poliakoff
* Jason J. Ellingson <[EMAIL PROTECTED]> [20080430 13:21]:
> I am trying those settings, yet I get no Pyzor hits.
> 
> I can manually do a "readyexec /tmp/pyzor ping" which works fine...
> 
> Any other suggestions?
> 

Try running spamassassin with debug mode on (-D) and look for
pyzor-related output.
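
As a minimal sketch, the debug run can be narrowed to the pyzor channel
and filtered. The function name and the idea of feeding it a saved
sample message are assumptions for illustration; debug output goes to
stderr, so stdout (the rewritten message) is discarded.

```shell
# Hypothetical helper: show only pyzor-related debug lines for one message.
pyzor_debug() {
    # $1: path to a saved message, ideally one Pyzor should know about
    spamassassin -D pyzor < "$1" 2>&1 >/dev/null | grep -i pyzor
}
```

Usage would be something like "pyzor_debug /path/to/sample-spam.txt".
No output at all suggests the plugin is not being called.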

Ben

-- 

PGP fingerprint:  A131 F813 7A0F C5B7 E74D  C972 9118 A94D 6AF5 2019




Re: Using Pyzor with high volume

2008-04-30 Thread Ben Poliakoff
* Jason J. Ellingson <[EMAIL PROTECTED]> [20080430 11:07]:
> Yup... I got the "server" portion running... The trick now is to get
> SpamAssassin to use "readyexec /tmp/pyzor" instead of just "pyzor"...
> Any suggestions?  I was looking at modifying Pyzor.pm in the
> SpamAssassin perl directory.

Something like this seems to work for me:

use_pyzor 1
pyzor_path /usr/local/bin/readyexec
pyzor_options /tmp/pyzor
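
The reason this should work: SpamAssassin places pyzor_options between
the pyzor_path binary and the "check" subcommand, so the call it makes
ends up roughly equivalent to the one below. This is a sketch for
testing by hand, assuming readyexecd is already serving /tmp/pyzor; the
function name is made up.

```shell
# Hypothetical helper: run a pyzor check through readyexec by hand,
# approximating what SpamAssassin does with the settings above.
pyzor_check_via_readyexec() {
    # $1: path to a saved message to check
    readyexec /tmp/pyzor check < "$1"
}
```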

Ben

-- 

PGP fingerprint:  A131 F813 7A0F C5B7 E74D  C972 9118 A94D 6AF5 2019




Re: Using Pyzor with high volume

2008-04-30 Thread Ben Poliakoff
* Jason J. Ellingson <[EMAIL PROTECTED]> [20080430 10:59]:
> I decided to look into this as well.
> 
> I managed to get ReadyExec installed, but am having difficulty changing
> the Pyzor.pm to find and use readyexec properly.  Anyone else have luck?
> 

This works for me:

readyexecd.py  /tmp/pyzor pyzor.client.run


This stops readyexecd.py:

readyexec --stop /tmp/pyzor

Ben
-- 

PGP fingerprint:  A131 F813 7A0F C5B7 E74D  C972 9118 A94D 6AF5 2019




Re: Using Pyzor with high volume

2008-04-30 Thread Ben Poliakoff
* Robert Blayzor <[EMAIL PROTECTED]> [20080430 07:46]:
> In regards to Pyzor.  I'm wondering if anyone out there is using this at 
> any large scale.  Unlike the razor-agent which appears to be a Perl module 
> that gets loaded at startup, I'm  concerned about SA having to exec the 
> python interpreter and having that setup/teardown time for each and every 
> message.
>
> Adding salt to the wound, our SA servers run on diskless servers; so having 
> it have to run over NFS makes for a double whammy.
>
> Is there a better way to implement Pyzor or is it not even worth the 
> trouble?
>

Looking at the pyzor man page I've noted that pyzor can be made to run
with "ReadyExec":

    ReadyExec is a system to eliminate the high startup-cost of
    executing scripts repeatedly. If you execute pyzor a lot, you might
    be interested in installing ReadyExec and using it with pyzor.

Seems to be just the sort of thing to address your concern (short of
a perl implementation of the pyzor client).  I should note that *I*
haven't used the ReadyExec stuff in my environment [1] (where executing
the pyzor client hasn't been much of a resource drain), but I've thought
about it.

[1] My environment supports about 2000 users scanning roughly 45000 -
7/day currently spread across two older linux boxes.

-- 

PGP fingerprint:  A131 F813 7A0F C5B7 E74D C972 9118 A94D 6AF5 2019




Re: Pyzor and cloudmark

2008-03-13 Thread Ben Poliakoff
* Matus UHLAR - fantomas <[EMAIL PROTECTED]> [20080313 07:59]:
> 
> > > Is anyone using pyzor ?
> > 
> > both server and client here yes
> 
> looking at it now, I got no PYZOR catches last day :(

FWIW, at our site PYZOR_CHECK fires on about 65% of all of our spam. We
had a total of 7523 hits for PYZOR_CHECK yesterday, just behind
RAZOR2_CHECK and BAYES_99.  We use local pyzor servers as well as the
"alternate" public server.

While the pyzor code base hasn't been actively updated lately, I think
it's still a very valuable component in a SpamAssassin installation.

Ben
-- 

PGP fingerprint:  A131 F813 7A0F C5B7 E74D  C972 9118 A94D 6AF5 2019




Re: URIDNSBL.pm improvements in 3.1?

2005-06-03 Thread Ben Poliakoff
* Stuart Johnston <[EMAIL PROTECTED]> [20050603 11:09]:
> >Is there any straightforward way to backport some of this goodness to
> >3.0.x?  I don't mind running the development snapshots at home but at
> >work I have to answer to a couple thousand users...
> 
> Here is the bug concerning the copy-paste urls:
> 
> http://bugzilla.spamassassin.org/show_bug.cgi?id=4208
> 
> I have just posted a backport patch there.  I doubt that it will get 
> added to 3.0.4 (if there ever is one) but you should be able to apply it 
> to your local install.  Although, I should point out that this backport 
> has not been tested any further than 'make test'.

Thanks!  This is very useful.  I'm testing it now, and it's caught all
of the problem messages so far.

Looking forward to the 3.1 release! :)

Ben


URIDNSBL.pm improvements in 3.1?

2005-06-02 Thread Ben Poliakoff
So I've noticed that the URIDNSBL.pm in the 3.1 snapshots seems to
recognize obfuscated URIs much better than in 3.0.x.  

In other words I was looking at a message that my relatively well
maintained 3.0.3 installation didn't catch.  Then I tried running the
same message through my personal 3.1 snapshot installation.  The 3.1
installation gave the message a comparatively high score (due to the
domain being listed in multiple SURBLs).

The message in question contained some lines like this:

copy-paste the u[r]l to finish.
ez-rate*MUNGED*.info

The 3.1 code recognized the domain name readily, looked it up and found
it in almost all of the SURBLs.  But the 3.0.3 code didn't spot it (and
the message scored on bayes alone).

Is there any straightforward way to backport some of this goodness to
3.0.x?  I don't mind running the development snapshots at home but at
work I have to answer to a couple thousand users...

Ben


highly available sitewide bayes, local db vs. sql

2005-02-24 Thread Ben Poliakoff
What sort of experiences have people had managing a sitewide bayes db
that is used by spamassassin (spamd|amavisd) instances on multiple
machines?  I've got an environment with spamassassin/amavisd-new running
in parallel on a pool of two (but possibly more in the future) equally
weighted machines.  How have you avoided the dreaded Single Point of
Failure?

I've been experimenting (on a small scale) with an SQL backed bayes db.
I can readily have multiple machines talk to a single mysql instance, but
then I'm stuck trying to make that mysql instance "highly available"
(and I *could* do that on an existing "clustered" server).

I could also have an instance of mysql running on all of the machines,
with one master mysql instance replicating to one or more mysql slave
instances.  I've never set up mysql replication (but it can't be much
harder than OpenLDAP replication!).  In such an example I'd only enable
autolearning on the machine with the master mysql db.

I could also ditch the idea of using a mysql backed bayes and simply
rsync the bayes db file from the master to the slaves on a regular basis
(stopping and starting spamd|amavisd in the process).  In such an
environment I'd do training only on one "master" machine and enable
autolearning only on that machine.
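
A rough sketch of what that rsync step might look like on each slave
follows. The function name, the arguments, and the default init script
path are assumptions for illustration; bayes_toks and bayes_seen are
the standard file names for the file-based bayes store. The service is
restarted even if the copy fails, and the copy's exit status is passed
back to the caller.

```shell
# Hypothetical helper, run on a slave: pull the bayes files from the master.
sync_bayes_from_master() (
    master=$1                            # hostname of the training machine
    bayes_dir=$2                         # directory holding the bayes files
    svc=${3:-/etc/init.d/spamassassin}   # assumed init script; adjust for amavisd
    "$svc" stop || exit 1
    rsync -a "$master:$bayes_dir/bayes_toks" \
             "$master:$bayes_dir/bayes_seen" "$bayes_dir/"
    status=$?
    "$svc" start || exit 1
    exit "$status"
)
```

Copying the files while the master is writing to them could produce an
inconsistent copy, so the master side may need a pause or a snapshot
first.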

How are other people addressing this issue?

Ben


Re: bayes_expiry_max_db_size setting for sitewide installation?

2005-02-23 Thread Ben Poliakoff
* Ben Poliakoff <[EMAIL PROTECTED]> [20050223 11:46]:
> What sort of guidelines/rules of thumb/formulas have people used to
> determine the bayes_expiry_max_db_size setting for a sitewide bayes
> database?

Thanks Matt, Kris, and Kai,

Very useful comments all around.  I now have some reasonable numbers
(tokens and overall db size) to kick around.  I also will reconsider
using a "dbx" backed bayes.  

Ben


bayes_expiry_max_db_size setting for sitewide installation?

2005-02-23 Thread Ben Poliakoff
What sort of guidelines/rules of thumb/formulas have people used to
determine the bayes_expiry_max_db_size setting for a sitewide bayes
database?

The Mail::SpamAssassin::Conf man page says the default is 150,000 tokens
(which, it says, is equivalent to roughly 8MB).  It seems a little
extreme to simply multiply that number by the number of users on the
server.

8MB * 2000 users = ~16GB!

I'm planning on hosting this db in mysql (an SQL based bayes seems
better suited than the default "file based" option for a sitewide DB),
but clearly 16GB is just too big.  Presumably something smaller would
work well enough, but how small is too small and how big is too big?

The only advice I've found in the list archives is:

http://marc.theaimsgroup.com/?l=spamassassin-users&m=109033803207027&w=2

> How big are your bayes_* files on disk?  I would say personally
> that a single-user set of Bayes files shouldn't be much more than
> 8-10M total; a medium-size site Bayes should be ~40M _toks +
> whatever _seen takes up; and a large sitewide Bayes may run up to
> ~100M.  I wouldn't go much higher due to the IO/memory/filesystem
> cache load.
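
For a rough sense of scale, the figures above can be turned into token
limits. This is back-of-the-envelope arithmetic only: it assumes the
documented default of 150,000 tokens at about 8 MB, and that database
size grows linearly with token count, which will not hold exactly
(least of all for an SQL backend).

```shell
# Back-of-the-envelope sizing; assumes size scales linearly with tokens.
default_tokens=150000   # documented default for bayes_expiry_max_db_size
default_mb=8            # approximate on-disk size of that default
users=2000

# Naive per-user scaling: clearly too large.
naive_mb=$((default_mb * users))
echo "per-user scaling: ${naive_mb} MB"

# Token limits corresponding to the ~40 MB and ~100 MB figures quoted above.
medium_tokens=$((default_tokens * 40 / default_mb))
large_tokens=$((default_tokens * 100 / default_mb))
echo "~40 MB  -> bayes_expiry_max_db_size ${medium_tokens}"
echo "~100 MB -> bayes_expiry_max_db_size ${large_tokens}"
```

Under those assumptions a sitewide setting somewhere between 750,000
and 1,875,000 tokens would match the quoted advice, far below what the
per-user multiplication suggests.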

Thanks!

Ben