Re: Running spamassassin in two-pass

2006-01-16 Thread Nicolas Boullis
Randal, Phil wrote:
> You might want to save yourself effort in reinventing the wheel and take
> a look at MailScanner 4.50.x which caches spamassassin scores
> (http://www.mailscanner.info).

Thanks for the information. I had a look at the website but could not
find much detail. Does MailScanner allow more than simply caching
the score of a message? Does it actually allow different users to use
different Bayesian databases and different scoring (some users may want
to disable some tests by assigning them a 0 score), and still perform
the common tests only once? I'm curious to know how it does that...


Cheers,

Nicolas


Running spamassassin in two-pass

2006-01-16 Thread Nicolas Boullis
Hi,

I think that some parts of SpamAssassin are highly user-specific (such
as BAYES, AWL or UNWANTED_LANGUAGE_BODY). But I receive some e-mails
that are sent to hundreds of our users, and I consider it a waste of
CPU time and bandwidth to run the same tests on the same message
hundreds of times.

So I considered running SpamAssassin in two passes:
  - one pass on our MX server, which runs most tests and reports in a
header which tests were run and which were triggered;
  - one pass on the server that hosts the mailboxes, which reads the
headers added by the MX, runs the user-specific tests, computes
the scores, does the AWL and BAYES learning, and adds the required
headers.
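
To make the split concrete, here is a minimal sketch of the two passes in Python (not Perl, and not using SpamAssassin's real API). Everything named in it is an assumption made for illustration: the header names X-Pass1-Tests-Run and X-Pass1-Tests-Hit, the functions first_pass and second_pass, and the idea that a test is a simple function taking a message. AWL and BAYES learning are left out.

```python
from email import message_from_string

# Hypothetical header names; SpamAssassin does not define these.
RAN_HEADER = "X-Pass1-Tests-Run"
HIT_HEADER = "X-Pass1-Tests-Hit"


def _names(msg, header):
    """Parse a comma-separated list of test names from a header."""
    raw = msg.get(header, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def first_pass(msg, shared_tests):
    """MX side: run the user-independent tests once and record the results.

    shared_tests maps a test name to a function(msg) -> bool.
    """
    hits = sorted(name for name, test in shared_tests.items() if test(msg))
    # Drop any pre-existing copies so a sender cannot forge the results.
    del msg[RAN_HEADER]
    del msg[HIT_HEADER]
    msg[RAN_HEADER] = ",".join(sorted(shared_tests))
    msg[HIT_HEADER] = ",".join(hits)
    return msg


def second_pass(msg, all_tests, default_scores, user_scores, threshold=5.0):
    """Mailbox side: reuse the MX results, run what is left, score per user."""
    already_ran = _names(msg, RAN_HEADER)
    hits = _names(msg, HIT_HEADER)
    for name, test in all_tests.items():
        # Only tests the MX did not run (for example user-specific ones).
        if name not in already_ran and test(msg):
            hits.add(name)
    # A user score overrides the default; a score of 0 disables the test.
    total = sum(user_scores.get(h, default_scores.get(h, 0.0)) for h in hits)
    return total, total >= threshold
```

The point of the sketch is that the per-user part reduces to a lookup and a sum: the expensive shared tests run once on the MX, while each user's own scores (including 0 to disable a test) are applied only in the second pass. Whether that saves much in practice depends on how much of the total cost the user-specific tests account for.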

What do you people think about this idea? Does it sound sane? Would you
expect much improvement over a full single pass on the server that hosts
the mailboxes?

(For my part, I used my very limited knowledge of Perl to try to
implement this two-pass idea, but the improvement looks very
limited...)


Nicolas Boullis