Re: Clustering spamassassin + autolearning

2008-11-26 Thread Peter Fastré
Thank you all for your (quick) answers!
@Kai: MailWatch has a training facility built in, but it only works on
messages held in quarantine. If a message is passed by MailScanner (for
example because it hit BAYES_00, which sometimes happens), it is sent on to
the mailbox server, and it can no longer be trained as spam on the
MailWatch server.

Peter


On Tue, Nov 25, 2008 at 9:11 PM, Kai Schaetzl [EMAIL PROTECTED] wrote:

 Peter Fastré wrote on Tue, 25 Nov 2008 16:04:19 +0100:

  2. On my mailbox server I'd like to have a script which goes into the
  mailfolders, searches for a folder with the name 'Spam', feeds the message
  to sa-learn (which should be feeding it to the same bayes database of
  course), and then delete the message. Do you think this is a well-thought
  approach of having my users train the spam filters this way?

 Generally yes, but since you are already using MailScanner+Mailwatch: That's
 already built-in and users can just train any messages from MailWatch. Why
 duplicate that?

 Kai

 --
 Kai Schätzl, Berlin, Germany
 Get your web at Conactive Internet Services: http://www.conactive.com






Clustering spamassassin + autolearning

2008-11-25 Thread Peter Fastré
Hello guys,

I'm running a small-sized hosting provider and our current setup is as
follows:

1 mailbox server running exim (only local delivery)
1 antispam server running exim + mailscanner + spamassassin + mailwatch -
sends all approved mail to mailbox server
2 mysql servers (master-slave) running databases for mailwatch +
spamassassin (bayes)

I have two questions about this, hope someone can help me.

1. Because all the load goes to the smtp server now and to add some
redundancy to our setup, we would like to add another antispam server, with
the same setup (which is working fine). Will this be possible (concerning
spamassassin), with two nodes sharing the same bayes database on the mysql
servers? Is it possible to have two nodes feeding the same bayes database?

2. On my mailbox server I'd like to have a script which goes into the
mailfolders, searches for a folder with the name 'Spam', feeds the message
to sa-learn (which should be feeding it to the same bayes database of
course), and then delete the message. Do you think this is a well-thought
approach of having my users train the spam filters this way? Maybe there are
already such scripts available?

Thanks in advance,

Peter


Re: Clustering spamassassin + autolearning

2008-11-25 Thread Benny Pedersen

On Tue, November 25, 2008 16:04, Peter Fastré wrote:

 approach of having my users train the spam filters this way? Maybe there are
 already such scripts available?

http://johannes.sipsolutions.net/Projects/dovecot-antispam
http://dovecot.org/ (fully enabled with sieve / managesieve)
http://sieve.info/ (more info on what sieve is)

dovecot-antispam can call sa-learn per message, if you like that. Because it
is handled as an IMAP hook, it works in Outlook as well.

Sorry, you did not post which LDA you have, but all the rest was there.

-- 
Benny Pedersen
Need more webspace ? http://www.servage.net/?coupon=cust37098



Re: Clustering spamassassin + autolearning

2008-11-25 Thread Samy Ascha, Xel Media B.V.

Hey Peter,

I have been working on this kind of setup last week.

On Nov 25, 2008, at 4:04 PM, Peter Fastré wrote:


Hello guys,

I'm running a small-sized hosting provider and currently our setup  
is following:


1 mailbox server running exim (only local delivery)
1 antispam server running exim + mailscanner + spamassassin +  
mailwatch - sends all approved mail to mailbox server
2 mysql servers (master-slave) running databases for mailwatch +  
spamassassin (bayes)


I have two questions about this, hope someone can help me.

1. Because all the load goes to the smtp server now and to add some  
redundancy to our setup, we would like to add another antispam  
server, with the same setup (which is working fine). Will this be  
possible (concerning spamassassin), with two nodes sharing the same  
bayes database on the mysql servers? Is it possible to have two  
nodes feeding the same bayes database?


I'm not 100% sure, but since MySQL is ACID compliant, it should be
very possible to autolearn from multiple locations to one central
database. This is the setup I have made too. If you have multiple
database servers and both are used by scan hosts, make sure one of
them replicates the bayes data from the other, which is the one fed by
sa-learn. AFAIK, when using an SQL database for bayes, no important
bayes data is stored on the scan host itself, so there's nothing that
can get out of sync.
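
A minimal sketch of what "one central database" could look like on each scan host, written as a small Python helper that renders the Bayes lines for local.cf. The option names are standard SpamAssassin settings; the host, database and account names are placeholders and not values from this thread. Learning writes to the database, so the DSN is assumed to point at the MySQL master.

```python
# Sketch: render the Bayes-related local.cf lines that every scan host shares.
# All host, database and account names passed in are hypothetical placeholders.

def bayes_sql_config(db_host, db_name, db_user, db_password, shared_user="mailscanner"):
    """Return local.cf lines pointing SpamAssassin's Bayes store at MySQL."""
    lines = [
        "use_bayes 1",
        "bayes_auto_learn 1",
        "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
        f"bayes_sql_dsn DBI:mysql:{db_name}:{db_host}",
        f"bayes_sql_username {db_user}",
        f"bayes_sql_password {db_password}",
        # One shared Bayes user, so every scan host reads and writes the same tokens.
        f"bayes_sql_override_username {shared_user}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bayes_sql_config("db-master.example.com", "sa_bayes", "sa_user", "secret"))
```

Using the identical output on both antispam nodes is what keeps them on the same token set. The shared_user value is an arbitrary label chosen for this sketch.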





2. On my mailbox server I'd like to have a script which goes into  
the mailfolders, searches for a folder with the name 'Spam', feeds  
the message to sa-learn (which should be feeding it to the same  
bayes database of course), and then delete the message. Do you think  
this is a well-thought approach of having my users train the spam  
filters this way? Maybe there are already such scripts available?


With the database already set up, I have made an IMAP box with some dirs
(Ham, Spam, Archive/Ham, Archive/Spam). The people I work with can
configure this account and drop mail in the right folders.


I have a PHP script that teaches SA by feeding it the Ham and Spam dir
contents. It then archives the mail, if told to do so. It could just as
well have been a basic shell script, since it only makes calls to
sa-learn. The results are automatically shared between scan hosts.
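
The PHP script itself was not posted. As a rough illustration of the same idea, here is a Python sketch that feeds the Spam and Ham folders to sa-learn and then moves the messages into Archive/. It assumes each folder is a plain directory with one message per file (a Maildir would keep its messages under cur/ and new/ instead); the function names and the mailbox path are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def learn_command(kind, folder):
    """Build the sa-learn call for one folder; kind is 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"unknown kind: {kind}")
    return ["sa-learn", f"--{kind}", str(folder)]


def train_and_archive(mailbox, archive=True, run=subprocess.run):
    """Feed Spam/ and Ham/ to sa-learn, then move the messages to Archive/."""
    mailbox = Path(mailbox)
    for kind, name in (("spam", "Spam"), ("ham", "Ham")):
        folder = mailbox / name
        if not folder.is_dir():
            continue
        messages = sorted(p for p in folder.iterdir() if p.is_file())
        if not messages:
            continue
        # check=True aborts before archiving if sa-learn reports a failure.
        run(learn_command(kind, folder), check=True)
        if archive:
            target = mailbox / "Archive" / name
            target.mkdir(parents=True, exist_ok=True)
            for msg in messages:
                shutil.move(str(msg), str(target / msg.name))
```

It could be run from cron with something like train_and_archive("/var/mail/training"), where that path is only a placeholder. Passing a different run callable makes it possible to try the folder handling without sa-learn installed.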


We do not use per user learning (yet), but all that would be needed is  
some iteration that wraps and repeats the above for all users, using  
the -u user option.


If you make sure the teaching is done per user, it is sufficiently
'thought-through' imho. If the input of messages is not correct, only
that user's database will be polluted.
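
The per-user iteration could look roughly like the Python sketch below. It only builds the sa-learn calls, one per user and folder, using the -u option mentioned above. The layout <mail_root>/<user>/Spam and <mail_root>/<user>/Ham is an assumption for this example, as is the function name. Per-user data only stays separate if bayes_sql_override_username is not set in the SpamAssassin configuration.

```python
from pathlib import Path


def per_user_commands(mail_root, users):
    """Yield one sa-learn call per user and folder, tagged with -u <user>."""
    for user in users:
        for kind, name in (("spam", "Spam"), ("ham", "Ham")):
            folder = Path(mail_root) / user / name
            # Users without a training folder are simply skipped.
            if folder.is_dir():
                yield ["sa-learn", "-u", user, f"--{kind}", str(folder)]
```

Each yielded list can be handed to subprocess.run(cmd, check=True) on a host where sa-learn is installed.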


Samy














Re: Clustering spamassassin + autolearning

2008-11-25 Thread Kai Schaetzl
Peter Fastré wrote on Tue, 25 Nov 2008 16:04:19 +0100:

 2. On my mailbox server I'd like to have a script which goes into the
 mailfolders, searches for a folder with the name 'Spam', feeds the message
 to sa-learn (which should be feeding it to the same bayes database of
 course), and then delete the message. Do you think this is a well-thought
 approach of having my users train the spam filters this way?

Generally yes, but since you are already using MailScanner+Mailwatch: That's 
already built-in and users can just train any messages from MailWatch. Why 
duplicate that?

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com