Re: sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-17 Thread Samy Ascha, Xel Media B.V.

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hey Jake,

Thx for your reply. I got this same tip off-list (from Jonas Eckerman). I
liked the idea and I have already done some successful testing of centralized
bayes-data storage in a MySQL database.

We are using an SQL back-end for storing 'all things e-mail' anyway, so this
was easily fitted in.

I will be rolling stuff out as soon as it is ready for production.

Also, the READMEs in the distribution were very useful for setting this up. I
did not need any other resources and there were zero issues.

Thx to Jonas, Jake and the list for helping out, gj ;)

Regards,
Samy

I'm keeping these full messages in here, as they may present a (kinda) full
problem and solution for others having similar issues.


On Nov 11, 2008, at 11:51 PM, Jake Maul wrote:

> On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. [EMAIL PROTECTED] wrote:
>> I have recently set up a mailbox and a sa-learn script to start teaching
>> SpamAssassin. This was all no problem, but:
>>
>> We have an MX group of usually about 3 MTAs, which all run their own content
>> filter (amavis) and thus use their own SpamAssassin database. When we are
>> gonna start teaching SpamAssassin with sa-learn, I need to somehow sync the
>> results in the journal to all these hosts.
>>
>> I've checked out the --no-sync and --sync options and I think these options
>> will give me exactly the tools I need for this job.
>>
>> I need to know the location of the journal though and I need to know if
>> there are any pitfalls when syncing a SpamAssassin with a journal from
>> another one on another server.
>>
>> Has anyone got experience with syncing sa-learn between multiple MTAs? How
>> did you solve this? Can SA sync with a journal in an arbitrary location, or
>> does it look for it in one preconfigured place?
>>
>> I hope you have some interesting thoughts about this issue.
>
> Ultimately, you're not syncing 'sa-learn', you're syncing the bayes
> DB that sa-learn (and spamd) writes to. There are a few ways to go
> about sharing the bayesian database. Probably the best bet would be to
> store the bayes DB in MySQL and point SA on all 3 servers to it,
> ideally with the database on a 4th server (hey, you can put the AWL
> info into MySQL as well... may as well hit that up at the same time).
>
> You could probably go the --sync and --no-sync route if you fiddled
> with it enough (never tried it), but honestly a single MySQL DB for
> bayes would probably be a lot simpler if you have any experience at
> all with MySQL. It's been good for performance for us even when used
> on a single server, and it's been pretty bulletproof for us: in use
> for years. The only tip you really need here is to run OPTIMIZE TABLE
> every now and then.
>
> An alternative hacky solution: turn off autolearn on 2 of the 3, and
> do sa-learns and autolearning on the 3rd. Then nightly rsync all the
> bayes DB files over to the other 2 servers and restart spamd. Not
> pretty, but it should work.
>
> Jake


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkkhQpcACgkQKIdvzp2UK/Fj+gCeIdwltuT96Zv3vYDplXR0Dh+7
9ykAoIlkJkEF1AZqH6ABbcWGFVXemBhA
=gbAW
-END PGP SIGNATURE-


Re: sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-11 Thread Jake Maul
On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. [EMAIL PROTECTED] wrote:
> I have recently set up a mailbox and a sa-learn script to start teaching
> SpamAssassin. This was all no problem, but:
>
> We have an MX group of usually about 3 MTAs, which all run their own content
> filter (amavis) and thus use their own SpamAssassin database. When we are
> gonna start teaching SpamAssassin with sa-learn, I need to somehow sync the
> results in the journal to all these hosts.
>
> I've checked out the --no-sync and --sync options and I think these options
> will give me exactly the tools I need for this job.
>
> I need to know the location of the journal though and I need to know if
> there are any pitfalls when syncing a SpamAssassin with a journal from
> another one on another server.
>
> Has anyone got experience with syncing sa-learn between multiple MTAs? How
> did you solve this? Can SA sync with a journal in an arbitrary location, or
> does it look for it in one preconfigured place?
>
> I hope you have some interesting thoughts about this issue.

Ultimately, you're not syncing 'sa-learn', you're syncing the bayes
DB that sa-learn (and spamd) writes to. There are a few ways to go
about sharing the bayesian database. Probably the best bet would be to
store the bayes DB in MySQL and point SA on all 3 servers to it,
ideally with the database on a 4th server (hey, you can put the AWL
info into MySQL as well... may as well hit that up at the same time).
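
A minimal sketch of what that shared setup might look like in local.cf on
each of the three MTAs. The option names are SpamAssassin's own; the host
name, database name, credentials and the override user are placeholders and
not values from this thread. The tables are assumed to have been created from
the SQL schema files that ship with the distribution.

```
# Store bayes data in MySQL instead of local DB files
bayes_store_module           Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                DBI:mysql:spamassassin:db.example.com
bayes_sql_username           sa_user
bayes_sql_password           change-me
# Force one bayes "user" so that all hosts share a single data set
bayes_sql_override_username  amavis

# Optional: keep the auto-whitelist (AWL) in the same database
auto_whitelist_factory       Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                 DBI:mysql:spamassassin:db.example.com
user_awl_sql_username        sa_user
user_awl_sql_password        change-me
```

With something like this in place, sa-learn run on any one host should update
the data that all three hosts score against, so no journal copying is needed.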

You could probably go the --sync and --no-sync route if you fiddled
with it enough (never tried it), but honestly a single MySQL DB for
bayes would probably be a lot simpler if you have any experience at
all with MySQL. It's been good for performance for us even when used
on a single server, and it's been pretty bulletproof for us: in use
for years. The only tip you really need here is to run OPTIMIZE TABLE
every now and then.
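
The OPTIMIZE TABLE tip could be scripted roughly as follows and run from cron.
This is only a sketch: the database name is a placeholder, credentials are
assumed to come from ~/.my.cnf, and the table names are assumed to be the
ones created by the stock bayes schema. By default it only prints the command.

```shell
#!/bin/sh
# Sketch only. DB_NAME is a placeholder; bayes_token and bayes_seen are
# assumed to be the tables created by the stock bayes MySQL schema.
DB_NAME="${DB_NAME:-spamassassin}"
RUN="${RUN-echo}"   # prints the command by default; call with RUN= to execute

optimize_bayes() {
    # Rebuild the two tables that see the most churn
    $RUN mysql -e "OPTIMIZE TABLE bayes_token, bayes_seen" "$DB_NAME"
}

optimize_bayes
```

How often "every now and then" should be depends on mail volume; a weekly
run during a quiet hour is one reasonable starting point.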

An alternative hacky solution: turn off autolearn on 2 of the 3, and
do sa-learns and autolearning on the 3rd. Then nightly rsync all the
bayes DB files over to the other 2 servers and restart spamd. Not
pretty, but it should work.
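
A rough sketch of that nightly copy, to be run on the one host that does the
learning. The directory and the target host names are assumptions (use
whatever bayes_path points to on your systems), and the two receiving hosts
are assumed to have bayes_auto_learn 0 set. By default the script only prints
the commands it would run.

```shell
#!/bin/sh
# Sketch only. BAYES_DIR and TARGETS are hypothetical; adjust to your setup.
# Run as the user that owns the bayes files (for example the amavis user).
BAYES_DIR="${BAYES_DIR:-/var/lib/amavis/.spamassassin}"
TARGETS="${TARGETS:-mx2.example.com mx3.example.com}"
RUN="${RUN-echo}"   # prints commands by default; call with RUN= to execute

push_bayes() {
    # Merge pending journal entries into the main DB before copying
    $RUN sa-learn --sync
    for host in $TARGETS; do
        # Copy the token and seen databases to the same path on each host
        $RUN rsync -a "$BAYES_DIR/bayes_toks" "$BAYES_DIR/bayes_seen" \
            "${host}:$BAYES_DIR/"
    done
}

push_bayes
```

The restart step is left out on purpose because the service name differs
between setups. Copying DB files while the scanner is writing to them may
produce an inconsistent copy, so it is probably safest to run this at a quiet
time.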

Jake


sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-07 Thread Samy Ascha, Xel Media B.V.

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dear members,

I have recently set up a mailbox and a sa-learn script to start
teaching SpamAssassin. This was all no problem, but:

We have an MX group of usually about 3 MTAs, which all run their own
content filter (amavis) and thus use their own SpamAssassin
database. When we are gonna start teaching SpamAssassin with sa-learn,
I need to somehow sync the results in the journal to all these hosts.

I've checked out the --no-sync and --sync options and I think these
options will give me exactly the tools I need for this job.

I need to know the location of the journal though and I need to know
if there are any pitfalls when syncing a SpamAssassin with a journal
from another one on another server.

Has anyone got experience with syncing sa-learn between multiple MTAs?
How did you solve this? Can SA sync with a journal in an arbitrary
location, or does it look for it in one preconfigured place?

I hope you have some interesting thoughts about this issue.

Thx much and regards,
Samy Ascha

Xel Media Internet Services

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkkUKlQACgkQKIdvzp2UK/HoLgCgoLnB4PeP5Vg159g+f5YfSnCo
LacAn22WXVRd8y/SSqPMKeNGi9qwEjaS
=3sbv
-END PGP SIGNATURE-


Re: sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-07 Thread Matus UHLAR - fantomas
On 07.11.08 12:45, Samy Ascha, Xel Media B.V. wrote:
> I have recently set up a mailbox and a sa-learn script to start
> teaching SpamAssassin. This was all no problem, but:
>
> We have an MX group of usually about 3 MTAs, which all run their own
> content filter (amavis) and thus use their own SpamAssassin
> database. When we are gonna start teaching SpamAssassin with sa-learn,
> I need to somehow sync the results in the journal to all these hosts.

We have a group of four MTA servers. However, they don't run SA at the MTA
level (yet). We have users' mailboxes on a shared storage cluster, so their
bayes DB is on shared space.

I'd solve your case by configuring the MTAs without Bayes, or maybe by using
per-user configs, if possible. If the mail is sent to one user, that should
not be a problem. For mail sent to more users, a somewhat generic
configuration and filtering will be used, so users may be willing to have the
mail rechecked for spamminess.

> Has anyone got experience with syncing sa-learn between multiple MTAs?
> How did you solve this? Can SA sync with a journal in an arbitrary
> location, or does it look for it in one preconfigured place?

I am not sure if it's safe to use a journal or bayes DB on an NFS mount...

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.