Re: sa-learn journal location for teaching spamassassin on multiple hosts
Hey Jake,

Thx for your reply. I got this same tip off-list (from Jonas Eckerman). I liked the idea, and I have already done some successful testing of centralized bayes-data storage in a MySQL database. We are using an SQL back-end for storing 'all things e-mail' anyway, so this fitted in easily. I will be rolling it out as soon as it is ready for production.

Also, the READMEs in the distribution were very useful for setting this up. I did not need any other resources and there were zero issues.

Thx to Jonas, Jake and the list for helping out, gj ;)

Regards,
Samy

I'm keeping these full messages in here, as they may present a (kinda) full problem and solution for others having similar issues.

On Nov 11, 2008, at 11:51 PM, Jake Maul wrote:

> On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. [EMAIL PROTECTED] wrote:
>
> > I have recently set up a mailbox and a sa-learn script to start teaching SpamAssassin. This was all no problem, but:
> >
> > We have an MX group of usually about 3 MTAs, which all run their own content filter (amavis) and thus use their own SpamAssassin database. When we start teaching SpamAssassin with sa-learn, I need to somehow sync the results in the journal to all these hosts.
> >
> > I've checked out the --no-sync and --sync options, and I think they will give me exactly the tools I need for this job. I need to know the location of the journal, though, and I need to know if there are any pitfalls when syncing a SpamAssassin with a journal from another one on another server.
> >
> > Has anyone got experience with syncing sa-learn between multiple MTAs? How did you solve this? Can SA sync with a journal in an arbitrary location, or does it look for it in one preconfigured place?
> >
> > I hope you have some interesting thoughts about this issue.
>
> Ultimately, you're not syncing 'sa-learn', you're syncing the bayes DB that sa-learn (and spamd) records to. There are a few ways to go about sharing the bayesian database.
>
> Probably the best bet would be to store the bayes DB in MySQL and point SA on all 3 servers to it, ideally with the database on a 4th server (hey, you can put the AWL info into MySQL as well... may as well hit that up at the same time).
>
> You could probably go the --sync and --no-sync route if you fiddled with it enough (never tried it), but honestly a single MySQL DB for bayes would probably be a lot simpler if you have any experience at all with MySQL. It's been good for performance for us even when used on a single server, and it's been pretty bulletproof for us; it has been in use for years. The only tip you really need here is to run OPTIMIZE TABLE every now and then.
>
> An alternative hacky solution: turn off autolearn on 2 of the 3, and do sa-learns and autolearning on the 3rd. Then nightly rsync all the bayes DB files over to the other 2 servers and restart spamd. Not pretty, but it should work.
>
> Jake
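
For anyone finding this thread later, here is a minimal sketch of what the shared MySQL Bayes setup might look like in local.cf on each filter host. The directive names are standard SpamAssassin options covered by the Bayes SQL README in the distribution; the database name, host, user names and password below are made-up placeholders, not values from this thread.

```
# local.cf on every host that should share the Bayes data (sketch only).
# All values are assumed placeholders; adjust them to your own setup.

use_bayes                    1

# Store Bayes tokens in MySQL instead of local DB files.
bayes_store_module           Mail::SpamAssassin::BayesStore::MySQL

# DSN format: DBI:driver:database:hostname[:port]
bayes_sql_dsn                DBI:mysql:sa_bayes:db.example.com
bayes_sql_username           sa_user
bayes_sql_password           change-me

# Make all hosts read and write one common Bayes data set, whatever local
# account the scanner happens to run as on each machine.
bayes_sql_override_username  amavis
```

With the same override username on all hosts, anything learned through sa-learn on one machine should be visible to the others without any journal syncing. The tables have to exist first; the distribution ships a schema file for that alongside the README.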
Re: sa-learn journal location for teaching spamassassin on multiple hosts
On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. [EMAIL PROTECTED] wrote:

> I have recently set up a mailbox and a sa-learn script to start teaching SpamAssassin. This was all no problem, but:
>
> We have an MX group of usually about 3 MTAs, which all run their own content filter (amavis) and thus use their own SpamAssassin database. When we start teaching SpamAssassin with sa-learn, I need to somehow sync the results in the journal to all these hosts.
>
> I've checked out the --no-sync and --sync options, and I think they will give me exactly the tools I need for this job. I need to know the location of the journal, though, and I need to know if there are any pitfalls when syncing a SpamAssassin with a journal from another one on another server.
>
> Has anyone got experience with syncing sa-learn between multiple MTAs? How did you solve this? Can SA sync with a journal in an arbitrary location, or does it look for it in one preconfigured place?
>
> I hope you have some interesting thoughts about this issue.

Ultimately, you're not syncing 'sa-learn', you're syncing the bayes DB that sa-learn (and spamd) records to. There are a few ways to go about sharing the bayesian database.

Probably the best bet would be to store the bayes DB in MySQL and point SA on all 3 servers to it, ideally with the database on a 4th server (hey, you can put the AWL info into MySQL as well... may as well hit that up at the same time).

You could probably go the --sync and --no-sync route if you fiddled with it enough (never tried it), but honestly a single MySQL DB for bayes would probably be a lot simpler if you have any experience at all with MySQL. It's been good for performance for us even when used on a single server, and it's been pretty bulletproof for us; it has been in use for years. The only tip you really need here is to run OPTIMIZE TABLE every now and then.

An alternative hacky solution: turn off autolearn on 2 of the 3, and do sa-learns and autolearning on the 3rd. Then nightly rsync all the bayes DB files over to the other 2 servers and restart spamd. Not pretty, but it should work.

Jake
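
For the rsync alternative, the two hosts that only receive the copied files could be configured roughly as below. This is a sketch under assumptions: the directives are standard SpamAssassin options, but the path is an invented example and the thread does not say where the Bayes files actually live on these machines.

```
# local.cf on the two hosts that only consume the copied Bayes files
# (sketch only; the path below is an assumed example).

use_bayes         1

# Learning happens on the third host only, so switch autolearn off here.
bayes_auto_learn  0

# bayes_path is a filename prefix, not a directory. SpamAssassin appends
# _toks, _seen and _journal to it, so with this setting the journal would
# be /var/lib/spamassassin/bayes/bayes_journal.
# The default prefix is ~/.spamassassin/bayes of the user running SA.
bayes_path        /var/lib/spamassassin/bayes/bayes
```

This also touches on the original question about the journal: it sits next to the other Bayes files under the bayes_path prefix, so pointing bayes_path at a known location on every host makes it clear which files the nightly rsync has to copy.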
sa-learn journal location for teaching spamassassin on multiple hosts
Dear members,

I have recently set up a mailbox and a sa-learn script to start teaching SpamAssassin. This was all no problem, but:

We have an MX group of usually about 3 MTAs, which all run their own content filter (amavis) and thus use their own SpamAssassin database. When we start teaching SpamAssassin with sa-learn, I need to somehow sync the results in the journal to all these hosts.

I've checked out the --no-sync and --sync options, and I think they will give me exactly the tools I need for this job. I need to know the location of the journal, though, and I need to know if there are any pitfalls when syncing a SpamAssassin with a journal from another one on another server.

Has anyone got experience with syncing sa-learn between multiple MTAs? How did you solve this? Can SA sync with a journal in an arbitrary location, or does it look for it in one preconfigured place?

I hope you have some interesting thoughts about this issue.

Thx much and regards,

Samy Ascha
Xel Media Internet Services
Re: sa-learn journal location for teaching spamassassin on multiple hosts
On 07.11.08 12:45, Samy Ascha, Xel Media B.V. wrote:

> I have recently set up a mailbox and a sa-learn script to start teaching SpamAssassin. This was all no problem, but:
>
> We have an MX group of usually about 3 MTAs, which all run their own content filter (amavis) and thus use their own SpamAssassin database. When we start teaching SpamAssassin with sa-learn, I need to somehow sync the results in the journal to all these hosts.

We have a group of four MTA servers. However, they don't run SA at the MTA level (yet). We have users' mailboxes on a shared storage cluster, so their bayes DB is on shared space.

I'd solve your case by configuring the MTAs without Bayes, or maybe by using per-user configs, if possible. If the mail is sent to one user, that should not be a problem. For mail sent to several users, a somewhat generic configuration and filtering will be used, so users may want to have the mail rechecked for spamminess.

> Has anyone got experience with syncing sa-learn between multiple MTAs? How did you solve this? Can SA sync with a journal in an arbitrary location, or does it look for it in one preconfigured place?

I am not sure if it's safe to use a journal or bayes DB that is NFS-mounted...

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Save the whales. Collect the whole set.