Re: Moving bayes from bdb to MySQL

2006-04-05 Thread Lars Ringh

Michael Monnerie wrote:

> On Monday, 3 April 2006 14:34 Lars Ringh wrote:
>
>> Now, since in each case the source data can come from two different
>> servers scanning the same kind of mails, should I try to merge the
>> bayes-data from servers home1 and home2 into the same mysql-db
>> and then merge the data from corp1 and corp2 into the other mysql-db,
>> or should I pick my starting source data from only one server in each
>> pair? Would SpamAssassin benefit from having the greater source to
>> look at, or would I only be adding close-to-identical data which
>> would then only be expired faster than it took to merge them?
>
> I believe you should *not* mix two different bayes DBs. Use just one,
> and the rest will fill up with the next spam coming in...



Yes, I've done some more thinking myself and this must be the only 
reasonable approach.




>> 165MB...335MB
>
> Did you not enable bayes_auto_expire?



I was under the impression that I did, but now that I've imported the
data into mysql-dbs (where I can examine it more easily than in the
bdb-files), I must say that I don't seem to...


It's a bit strange though, since the files reach this size from scratch in
quite a short time and then stay at that size; they don't grow any
bigger. That's why I thought auto expire was doing its work... One might
suspect that I've given bayes_expiry_max_db_size some really odd value,
but that's not the case either...
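
For scale: bayes_expiry_max_db_size is counted in tokens, not bytes, and the SpamAssassin documentation puts the default of 150,000 tokens at roughly 8 MB of database file. The small sketch below simply scales that documented ratio linearly to the file sizes mentioned in this thread. It is a back-of-the-envelope estimate only: it assumes the on-disk size grows in proportion to the token count, and it ignores Berkeley DB overhead and the fact that the quoted sizes may include more than the token file. The function name estimate_tokens is made up for this illustration.

```python
# Rough estimate only: scales the documented "150,000 tokens ~ 8 MB"
# ratio linearly. Real Berkeley DB files carry overhead, so treat the
# results as an order-of-magnitude guide, not a measurement.
DOC_TOKENS = 150_000   # default bayes_expiry_max_db_size (in tokens)
DOC_SIZE_MB = 8        # approximate file size the docs quote for that many tokens


def estimate_tokens(file_size_mb: float) -> int:
    """Return a rough token count for a Bayes db file of the given size in MB."""
    return round(file_size_mb * DOC_TOKENS / DOC_SIZE_MB)


if __name__ == "__main__":
    # Sizes taken from the thread; the labels are just for readability.
    for label, size_mb in (("home", 165), ("corp", 335)):
        print(f"{label}: ~{estimate_tokens(size_mb):,} tokens")
```

If the files really held that many tokens, they would be far above the default limit, which would fit the suspicion that expiry is not actually running.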


Well, anyway, thanks for your input.

//maccall

--

lars-dot-ringh-at-bahnhof-dot-net


Re: Moving bayes from bdb to MySQL

2006-04-04 Thread Michael Monnerie
On Monday, 3 April 2006 14:34 Lars Ringh wrote:
> Now, since in each case the source data can come from two different
> servers scanning the same kind of mails, should I try to merge the
> bayes-data from servers home1 and home2 into the same mysql-db
> and then merge the data from corp1 and corp2 into the other mysql-db,
> or should I pick my starting source data from only one server in each
> pair? Would SpamAssassin benefit from having the greater source to
> look at, or would I only be adding close-to-identical data which
> would then only be expired faster than it took to merge them?

I believe you should *not* mix two different bayes DBs. Use just one,
and the rest will fill up with the next spam coming in...

> 165MB...335MB

Did you not enable bayes_auto_expire?

mf gzmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at   Tel: 0660/4156531  Linux 2.6.11
// PGP Key:   "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net Key-ID: 0x70545879




Moving bayes from bdb to MySQL

2006-04-03 Thread Lars Ringh


I'm about to move my bayes and auto-whitelist data from local db-files 
on each server to a common MySQL-db.


I have 2+2 load-balanced servers scanning mail using amavisd-new for
different kinds of customers, home and corporate users respectively, and
I was planning to keep their respective data in two separate db's since
they seem to be quite different.
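
As a rough illustration of the layout described above, a local.cf fragment for the two home servers might look like the following. This is a sketch, not a tested configuration: the database name sa_home, the host dbhost, the account sa_user/secret and the shared user name amavis are all placeholders invented for this example, and the option names are the ones documented for SpamAssassin 3.1. The corporate pair would use the same options pointing at a second database.

```
# local.cf on home1 and home2 (all names and credentials are placeholders)
bayes_store_module          Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn               DBI:mysql:sa_home:dbhost
bayes_sql_username          sa_user
bayes_sql_password          secret
# make both servers read and write the same Bayes data set
bayes_sql_override_username amavis

# auto-whitelist in the same database
auto_whitelist_factory      Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                DBI:mysql:sa_home:dbhost
user_awl_sql_username       sa_user
user_awl_sql_password       secret
```

The tables have to exist first; the SpamAssassin distribution ships schema files for them in its sql directory. Existing Bayes data can be dumped with sa-learn --backup and loaded into the SQL store with sa-learn --restore, keeping in mind that --restore replaces whatever Bayes data is already there.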


Now, since in each case the source data can come from two different
servers scanning the same kind of mails, should I try to merge the
bayes-data from servers home1 and home2 into the same mysql-db and
then merge the data from corp1 and corp2 into the other mysql-db, or
should I pick my starting source data from only one server in each pair?
Would SpamAssassin benefit from having the greater source to look at, or
would I only be adding close-to-identical data which would then only be
expired faster than it took to merge them?


And out of curiosity, the "home servers" have about 165MB of bayes-data
and 335MB in auto-whitelist, while the "corporate servers" have it the
other way around, 335MB in bayes-db and 165MB in auto-whitelist. Could
anyone enlighten me briefly on why? Is it as simple as that the
"home servers" have fewer senders/recipients but more different e-mails,
and the "corporate servers" have more senders/recipients but fewer
different e-mails, or what?


//maccall

--

lars-dot-ringh-at-bahnhof-dot-net