Re: Large-scale global Bayes tuning?

2008-04-09 Thread John Hardin

On Wed, 9 Apr 2008, Kris Deugau wrote:


autolearn is picking up ~1.5M+ from ~300K messages on a daily basis.


Push your autolearn thresholds out to reduce the overall volume of learned 
spam and ham?


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  People seem to have this obsession with objects and tools as being
  dangerous in and of themselves, as though a weapon will act of its
  own accord to cause harm. A weapon is just a force multiplier. It's
  *humans* that are (or are not) dangerous.
---
 4 days until Thomas Jefferson's 265th Birthday


Re: Large-scale global Bayes tuning?

2008-04-09 Thread Michael Scheidell


 From: Kris Deugau [EMAIL PROTECTED]
 Organization: ViaNet Internet Solutions
 Date: Wed, 09 Apr 2008 12:12:43 -0400
 To: users@spamassassin.apache.org
 Subject: Large-scale global Bayes tuning?
 
 Anyone have any suggestions on tuning a large global Bayes db for
 stability and sanity?  I've got my fingers in the pie of a moderately
 large mail cluster, but I haven't yet found a Bayes configuration that's
 sane and stable for any extended period.  Wiping it completely about
 once a week seems to provide acceptable filtering performance (we have
 a number of addon rulesets), but I still see spam in my inbox with
 BAYES_00 - a sure sign of a mistuned Bayes database.
 
Bayes on cluster begs the question: what if you didn't replicate the bayes
tables, and left them server specific?

Since (depending on configurations) some of the servers might get 'spam
only' (higher mx records), maybe just take one of the 'valid' bayes tables
and manually copy it (sa-learn backup, sa-learn clear, restore) every week
or so.

Only way I could get a cluster of 9 to work right.
-- 
Michael Scheidell, CTO
|SECNAP Network Security
Winner 2008 Network Products Guide Hot Companies
FreeBSD SpamAssassin Ports maintainer
Charter member, ICSA labs anti-spam consortium



Re: Large-scale global Bayes tuning?

2008-04-09 Thread Kris Deugau

Michael Scheidell wrote:

Bayes on cluster begs the question: what if you didn't replicate the bayes
tables, and left them server specific?


It may yet take that.  :(  (If only for overall cluster reliability - 
any one of the current three machines could handle the current load 
without any trouble, but we're likely going to stuff ClamAV on them as 
well.)  Unfortunately that means doing mistake-training on *each* 
machine - autolearn on its own just doesn't cut it.


I'm dogfooding pretty much that exact scenario on one machine;  it's got 
its own local Bayes DB that I'm hand-training with my own mail.



Since (depending on configurations) some of the servers might get 'spam
only' (higher mx records), maybe just take one of the 'valid' bayes tables
and manually copy it (sa-learn backup, sa-learn clear, restore) every week
or so.


Mmmh.  Access is for both inbound and outbound mail, through a 
load-balancer;  the type of mail seen on any one system is pretty much 
identical over time.


Re: Large-scale global Bayes tuning?

2008-04-09 Thread Kris Deugau

John Hardin wrote:

On Wed, 9 Apr 2008, Kris Deugau wrote:


autolearn is picking up ~1.5M+ from ~300K messages on a daily basis.


Push your autolearn thresholds out to reduce the overall volume of 
learned spam and ham?


I've thought about that.  It makes it more difficult to get Bayes data 
on the critical messages in that middle range though.  :(
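
As a rough illustration of that trade-off, here is a minimal Python sketch of how the two autolearn thresholds split messages by score. It is a simplification: it looks only at the score, whereas SpamAssassin's autolearn applies further conditions. The function names and sample scores are hypothetical; 0.1 and 12.0 are the documented defaults of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam', or None for a message with the given score."""
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # middle range: not autolearned


def count_learned(scores, nonspam_threshold, spam_threshold):
    """Count how many of the scores would be autolearned at all."""
    return sum(
        autolearn_decision(s, nonspam_threshold, spam_threshold) is not None
        for s in scores
    )


# Hypothetical sample of message scores.
scores = [-2.0, -0.5, 0.0, 3.2, 6.0, 9.5, 13.0, 20.0, 31.0]

default = count_learned(scores, 0.1, 12.0)      # default thresholds
pushed_out = count_learned(scores, -1.0, 20.0)  # thresholds pushed outward
print(default, pushed_out)  # prints: 6 3
```

In this sample the messages scoring 3.2, 6.0 and 9.5 are never learned under either setting, which is the middle-range gap described above; only manual training would cover them.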


-kgd


Large-scale global Bayes tuning?

2008-04-09 Thread Kris Deugau
Anyone have any suggestions on tuning a large global Bayes db for 
stability and sanity?  I've got my fingers in the pie of a moderately 
large mail cluster, but I haven't yet found a Bayes configuration that's 
sane and stable for any extended period.  Wiping it completely about 
once a week seems to provide acceptable filtering performance (we have 
a number of addon rulesets), but I still see spam in my inbox with 
BAYES_00 - a sure sign of a mistuned Bayes database.


Past experience with (much) smaller systems has shown stable behaviour 
with bayes_expiry_max_db_size set to 150 (~40M BDB Bayes), daily 
expiry runs delete ~25-35K tokens;  mail volume ~3K/day.  However, the 
larger system (MySQL, currently set with max_db_size at 300, on-disk 
files running ~100M) only seems to be expiring that same 25-35K tokens 
even though autolearn is picking up ~1.5M+ from ~300K messages on a 
daily basis.  Reading through the docs on token expiry I would guess it 
should be far more aggressive than it is.  (Among other things, I really 
don't want to bump up max_db_size by two orders of magnitude;  up to ~5M 
should be fine, and I could see as high as 7.5M if really necessary.)
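
A back-of-the-envelope check on those figures, as a Python sketch. It assumes, pessimistically, that every autolearned token is new to the database (in practice many are updates to tokens already stored), and it takes 30K as the midpoint of the observed 25-35K daily expiry. The function name is hypothetical.

```python
def days_until_ceiling(ceiling, learned_per_day, expired_per_day, start=0):
    """Days until the token count reaches `ceiling`, or None if it never grows."""
    net = learned_per_day - expired_per_day
    if net <= 0:
        return None
    return (ceiling - start) / net


learned = 1_500_000  # tokens autolearned per day (figure from the post)
expired = 30_000     # tokens expired per day (midpoint of 25-35K)

print(round(days_until_ceiling(5_000_000, learned, expired), 1))  # prints: 3.4
print(round(days_until_ceiling(7_500_000, learned, expired), 1))  # prints: 5.1
```

Under that assumption a freshly wiped database would reach the ~5M ceiling in roughly three and a half days unless expiry removes far more than 30K tokens per day.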


I'm not even really sure what questions to ask to get more detail; 
sa-learn -D doesn't really spit out *enough* detail about the expiry 
process to know for sure if something is going wrong there.


-kgd


Re: Large-scale global Bayes tuning?

2008-04-09 Thread John Hardin

On Wed, 9 Apr 2008, Kris Deugau wrote:


John Hardin wrote:

 On Wed, 9 Apr 2008, Kris Deugau wrote:

  autolearn is picking up ~1.5M+ from ~300K messages on a daily basis.

 Push your autolearn thresholds out to reduce the overall volume of learned
 spam and ham?


I've thought about that.  It makes it more difficult to get Bayes data 
on the critical messages in that middle range though.  :(


How varied is the character of your message traffic? Is manual learning an 
option, especially with larger autolearn thresholds?


Then at least you'd be able to reseed your bayes with a known-good corpus.



Re: Large-scale global Bayes tuning?

2008-04-09 Thread Michael Scheidell
 From: Kris Deugau [EMAIL PROTECTED]
 Organization: ViaNet Internet Solutions
 Reply-To: users@spamassassin.apache.org
 Date: Wed, 09 Apr 2008 12:36:56 -0400
 To: users@spamassassin.apache.org
 Subject: Re: Large-scale global Bayes tuning?
 
 Michael Scheidell wrote:
 Bayes on cluster begs the question: what if you didn't replicate the bayes
 tables, and left them server specific?
 
 It may yet take that.  :(  (If only for overall cluster reliability -
 any one of the current three machines could handle the current load
 without any trouble, but we're likely going to stuff ClamAV on them as
 well.)  Unfortunately that means doing mistake-training on *each*
 machine - autolearn on its own just doesn't cut it.
 
 I'm dogfooding pretty much that exact scenario on one machine;  it's got
 its own local Bayes DB that I'm hand-training with my own mail.
 
You could also take mysql off of one or several, have them load balance to
the other mysql servers, run a caching (global) dns server and clamav on one
of them.

What about DCC? I assume with those volumes you are running a local DCC
server, and having the other boxes talk to it?


 Since (depending on configurations) some of the servers might get 'spam
 only' (higher mx records), maybe just take one of the 'valid' bayes tables
 and manually copy it (sa-learn backup, sa-learn clear, restore) every week
 or so.
 
 Mmmh.  Access is for both inbound and outbound mail, through a

Keep a couple for outbound only, won't need bayes too much on those.
We have an engineering spec for a 9x9 (9 nodes in a cluster, 9 clusters in a
group) to support up to 2MM users, and we do a lot of task and load
splitting like that.

-- 
Michael Scheidell, CTO


Re: Large-scale global Bayes tuning?

2008-04-09 Thread Kris Deugau

John Hardin wrote:
How varied is the character of your message traffic? Is manual learning 
an option, especially with larger autolearn thresholds?


What is this... manual learning...  you speak of?  <g>

Not really an option in the short term, although in the long term I'd 
*like* to have a system similar to what I've mostly trained users to do 
on the much smaller systems - forward misclassified mail to a suitable 
role account as an attachment for manual processing (whitelist, 
blacklist, feed to Bayes, write/adjust rules, etc).  Of course, that 
requires someone to *do* the manual processing  :(


I've been taking my own FNs and feeding them back in;  that's really the 
only misclassified mail I have easy access to.  No FPs noticed so far.



Then at least you'd be able to reseed your bayes with a known-good corpus.


*nod*  I've thought about exporting the database from the smaller system 
and pulling it in to the cluster to see how the accuracy is.


"Tokens don't get expired according to my understanding of the expiry 
algorithm" about sums up the immediate problem;  overall filter accuracy 
is pretty good on the whole.


-kgd


Re: Large-scale global Bayes tuning?

2008-04-09 Thread SM

Hi Kris,
At 09:12 09-04-2008, Kris Deugau wrote:
Anyone have any suggestions on tuning a large global Bayes db for 
stability and sanity?  I've got my fingers in the pie of a 
moderately large mail cluster, but I haven't yet found a Bayes 
configuration that's sane and stable for any extended 
period.  Wiping it completely about once a week seems to provide 
acceptable filtering performance (we have a number of addon 
rulesets), but I still see spam in my inbox with BAYES_00 - a sure 
sign of a mistuned Bayes database.


Spam hitting BAYES_00 points to the bayes database being 
polluted.  That can happen if the autolearn levels are not low 
enough.  Some manual learning can help to keep the Bayes database in 
tune.  A more aggressive expiry won't necessarily prevent 
mistuning.  You'll have to do some MySQL tuning for performance.  In 
a large setup, manual learning isn't always possible.  You can have 
some rules to identify some good and bad messages which are 
representative of the userbase.


Regards,
-sm




Global Bayes

2008-03-24 Thread Mike Fahey

Just upgraded to 3.2.4.

I am running SpamAssassin as a normal user, not root.

I keep seeing this in the log files.

bayes: cannot open bayes databases /var/sabayes/.spamassassin/bayes_* 
R/W: lock failed: File exists


There are about 20 lock files in the directory.

Is spamassassin not cleaning up the lock files properly or is it even 
working?


Looks like there have been changes here since version 3.2.0




Re: Global Bayes

2008-03-24 Thread Robert Blayzor


On Mar 24, 2008, at 11:08 AM, Mike Fahey wrote:

Just upgraded to 3.2.4.

I am running SpamAssassin as a normal user, not root.

I keep seeing this in the log files.

bayes: cannot open bayes databases /var/sabayes/.spamassassin/bayes_* 
R/W: lock failed: File exists


There are about 20 lock files in the directory.

Is spamassassin not cleaning up the lock files properly or is it  
even working?


Looks like there have been changes here since version 3.2.0




I don't know of any specific changes between versions, but... whenever  
I noticed this happen it was almost always due to disk space,  
permissions or the fact you have autoexpire turned on.  Double check  
the permissions on your folders and make sure the user you run  
SpamAssassin under has the right privs required.  Stop SA, clean up  
the files, and try restarting.


A good idea (if you're running global bayes) is to turn off auto-expire  
and run sa-learn --force-expire at a regular interval.  We've  
been running this way for years and it seems to perform just fine  
under 3.2.4.


--
Robert Blayzor
INOC
[EMAIL PROTECTED]
http://www.inoc.net/~rblayzor/

Mac OS X. Because making Unix user-friendly is easier than debugging  
Windows.







Re: Global Bayes

2008-03-24 Thread Jared Hall

Mike Fahey wrote:

Just upgraded to 3.2.4.

I am running SpamAssassin as a normal user, not root.

I keep seeing this in the log files.

bayes: cannot open bayes databases /var/sabayes/.spamassassin/bayes_* 
R/W: lock failed: File exists


There are about 20 lock files in the directory.

Is spamassassin not cleaning up the lock files properly or is it even 
working?


Looks like there have been changes here since version 3.2.0



In your global config (local.cf), try:

lock_method flock

and delete all those pesky lock files for good.

Might see a speed improvement also.  Definitely works much
better under heavy loads.

JCH



Global Bayes and AWL

2007-10-13 Thread Magnus Anderson

Hi,

I have read this thread:
http://www.nabble.com/forum/ViewPost.jtp?post=819176&framed=y

This is what I am trying to do as well: make SpamAssassin score against
both a per-user AWL/Bayes and a system-wide AWL/Bayes.

My idea was to make a new set of rules for SA that checks
against the AWL and Bayes a second time, but this time as a specific user, such as
"default".

I copied the /usr/share/spamassassin/60_awl.cf and 23_bayes.cf to
/etc/mail/spamassassin and renamed all BAYES_* and AWL to GLOBAL_BAYES_* and
GLOBAL_AWL.

Then I added user_awl_sql_override_username and
user_bayes_sql_override_username to the new rules.

However, this made CGPSA, which I use with CommuniGate Pro, run its AWL
saves against the MySQL table as the "default" user too.

It also wrote output like "Merging duplicate GLOBAL_AWL and AWL".

Is this not possible at all, or has someone made it work?



RE: Global Bayes and AWL

2007-10-13 Thread Giampaolo Tomassoni
 -Original Message-
 From: Magnus Anderson [mailto:[EMAIL PROTECTED]
 Sent: Saturday, October 13, 2007 5:40 PM
 
 
 Hi,
 
 I have read this thread,
 http://www.nabble.com/forum/ViewPost.jtp?post=819176framed=y
 
 This is also what I am searching for to do. Make SpamAssassin score
 against
 both a AWL/Bayes by the user and a AWL/Bayes by the system.
 
 What I was thinking on was to make a new set of rules for SA that
 checks
 agains the AWL and Bayes again, but this time as a specific user, like
 default.
 
 I copied the /usr/share/spamassassin/60_awl.cf and 23_bayes.cf to
 /etc/mail/spamassassin and renamed all BAYES_* and AWL to
 GLOBAL_BAYES_* and
 GLOBAL_AWL.
 
 Then I added user_awl_sql_override_username and
 user_bayes_sql_override_username to the new rules.
 
 This however made CGPSA, that I use against CommuniGate Pro, to run AWL
 saves against the MySQL table as default to.
 
 It also wrote output like Merging duplicate GLOBAL_AWL and AWL.
 
 Is this not possible at all, has someone made this work?

It is not impossible, but it would carry its own speed cost.

Some time ago I wished I had a three-level layered Bayes: site level,
organization level and mailbox level.

The idea was to reshape the bayes DB store code and, probably, scoring code,
such that

a) during mail scanning, a token unknown by the user would get scored
thanks to the organizational or site one (if any);

b) new tokens learned (or auto-learned) by Bayes would contribute to all the
three levels.

From a store standpoint, this means that tokens shouldn't have any ham/spam
count anymore, but instead there should be a table listing tokens belonging
to a given mail and a table listing mails received by each user. In this
latter table, there should be a ham/spam flag.

When an incoming mail is scanned and tokens are extracted, for each token
the code should count how many times the user (auto-) reported that token
as being ham or spam or, if there are no occurrences of that token in the
user layer, how many times that token had been reported as ham or spam at
organizational level (that is: by all users in a domain/organization). Then,
if there is again no occurrence of the token, how many times that token had
been tagged as spammy at site level (that is: by every user in every
organization), if any.
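
The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not SpamAssassin code: the function name, the in-memory dictionaries standing in for database tables, and the sample counts are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Counts = Tuple[int, int]  # (times seen in spam, times seen in ham)
Layer = Dict[str, Counts]


def lookup_token(
    token: str, user: Layer, org: Layer, site: Layer
) -> Optional[Tuple[str, Counts]]:
    """Return (layer name, counts) from the most specific layer that knows the token."""
    for name, layer in (("user", user), ("org", org), ("site", site)):
        if token in layer:
            return name, layer[token]
    return None  # token unknown at every level


# Hypothetical data for one user, their organization, and the whole site.
user_layer = {"invoice": (0, 12)}
org_layer = {"invoice": (3, 40), "viagra": (0, 25)}
site_layer = {"invoice": (90, 400), "viagra": (5000, 30), "lottery": (800, 2)}

print(lookup_token("invoice", user_layer, org_layer, site_layer))  # ('user', (0, 12))
print(lookup_token("viagra", user_layer, org_layer, site_layer))   # ('org', (0, 25))
print(lookup_token("lottery", user_layer, org_layer, site_layer))  # ('site', (800, 2))
print(lookup_token("unseen", user_layer, org_layer, site_layer))   # None
```

Note how "viagra" resolves at the organization level with mostly-ham counts even though the site as a whole sees it as spam, which is the kind of per-community difference the layering is meant to capture.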

This reasoning could even be changed somewhat in order to statistically
prioritize user preferences over organizational ones over site ones, which
would be much preferable to the previous idea, since simply spreading the mail
corpus over three levels would easily result in an unreliably small user
corpus, and even organizational corpus. However, this would mean tuning
the well-known Bayes classification equations to this need, which should be
done carefully and not released before a review by someone well versed in
Bayesian theory.

A further benefit stemming from a multi-layer approach would be easy and
reliable expiration of bayes entries, by simply deleting mails that arrived
before the expiry period, then tokens no longer referenced by any e-mail.
This is something most serious SQL servers could even do automatically after
deleting any token whose last-seen time is before a given threshold.

Also, AWL currently has its own table to do its work. This design could
instead use two further fields in the mails table with the source mail
address and IP address in them, and a further field in the usermails table
with the computed SA score in it. AWL could use this data to do its
dirty job, thereby obtaining data expiration for free.

Of course, since this would have so much impact on the Bayes code, I would
much prefer this design to be in the mainstream SA code, in order to avoid
reinventing the wheel each time I have to update SA.

The problem is that this design would be much more complex than the current
one and the question is: would it be usable by anyone but the tiniest
ISPs using SA? It would probably be good for me, with some hundreds of e-mails
received per day. But what if one has to scan 10,000,00 mails/day? Sure, one
can use smart SQL servers with statistical query optimizers and the like,
but even so, computing the bayes score of an incoming mail would
probably take a couple of seconds on average, as opposed to the current
few tenths of a second...

So, flexibility often comes at the expense of speed, and I guess many on this list
would not appreciate that.

Giampaolo


Re: switching from global bayes to per-user bayes

2006-10-05 Thread Adam Lanier
No comments whatsoever?




R: switching from global bayes to per-user bayes

2006-10-05 Thread Giampaolo Tomassoni
 I am looking into switching from a global bayes/awl/setting environment
 to a per-user environment with MySQL as a backend.
 
 puts on asbestos suit
 Would anyone care to offer an opinion as to whether and/or to what
 degree this might make in overall effectiveness?  Anyone back up that
 opinion with cold hard facts?
 
 Will I be able to migrate small sets of users from global to per-user or
 will I have to make the jump for all my end-users/domains at once?

I have a suggestion to offer for the settings environment, which works pretty 
well for me and would avoid the per-user/global question.

My amavis settings are in a postgres db in which each organization (more or 
less = domain) has a schema. Each organization has a table with user-defined 
settings. My public schema (i.e. the default one) also has a table, with 
organizational settings. Also, the public schema has a view which is the means 
by which I get the amavis settings for a given user. It attempts to fetch the 
per-user settings from the table in the organizational schema and, if they are 
missing, it attempts to fetch the per-organization settings. If these too are 
missing, it uses some defaults which may be thought of as local (i.e. 
server-wide) settings.
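
The view's fallback order (user record, then organization record, then server-wide defaults) can be sketched in Python as follows. This is not the actual view: the dictionaries stand in for the per-user and per-organization tables, and the setting names and addresses are hypothetical.

```python
from typing import Dict

Settings = Dict[str, object]

# Hypothetical server-wide defaults.
SERVER_DEFAULTS: Settings = {"wrap_exe": True, "report_to": None}


def effective_settings(
    user: str,
    org: str,
    user_rows: Dict[str, Settings],
    org_rows: Dict[str, Settings],
) -> Settings:
    """Return the first settings record found: user, then organization, then defaults."""
    if user in user_rows:
        return user_rows[user]
    if org in org_rows:
        return org_rows[org]
    return SERVER_DEFAULTS


# Hypothetical table contents.
user_rows = {"anna@example.org": {"wrap_exe": False, "report_to": None}}
org_rows = {"example.org": {"wrap_exe": True, "report_to": "admin@example.org"}}

print(effective_settings("anna@example.org", "example.org", user_rows, org_rows))
print(effective_settings("bob@example.org", "example.org", user_rows, org_rows))
print(effective_settings("carl@example.net", "example.net", user_rows, org_rows))
```

The sketch falls back on whole records, as the description suggests; a per-setting fallback would be a different design.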

I did find this three-layer way of handling settings pretty useful: a user 
wants to get .exe attachments without having them wrapped in a warning 
message? Put a record in the per-user table. An org wants to have threats 
reported to a specific user? Tell it to the per-organization table.

However, I don't have bayes data in the db (it's in the global bdb). This is 
because most of my customers use pop3 to download messages, so I have almost no 
way to train bayes efficiently. Do I?

Also, awl settings are in a database only to ease their adjustment (and, in 
future, replication), but are global as well: I'm serving small communities in 
my town, so a source IP/e-mail score may often reasonably be used for all my 
potential recipients.

Cheers,

giampaolo


 I'd like to preload the bayes db for each user so that's it's 'primed'
 and ready to do.  Obviously, it would be preferable to preload with
 their specific mail but is it possible to feed bayes for each user with
 a generic set of spam/ham?
 



switching from global bayes to per-user bayes

2006-10-04 Thread Adam Lanier
I am looking into switching from a global bayes/awl/setting environment
to a per-user environment with MySQL as a backend.

<puts on asbestos suit>
Would anyone care to offer an opinion as to whether and/or to what
degree this might make a difference in overall effectiveness?  Can anyone
back up that opinion with cold hard facts?

Will I be able to migrate small sets of users from global to per-user or
will I have to make the jump for all my end-users/domains at once?

I'd like to preload the bayes db for each user so that it's 'primed'
and ready to go.  Obviously, it would be preferable to preload with
their specific mail but is it possible to feed bayes for each user with
a generic set of spam/ham?




Bayes and SQL and Vpopmail and /user + global bayes

2006-08-09 Thread Szeki - Inc
Hello,

SA 3.1.4
exec /usr/bin/spamd -v -m 32 -D -q -u vpopmail -s stderr 2>&1

I am using a vpopmail installation with per-user prefs; per-user bayes and
other user configuration are stored in SQL.

Problem:

If a mail comes in and no real vpopmail user is present (smtproutes), then
SA picks a random real vpopmail user and works with that user's bayes db. I
can't configure a global fallback or anything like that.
If I try to put bayes_path for @GLOBAL into SQL, SA says this is an
administrator config param and is not allowed there.

Any solution?

AND a not so important question:

Can I use both site-wide and per-user bayes for one incoming mail?

Peter



combine user and global Bayes with SQL?

2006-05-19 Thread Mike Jackson
I guess the subject line says it all. I'm running SA 3.1.1 with Bayes stored 
in MySQL. Is it possible to learn messages as a global user and have the 
tokens apply when evaluating individual users' email? (Never mind if it 
would be truly effective; this is more of a theoretical question.) 



Re: per-user or global bayes (was: HUGE bayes DB (non-sitewide) advice?)

2005-11-13 Thread email builder
bump

--- Michael Monnerie [EMAIL PROTECTED] wrote:

  My users are quite happy
  with overall markup of the spam.  We occasionally get a HAM marked as
  SPAM.  We have an odd client base though.
 
 The question is: when to use global and when per-user bayes?
 
 On our server, we have people of different languages, communicating with 
 different countries all over the world, in different areas 
 (advertising, production, IT, etc.). I thought in that case a per-user 
 bayes would be much better, as viagra is something good for the one, 
 but bad for the other.
 
 What's the general recommendation for bayes?






Re: per-user or global bayes (was: HUGE bayes DB (non-sitewide) advice?)

2005-11-09 Thread Michael Monnerie
On Wednesday, 9 November 2005 08:04 Gary W. Smith wrote:
 My users are quite happy
 with overall markup of the spam.  We occasionally get a HAM marked as
 SPAM.  We have an odd client base though.

The question is: when to use global and when per-user bayes?

On our server, we have people of different languages, communicating with 
different countries all over the world, in different areas 
(advertising, production, IT, etc.). I thought in that case a per-user 
bayes would be much better, as viagra is something good for the one, 
but bad for the other.

What's the general recommendation for bayes?

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at   Tel: 0660/4156531  Linux 2.6.11
// PGP Key:   lynx -source http://zmi.at/zmi2.asc | gpg --import
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net Key-ID: 0x70545879




Re: [sa-list] Re: global bayes database?

2005-09-09 Thread Dan Mahoney, System Admin

On Fri, 9 Sep 2005, Michael Parker wrote:


Oh, you want an entirely different Bayes storage module, one that
doesn't exist.  You're more than welcome to create your own, perldoc
Mail::SpamAssassin::BayesStore to get a sense of the API that you must
implement.  I'll leave the issues surrounding combining of bayes
databases as an exercise to the reader (suggest you search the archives
for previous msgs on this topic).


Interesting.  The API looks fairly straightforward.

Support for full-blown-multi-bayes isn't something I could see myself 
implementing right now, based purely on time constraints, but I don't 
think it's particularly hard.


Still, the tweak to add a call that does nspam_nham_get, and if it's less 
than the required number for effective bayes, uses the system bayes DBs 
for scanning (but not learning, if autolearn is deemed appropriate) should 
be easy enough.
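
A minimal Python sketch of that fallback decision, for illustration only. It does not use the real BayesStore API: the function name is hypothetical and the message counts are passed in directly rather than fetched with nspam_nham_get. The value 200 is SpamAssassin's default for both bayes_min_spam_num and bayes_min_ham_num.

```python
MIN_SPAM = 200  # default bayes_min_spam_num
MIN_HAM = 200   # default bayes_min_ham_num


def choose_scan_store(user_nspam, user_nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """Pick which Bayes data to scan against.

    Learning is not covered here: in the idea above it would always
    target the user's own data, never the system database.
    """
    if user_nspam >= min_spam and user_nham >= min_ham:
        return "user"
    return "system"


print(choose_scan_store(1500, 900))  # prints: user
print(choose_scan_store(350, 12))    # prints: system
print(choose_scan_store(0, 0))       # prints: system
```

Both counts have to clear their minimum: a user who has trained plenty of spam but almost no ham still falls back to the system data.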


I've searched the users list -- my issues with token collision are nil -- 
I'm sure everything I've got is in the new format since any attempts I 
made to try and get my original users' stuff in crashed my systems.


I'm going to note this more for anyone else who searches this list than 
for myself -- scoring on multiple bayes counts could have disastrous 
consequences.  Since it *has* to be read-only...


(since everyone gets the same spam -- see 
http://article.gmane.org/gmane.mail.spam.spamassassin.general/60376 )


..any admin must realize that for all they know, their user could work 
for Pfizer or SmithKline -- and you could be tagging all their legit 
workmail as bad.  Normal bayes prevents this (or forces the user to accept 
that since they don't consider the names of drugs a bad thing, they have 
to deal with the spam).


To pull an old phrase... one man's junk is another's treasure.

Still, this could be as simple as calling the bayes algorithm twice, once 
as $user, once as $system -- and maintaining a different (probably 
slightly lower) set of scores for $system.


Given, maybe the multi-bayes option should even be off by default for 
users with a good corpus (define good...200 messages?  a thousand?).


But I know that since I get more email, use pine and a shell, and 
religiously shuffle all my spam to spamassassin -r, that I'm more likely 
to have a complete corpus than those users who use Outlook and have to 
rely on the automatic learning features.


Okay, I've babbled enough.

-Dan

--

Dan Mahoney
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---



Re: global bayes database?

2005-09-08 Thread Matt Kettler

At 05:04 AM 9/8/2005, Dan Mahoney, System Admin wrote:
As my bayes database and my training of spamassassin is much better than 
that of most of my users...


Is there any way to augment user bayes DB with a call to mine?


one way is in your local.cf just set everyone to use the same bayes 
database and make it world accessible:


bayes_path {somepath}/bayes
bayes_file_mode 0777

Notes:
bayes_path needs to end in /bayes, as it's really a path plus half a 
filename. SA will append _toks and _seen to this to create bayes_toks 
and bayes_seen.


bayes_file_mode needs to be 0777, not 0666. It's sometimes used in directory 
creation and works more like a umask than a mode.
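
The two notes above can be illustrated with a small Python sketch. The path is hypothetical, and the mode calculation assumes, as the SpamAssassin documentation describes for bayes_file_mode, that execute bits are masked off (umask 111) when a plain file is created; the helper names are made up for the example.

```python
def bayes_files(bayes_path):
    """File names derived from bayes_path, which is a path plus a filename prefix."""
    return [bayes_path + suffix for suffix in ("_toks", "_seen")]


def created_file_mode(bayes_file_mode):
    """Mode a plain database file ends up with, assuming execute bits are masked off."""
    return bayes_file_mode & ~0o111


print(bayes_files("/var/spool/spamassassin/bayes"))
# ['/var/spool/spamassassin/bayes_toks', '/var/spool/spamassassin/bayes_seen']

print(oct(created_file_mode(0o777)))  # prints: 0o666
print(oct(created_file_mode(0o666)))  # prints: 0o666
```

Both settings yield 0666 database files; the difference only matters when the value is used to create a directory, where the execute (search) bits from 0777 are needed.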




Re: [sa-list] Re: global bayes database?

2005-09-08 Thread Dan Mahoney, System Admin

On Thu, 8 Sep 2005, Matt Kettler wrote:


At 05:04 AM 9/8/2005, Dan Mahoney, System Admin wrote:
As my bayes database and my training of spamassassin is much better than 
that of most of my users...


Is there any way to augment user bayes DB with a call to mine?


one way is in your local.cf just set everyone to use the same bayes database 
and make it world accessible:


I'm using SQL.  I'm sorry, I should have mentioned that.  The SQL docs 
don't say anything about it.


-Dan

--

Why are you wearing TWO grounding straps?

-John Evans, Ezzi Computers August 23, 2001





Re: [sa-list] Re: global bayes database?

2005-09-08 Thread Michael Parker
Dan Mahoney, System Admin wrote:

 On Thu, 8 Sep 2005, Matt Kettler wrote:

 At 05:04 AM 9/8/2005, Dan Mahoney, System Admin wrote:

 As my bayes database and my training of spamassassin is much better
 than that of most of my users...

 Is there any way to augment user bayes DB with a call to mine?


 one way is in your local.cf just set everyone to use the same bayes
 database and make it world accessible:


 I'm using SQL.  I'm sorry, I should have mentioned that.  The SQL docs
 don't say anything about it.


You mean this portion of the SQL docs?

In addition to the global configuration directives there is a user
preference:

bayes_sql_override_username    someusername

This directive, if used, will override the username used for storing
data in the database.  This could be used to group users together to
share bayesian filter data.  You can also use this config option to
trick sa-learn to learn data as a specific user.


Michael





Re: global bayes database?

2005-09-08 Thread Michael Parker
Dan Mahoney, System Admin wrote:


 Right, but this isn't exactly what I was looking for.  Basically, I'm
 looking for a system whereby if a users bayes corpus isn't primed
 properly, it can refer to others, as sort of an if -- then system,
 rather than manually overriding it.

 As SQL becomes the recommended standard, I'm hoping this feature
 becomes more popular, as I would *love* to see SA do scoring and
 training based on BOTH system-bayes files as well as user-bayes.

 Maybe I should just ask for a pony :)


Oh, you want an entirely different Bayes storage module, one that
doesn't exist.  You're more than welcome to create your own, perldoc
Mail::SpamAssassin::BayesStore to get a sense of the API that you must
implement.  I'll leave the issues surrounding combining of bayes
databases as an exercise to the reader (suggest you search the archives
for previous msgs on this topic).

Michael




spamassasin global bayes database

2005-03-02 Thread Matt
What do I have to do to get spamassassin to use a global bayes
database for all users on the system, rather than per user?


Re: spamassasin global bayes database

2005-03-02 Thread Steven Dickenson
Matt wrote:
What do I have to do to get spamassassin to use a global bayes
database for all users on the system, rather than per user?

http://wiki.apache.org/spamassassin/SiteWideBayesSetup

Steven
--
Steven Dickenson [EMAIL PROTECTED]
http://www.mrchuckles.net


Re: spamassasin global bayes database

2005-03-02 Thread Matias Lopez Bergero
Matt wrote:
What do I have to do to get spamassassin to use a global bayes
database for all users on the system, rather than per user?

Read the wiki :)
http://wiki.apache.org/spamassassin/SiteWideBayesSetup