RE: Re[2]: Is Bayes Really Necessary?

2005-05-27 Thread Chris Santerre


>-----Original Message-----
>From: Robert Menschel [mailto:[EMAIL PROTECTED]
>Sent: Thursday, May 26, 2005 8:38 PM
>To: List Mail User
>Cc: users@spamassassin.apache.org
>Subject: Re[2]: Is Bayes Really Necessary?
>
>
>Hello List,
>
>Thursday, May 26, 2005, 10:05:26 AM, you wrote:
>
>LMU>   Though nobody seems to have said it exactly this way:  It seems
>LMU> to be becoming very obvious that the people who say they have problems
>LMU> with Bayes are those who support a diverse group of users (e.g. ISPs
>LMU> and email providers) and those who find it works well, even with
>LMU> autolearning, are those with either small numbers of users or users who
>LMU> are mostly of a very specific categorization type (e.g. medical, legal,
>LMU> technical, or just about any homogeneous group).
>
>Sorry -- major email server here, serving several hundred domains,
>well over 1k users, all types from technical experts to business people
>to newspaper reporters to retailers to pharmacists to people with
>professions of various ages. Site-wide Bayes. Everyone has access to
>sa-learn via IMAP. Works marvelously.
>
>Bob Menschel

Yeah, but you aren't the typical user/admin :) You have more than a clue on
how to care for and feed a Bayes DB.

--Chris


Re: Re[2]: Is Bayes Really Necessary?

2005-05-26 Thread List Mail User
>...
>
>Hello List,
>
>Thursday, May 26, 2005, 10:05:26 AM, you wrote:
>
>LMU>   Though nobody seems to have said it exactly this way:  It seems
>LMU> to be becoming very obvious that the people who say they have problems
>LMU> with Bayes are those who support a diverse group of users (e.g. ISPs
>LMU> and email providers) and those who find it works well, even with
>LMU> autolearning, are those with either small numbers of users or users who
>LMU> are mostly of a very specific categorization type (e.g. medical, legal,
>LMU> technical, or just about any homogeneous group).
>
>Sorry -- major email server here, serving several hundred domains,
>well over 1k users, all types from technical experts to business people
>to newspaper reporters to retailers to pharmacists to people with
>professions of various ages. Site-wide Bayes. Everyone has access to
>sa-learn via IMAP. Works marvelously.
>
>Bob Menschel
>
Bob,

I have actually noted many times that you have said it works for
you.  I did not mean to imply that it never works in a heterogeneous
environment, just that all the people who say it doesn't work seem to fit
that category (i.e. for some subset of people in environments like yours,
there may be problems of some sort).  Other people at large sites have also
reported very good results and some of them also seem to be ISPs or email
providers.  For the other group, homogeneous environments, there seems to
be uniform agreement that it does work (now someone will speak up and point
out a counter-example).

I have noticed a few times when you've posted scores that you have
a "BAYES_80" where I take the posted message, run "-D -t" and get a "BAYES_99",
which might mean it does still work, and quite well, but not as `extremely'
well as in other environments.  80%+ of all email that hits SA on my servers
ends up as either BAYES_00 or BAYES_99.  I usually look at the rare
exceptions; they mostly come to my own accounts or are tagged as spam
by other rules anyway, and they are either personal contacts, stock pumps
or 419s.  Most are email from my "marketing" family members, whose writing
style seems to be quite similar to some spam.  I am sure that I will eventually
refuse some mail from my father: he often hits BAYES_80 and he mails from
an MSN account.  If it weren't for AWL, it already would have happened :-)

A quick check of the last couple of days shows 72.96% at BAYES_00
and 10% at BAYES_99 and 11.29% at BAYES_50.  I suspect the results are less
extreme for you, but maybe not (that would be good to hear).  Note: I have
a lot of MTA level rejection, pre-filtering before SA that takes out most
of the remaining spam and almost all mailing lists are set to use the
"bayes_ignore_to" directive - so my results posted above are highly skewed
by all these factors (e.g. > 40% of valid email does not run through bayes,
and things like nightly server reports generated internally do - I don't
even trust my own firewall machines' reports).
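
A "quick check" like the one above could be scripted roughly as follows.
This is only a minimal sketch, not the method actually used for the numbers
in this message: it assumes each scanned message leaves one log line that
lists the rule names it hit (as spamd result lines typically do), and the
function name and sample lines are hypothetical.  The percentages are taken
over messages that hit some BAYES_NN rule, so mail that bypasses Bayes is
not counted.

```python
import re
from collections import Counter

# Matches rule names such as BAYES_00, BAYES_50, BAYES_99 inside a log line.
BAYES_RE = re.compile(r"\bBAYES_(\d{2})\b")


def bayes_distribution(lines):
    """Return {rule_name: percent} over the lines that contain a BAYES_NN hit."""
    counts = Counter()
    for line in lines:
        match = BAYES_RE.search(line)
        if match:
            counts["BAYES_" + match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100.0 * n / total, 2) for name, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical sample lines; real input would come from the mail log.
    sample = [
        "result: Y 12 - BAYES_99,HTML_MESSAGE",
        "result: . 0 - BAYES_00",
        "result: . 0 - AWL,BAYES_00",
        "result: . 1 - BAYES_50",
        "a line with no Bayes rule at all",
    ]
    for name, pct in bayes_distribution(sample).items():
        print(f"{name}: {pct}%")
```

Feeding it a couple of days of log lines gives the share of each Bayes
bucket, which makes it easy to compare how `extreme' the results are
between two sites.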

Finally, you seem to have done a good job of `training' your users
to use sa-learn, which is probably itself more valuable than any tweaking
a sysadmin could do alone.  I'd also bet dollars to donuts that you have
more modifications to a "stock" install than I do (e.g. SARE rules, etc.)
and probably far more than most people with BAYES problems.

Paul Shupak
[EMAIL PROTECTED]

P.S. I know the account says "List Mail User", but why is this the only
mailing list that almost uniformly references me that way?  Though, I do
get called by the sobriquet "Administrative User" when I use accounts
which are labeled like that.  Maybe it's just that this list's user base is
ingrained in using the header label instead of the signature!?  Anyway,
I kind of like the "LMU" :)