Re: Default SpamAssassin scores don't make sense

2006-11-07 Thread Justin Mason

Matt Kettler writes:
 ...
 HTML_OBFUSCATE is a bit more complicated:
 ...
 Here the S/Os have a clear upswing trend. However, the hit rates at
 the upper end are very low. That's probably what's suppressing the
 scores of 60_70 and higher. They just don't hit enough mail to be relevant.

Yep.  It may also be that they hit only spam that is *already* scoring
over 10 points  -- at that stage, there's no point in adding to the score,
so whatever value the perceptron assigns to it would have no real effect.
Therefore the perceptron is free to assign low scores.
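
To make that concrete with invented numbers (a hypothetical 5.0 threshold
and made-up per-message totals, nothing from the real corpus or the real
perceptron): if a rule only fires on messages that other rules already push
past the threshold, its weight cannot change any verdict, so there is
nothing to gain by raising it.

    # Toy sketch: RARE_RULE only fires on messages whose other rules
    # already total more than the threshold, so its weight is irrelevant.
    THRESHOLD = 5.0  # hypothetical required_score

    # (is_spam, score from other rules, whether RARE_RULE fired)
    corpus = [
        (True,  6.3, True),   # spam already over threshold; RARE_RULE also hits
        (True,  5.7, True),   # same
        (True,  2.1, False),  # spam that RARE_RULE misses
        (False, 1.4, False),  # ham
    ]

    def verdicts(rare_weight):
        return [other + (rare_weight if hit else 0.0) >= THRESHOLD
                for _, other, hit in corpus]

    print(verdicts(0.0))  # [True, True, False, False]
    print(verdicts(3.0))  # identical: [True, True, False, False]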

--j.


Default SpamAssassin scores don't make sense

2006-11-06 Thread Adam Katz
(Re-sending this email; the last one, sent 10/30 15:19 EST, was never posted
to the list, even though another message sent only half an hour later went
through successfully.)

Why do default scores not increase with severity?  For example,
SpamAssassin 3.1.7 has inconsistent progression of default scores in
html obfuscation, dates set in the future, and spf marking:

score HTML_OBFUSCATE_05_10 1.421 1.169 1.522 1.449
score HTML_OBFUSCATE_10_20 1.936 1.397 2.371 1.770
score HTML_OBFUSCATE_20_30 2.720 2.720 3.145 3.400
score HTML_OBFUSCATE_30_40 2.480 2.480 2.867 2.859
score HTML_OBFUSCATE_40_50 2.160 2.160 2.498 2.640
score HTML_OBFUSCATE_50_60 2.049 2.061 2.342 2.031
score HTML_OBFUSCATE_60_70 1.637 1.592 1.892 1.652
score HTML_OBFUSCATE_70_80 1.440 1.507 1.680 1.472
score HTML_OBFUSCATE_80_90 1.244 1.191 1.397 0.982
score HTML_OBFUSCATE_90_100 0 # n=0 n=1 n=2 n=3

score DATE_IN_FUTURE_03_06 2.061 2.007 2.275 1.961
score DATE_IN_FUTURE_06_12 1.680 1.498 1.883 1.668
score DATE_IN_FUTURE_12_24 2.320 2.316 2.775 2.767
score DATE_IN_FUTURE_24_48 2.080 2.080 2.498 2.688
score DATE_IN_FUTURE_48_96 1.680 1.680 1.942 2.100
score DATE_IN_FUTURE_96_XX 1.920 1.888 2.276 2.403

score SPF_NEUTRAL  0 1.379 0 1.069
score SPF_SOFTFAIL 0 1.470 0 1.384
score SPF_FAIL 0 1.333 0 1.142
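
(For reference: the four values on each score line above correspond to
SpamAssassin's four score sets -- roughly, the first applies when neither
network tests nor Bayes are enabled, the second with network tests only, the
third with Bayes only, and the fourth with both, which is why the SPF rules
carry 0 in the non-network columns.  A rough Python sketch of that column
selection, with only illustrative parsing:)

    # Sketch: pick the effective score from a four-valued "score" line,
    # based on which score set (Bayes/network-test combination) is in use.
    def score_set(use_bayes, use_network):
        # set 0: neither; 1: network only; 2: Bayes only; 3: both
        return (2 if use_bayes else 0) + (1 if use_network else 0)

    def effective_score(score_line, use_bayes, use_network):
        parts = score_line.split()   # e.g. "score SPF_FAIL 0 1.333 0 1.142"
        values = [float(v) for v in parts[2:6]]
        if len(values) == 1:         # a single value applies to every set
            return values[0]
        return values[score_set(use_bayes, use_network)]

    print(effective_score("score SPF_FAIL 0 1.333 0 1.142",
                          use_bayes=True, use_network=True))   # 1.142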

To keep this message on-topic, I am not commenting on whether the
scores fairly reflect message spamminess.  I am asking about their fairness
relative to the other levels of the same test: HTML_OBFUSCATE_80_90 should
be higher than HTML_OBFUSCATE_20_30, DATE_IN_FUTURE_96_XX should be higher
than DATE_IN_FUTURE_12_24, and SPF_FAIL should be higher than SPF_SOFTFAIL.
There are a large number of sets of scores that seem quite arbitrary in
their assignment.  While I'm happy to see this no longer includes the
Bayesian scores, it is still a huge surprise.

Is there a guide online explaining how scores are chosen?  Is this
automated in some manner that ends up weighting incremental tests more by
frequency than by severity?  I try to keep my rule tweaks minor, but my
local.cf is getting bigger and bigger...  how large is the typical local.cf
for servers with 25-100 users?

Thank you,
Adam Katz



Re: Default SpamAssassin scores don't make sense

2006-11-06 Thread John D. Hardin
On Mon, 6 Nov 2006, Adam Katz wrote:

 Why do default scores not increase with severity?  For example,
 SpamAssassin 3.1.7 has inconsistent progression of default scores in
 html obfuscation, dates set in the future, and spf marking:

The default scores are generated by analyzing their performance
against hand-categorized corpora of actual emails. If a rule hits spam
often and ham rarely, it will be given a higher score than one that
hits spam often and ham occasionally.
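
As a rough sketch of that measurement, on a tiny invented corpus rather than
the large hand-classified ones actually used:

    # Count how often each rule hits spam vs. ham in a labelled corpus.
    # (Invented data and rule names; the real measurement runs over
    # large hand-classified corpora.)
    corpus = [
        ({"RULE_X"}, True), ({"RULE_X"}, True), ({"RULE_X"}, False),
        ({"RULE_Y"}, True), ({"RULE_Y"}, True), ({"RULE_Y", "RULE_X"}, True),
    ]

    def hit_rates(rule):
        spam = [hits for hits, is_spam in corpus if is_spam]
        ham = [hits for hits, is_spam in corpus if not is_spam]
        return (sum(rule in h for h in spam) / len(spam),
                sum(rule in h for h in ham) / len(ham))

    print(hit_rates("RULE_X"))  # (0.6, 1.0): also hits ham -> deserves less
    print(hit_rates("RULE_Y"))  # (0.6, 0.0): spam only     -> deserves more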

Rule performance against real-world traffic can be counterintuitive,
and the rules' relation to each other isn't necessarily a part of the
analysis.

I'm sure somebody else will chime in with a relevant wiki URL...

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The difference between ignorance and stupidity is that the stupid
  desire to remain ignorant. -- Jim Bacon
---
 Tomorrow: the campaign ads stop



Re: Default SpamAssassin scores don't make sense

2006-11-06 Thread Theo Van Dinter
On Mon, Nov 06, 2006 at 04:58:37PM -0500, Adam Katz wrote:
 Why do default scores not increase with severity?  For example,
 SpamAssassin 3.1.7 has inconsistent progression of default scores in
 html obfuscation, dates set in the future, and spf marking:

http://wiki.apache.org/spamassassin/HowScoresAreAssigned

The short version is that as far as SA and the perceptron (that which
generates the scores) are concerned, rules are independent.  There is no
notion of increasing severity: either a rule hits or it doesn't.
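
A toy sketch of that, using a plain single-layer perceptron on invented rule
hits (not SA's actual score generator, corpus, or rule names): each rule is
an independent yes/no feature, and the weight it ends up with is driven
entirely by how it co-occurs with spam and ham in the corpus, not by any
notion that one rule is a "stronger" variant of another.

    # Toy perceptron over binary rule hits (invented data).  Each rule is
    # an independent 0/1 feature and its learned weight is its "score";
    # nothing encodes one rule being a more severe variant of another.
    RULES = ["OBF_20_30", "OBF_80_90", "SPF_FAIL"]
    THRESHOLD, RATE = 5.0, 0.1

    corpus = [                      # (rules hit, is_spam) -- hypothetical
        ({"OBF_20_30", "SPF_FAIL"}, True),
        ({"OBF_20_30"}, True),
        ({"OBF_80_90"}, True),
        ({"SPF_FAIL"}, False),
        (set(), False),
    ]

    weights = {r: 0.0 for r in RULES}
    for _ in range(200):            # classic perceptron updates
        for hits, is_spam in corpus:
            predicted = sum(weights[r] for r in hits) >= THRESHOLD
            if predicted != is_spam:
                for r in hits:
                    weights[r] += RATE if is_spam else -RATE

    print(weights)                  # weights track the corpus, not "severity"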

 weighted based more on frequency than on severity?  I try to keep my
 rules tweaks minor, but my local.cf is getting bigger and bigger...  how
 large is the typical local.cf for servers with 25-100 users?

Most people, I think, leave most of the scores alone, which is good and bad.

FWIW, the suggested way to get the best SA performance for your mail
server is to generate your own score sets from your own mails.  I don't
actually know of anyone who does this though.

-- 
Randomly Selected Tagline:
These periods are always 15 minutes shorter than I'd like them, and 
 probably 15 minutes longer than you'd like them.   - Prof. Van Bluemel




Re: Default SpamAssassin scores don't make sense

2006-11-06 Thread Adam Katz
On Mon, 6 Nov 2006, John D. Hardin wrote:
 The default scores are generated by analyzing their performance
 against hand-categorized corpora of actual emails. If a rule hits spam
 often and ham rarely, it will be given a higher score than one that
 hits spam often and ham occasionally.

That sounds very Bayesian ... with the Bayes rules already doing that sort
of logic, I would hope there is more human thinking put into score
setting.  The Bayes rules are very shiny and effective, but they are
supposed to assist the hand-written rules rather than have those rules
assist Bayes.  If that's the current SA thinking, I'll have to reconsider
CRM114 and other better-than-Bayes systems.

 Rule performance against real-world traffic can be counterintuitive,
 and the rules' relation to each other isn't necessarily a part of the
 analysis.

That's where the human tweaking is supposed to happen; if gobs of spam
flag the 80% meter of some test while no ham does, and the 90% meter is
almost never hit by anything, the 90% meter should still have a higher
value than the 80% meter does.  If the 90% meter has more ham than spam
despite the 80% meter having more spam than ham, the tests need to be
looked at closely rather than inappropriately weighted.

just my two cents, anyway

-Adam Katz


Re: Default SpamAssassin scores don't make sense

2006-11-06 Thread Adam Katz
Theo Van Dinter wrote:
 http://wiki.apache.org/spamassassin/HowScoresAreAssigned

Thanks, that's what I was looking for.

 The short version is that as far as SA and the perceptron (that which
 generates the scores) are concerned, rules are independent.  There is no
 notion of increasing severity: either a rule hits or it doesn't.

Bayes is a perfect example of this, and is mentioned as such on the very
page you referenced.  Several filters, including those that I listed at
the top of this thread, are indeed incremental, increasing in severity.
I am shocked to hear that there is nobody moderating the automated
scores (an Alan Greenspan of the anti-spam world, so to speak).

 weighted based more on frequency than on severity?  I try to keep my
 rules tweaks minor, but my local.cf is getting bigger and bigger...  how
 large is the typical local.cf for servers with 25-100 users?
 
 Most people, I think, leave most of the scores alone, which is good and bad.
 
 FWIW, the suggested way to get the best SA performance for your mail
 server is to generate your own score sets from your own mails.  I don't
 actually know of anyone who does this though.

The wiki documentation seems to discourage modifying rule scores more
than encourage it.  We have a dozen or so custom rules and several dozen
score modifications, and a good number of the CustomRulesets from the
wiki and the SARE collection are in full use.

All low-scoring caught spam at my company lands in a net for my IT
staff to review; the rare false positives get forwarded to the
intended recipients and sa-learn'ed as ham, and the offending scores get
reviewed.  A good 20-50% of the low-scoring caught spam was caught only
due to our custom filters and adjusted scores (note: these numbers are
with SA 2.63; our upgrade to 3.1.7 is scheduled for before Thanksgiving
while I work out the kinks).

-Adam


Re: Default SpamAssassin scores don't make sense

2006-11-06 Thread Matt Kettler
Adam Katz wrote:
 Theo Van Dinter wrote:
  http://wiki.apache.org/spamassassin/HowScoresAreAssigned

 Thanks, that's what I was looking for.

  The short version is that as far as SA and the perceptron (that which
  generates the scores) are concerned, rules are independent.  There is no
  notion of increasing severity: either a rule hits or it doesn't.

 Bayes is a perfect example of this, and is mentioned as such on the very
 page you referenced.  Several filters, including those that I listed at
 the top of this thread, are indeed incremental, increasing in severity.
 I am shocked to hear that there is nobody moderating the automated
 scores (an Alan Greenspan of the anti-spam world, so to speak).

Nobody said that nobody moderates the scores. I myself spend a
considerable amount of time studying them.

However, none of us is so rash as to make adjustments just to make the
results look better. 99% of the time, investigations into illogical
scores turn up real-world evidence that explains them.
Let's take a brief look at your SPF example.

You'd expect SPF_FAIL to have a higher score than SPF_SOFTFAIL. However,
the real world shows otherwise.

Let's rip the results out of STATISTICS-set3.txt:

OVERALL%    SPAM%     HAM%     S/O   RANK   SCORE  NAME

   3.437   4.8942   0.0396   0.992   0.80    1.38  SPF_SOFTFAIL
   2.550   3.5717   0.1676   0.955   0.53    1.14  SPF_FAIL

Look at the S/O for each. It represents the fraction of mail matching the
rule that is actually spam, where 1.00 means 100% of the matching
messages were spam.
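
As a quick sanity check, the S/O column can be reproduced from the SPAM% and
HAM% columns above, assuming the spam and ham corpora are weighted equally
(a sketch; not necessarily the exact computation the tools use):

    # S/O ("spam overlap"): fraction of a rule's hits that were spam.
    def spam_overlap(spam_pct, ham_pct):
        return spam_pct / (spam_pct + ham_pct)

    print(round(spam_overlap(4.8942, 0.0396), 3))  # SPF_SOFTFAIL -> 0.992
    print(round(spam_overlap(3.5717, 0.1676), 3))  # SPF_FAIL     -> 0.955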

Notice how the S/O of SPF_FAIL is actually LOWER than SOFTFAIL?

Why? Probably because there are more aggressive admins publishing
records with -all without thinking about their whole network. The more
cautious folks, who have spent a lot of time thinking about their
network, are more likely to realize they might have missed something and
use ~all (softfail).

Human behavior is in no way linear, and SPF here is a result of the
behavior of the admins publishing the records. My explanation is a guess,
but it makes sense if you think about the general behavior of a cautious
admin compared to a rabid one.

Now let's look at DATE_IN_FUTURE..

   1.605   2.2815   0.0264   0.989   0.75    1.96  DATE_IN_FUTURE_03_06
   0.926   1.2926   0.0716   0.948   0.56    1.67  DATE_IN_FUTURE_06_12
   1.986   2.8309   0.0151   0.995   0.81    2.77  DATE_IN_FUTURE_12_24
   0.260   0.3676   0.0075   0.980   0.53    2.69  DATE_IN_FUTURE_24_48
   0.089   0.1252   0.0038   0.971   0.40    2.10  DATE_IN_FUTURE_48_96
   0.245   0.3474   0.0075   0.979   0.52    2.40  DATE_IN_FUTURE_96_XX

Here again we see non-linearity in the S/O performance of the real world
data. Note that 06_12 has the lowest S/O of the lot, and, imagine that,
it got the lowest score too.

There's some degree of non-fit here; for example, DATE_IN_FUTURE_96_XX
scores higher than 03_06 despite a slightly lower S/O. A study of the actual
corpus itself would likely show that this rule is more likely to match spam
that has very few other rules matching, hence the higher score. This is
a case of the interaction with other rules that I mentioned in my last message.

HTML_OBFUSCATE is a bit more complicated:

OVERALL%    SPAM%     HAM%     S/O   RANK   SCORE  NAME
   0.637   0.9048   0.0132   0.986   0.66    1.45  HTML_OBFUSCATE_05_10
   0.921   1.3128   0.0075   0.994   0.74    1.77  HTML_OBFUSCATE_10_20
   0.671   0.9582   0.0000   1.000   0.70    3.40  HTML_OBFUSCATE_20_30
   0.406   0.5801   0.0000   1.000   0.63    2.86  HTML_OBFUSCATE_30_40
   0.198   0.2836   0.0000   1.000   0.51    2.64  HTML_OBFUSCATE_40_50
   0.242   0.3458   0.0000   1.000   0.54    2.03  HTML_OBFUSCATE_50_60
   0.081   0.1155   0.0000   1.000   0.40    1.65  HTML_OBFUSCATE_60_70
   0.055   0.0784   0.0000   1.000   0.38    1.47  HTML_OBFUSCATE_70_80
   0.012   0.0178   0.0000   1.000   0.31    0.98  HTML_OBFUSCATE_80_90
   0.004   0.0057   0.0000   1.000   0.29    0.00  HTML_OBFUSCATE_90_100

Here the S/Os have a clear upswing trend. However, the hit rates at
the upper end are very low. That's probably what's suppressing the
scores of 60_70 and higher. They just don't hit enough mail to be relevant.


Re: Default SpamAssassin scores don't make sense

2006-11-06 Thread List Mail User
...
 That's where the human tweaking is supposed to happen; if gobs of spam
 flag the 80% meter of some test while no ham does, and the 90% meter is
 almost never hit by anything, the 90% meter should still have a higher
 value than the 80% meter does.  If the 90% meter has more ham than spam
 despite the 80% meter having more spam than ham, the tests need to be
 looked at closely rather than inappropriately weighted.

 just my two cents, anyway

 -Adam Katz

Here one of your own examples pops up - SPF_FAIL vs. SPF_SOFTFAIL.
In the current state of the world, *most* soft fail results are actual
forgeries, but *most* hard fail results are administrator or user error.
So SPF_SOFTFAIL is a better spam sign than SPF_FAIL - often these things can
and do make sense when a rational explanation is looked for (though it can
be very far from obvious at times).  Hopefully, as administrators learn,
things like SPF, DK and/or DKIM will become more useful (~all and signing
only some mail are both serious dilutions of what these technologies have
to offer).

Paul Shupak
[EMAIL PROTECTED]