Re: Those Re: good obfupills spams

2006-05-03 Thread mouss

jdow wrote:


And the point I made is to keep the region right around 5.0 as swept
clean of ambiguous cases as it's possible to maintain. It MAY be that
the reliability of a rule should govern its score upon use. And scores
should have a sprinkling of negative scores as well as mostly positive
scores. It seems like Kalman filter approaches might do some real good.


What about replacing Bayes with Support Vector Machines? Anyone played 
with this?



In fact a REAL Kalman filter that trains on feedback the way Bayes
trains on feedback might produce some really interesting results as
well as weed out rules that seem to amount to little or nothing at
the present time. There was somebody here who did discuss a dynamic
scoring engine approach. I wonder how far he got with it. His initial
report sounded quite promising. And it's an ideal setting for Kalman
sort of techniques. This rule is good for condition A, C, and D but
not B...

I do really like the idea of creating a dead zone that has neither
ham or spam in it right around a score of 5 with separate peaks for
ham and spam on either side of that empty zone. It may be hard to
force that kind of selection without some fancy processing, though.

Why not use two different filters:
- SA (without Bayes or AWL)
- an adaptive filter (bogofilter has an unsure zone)
and take the decision based on either or both, depending on the 
configuration of each. With a conservative setup of both, you can decide 
it's spam if either filter says it is (you'll get more FNs, but few 
FPs). With an aggressive setup, you can use AND. With other setups, you 
can make more complex decisions.
An advantage of this is that you can split this into a site-wide filter 
(SA) and a per-user filter.


Re: Those Re: good obfupills spams

2006-05-02 Thread Michael Monnerie
On Sonntag, 30. April 2006 18:40 Matt Kettler wrote:
 However, mails matching BAYES_95 are more likely to be trickier,
 and are likely to match fewer other rules. These messages are more
 likely to require an extra boost from BAYES_95's score than those
 which match BAYES_99.

Like Jane wrote, I don't believe writing rules just to reach over 5.0 
for SPAM should be the goal. For the German ruleset I maintain, 
I always try to push SPAM far beyond any mark, without risking FPs. If 
there's some sexually explicit sentence that can really only 
be SPAM, I'll give it up to 4 points. Most porn SPAM gets around 20-30 
points now. That's good, so I'm on the safe side of text variations 
hitting fewer rules.
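
Mechanically such a rule is just a scored body regex. A made-up sketch 
(the rule name and the pattern are placeholders, not taken from the 
actual ruleset):

body  ZMI_DE_EXPLICIT  /\bPLACEHOLDER EXPLICIT PHRASE\b/i
score ZMI_DE_EXPLICIT  4.0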

I hope to have a good stats tool soon, to be able to see graphically 
whether BAYES_99 is secure. From what I see when looking at e-mails 
whenever I check, it hits very sure SPAM and is worth its 4-5 points. 
That might be because my main language is German, and most SPAM is 
English, though.

Jane made a good statement about writing rules to make a peak around 
5.0, to clearly indicate SPAM or HAM. Sounds reasonable, but I didn't 
test it, because I don't happen to have any FPs.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: Those Re: good obfupills spams (bayes scores)

2006-05-02 Thread Michael Monnerie
On Montag, 1. Mai 2006 17:51 Matt Kettler wrote:
 Looking at my own current real-world maillogs, BAYES_99 matched 6,643
 messages last week. Of those, only 24 had total scores under 9.0.
 (with BAYES_99 scoring 3.5, it would take a message with a total
 score of less than 8.5 to drop below the threshold of 5.0 if BAYES_99
 were omitted entirely).

I've looked at a snap of 424 spams these last days, with a total of 8519 
points, making about 20 points per SPAM (average).
67 SPAMs are 5-9.99 points, 62 are 10-14.99 points, 294 are 15 points or more.

So it's those 67 SPAMs that should worry me most - some of them are 
really just barely over 5 (twice 5.06 points), and I would like them to 
score higher, because that's more on the safe side. Unfortunately, I 
have no way to check which rules were hit; amavisd-new doesn't 
log that.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: Those Re: good obfupills spams (bayes scores)

2006-05-02 Thread Bart Schaefer

Incidentally, the FAQ answer for HowScoresAreAssigned on the SA wiki
is out of date.


Re: Those Re: good obfupills spams

2006-05-02 Thread jdow

From: Michael Monnerie [EMAIL PROTECTED]

Jane made a good statement about writing rules to make a peak around 
5.0, to clearly indicate SPAM or HAM. Sounds reasonable, but I didn't 
test it, because I don't happen to have any FPs.


Actually it's Joanne not Jane. {^_-}

And the point I made is to keep the region right around 5.0 as swept
clean of ambiguous cases as it's possible to maintain. It MAY be that
the reliability of a rule should govern its score upon use. And scores
should have a sprinkling of negative scores as well as mostly positive
scores. It seems like Kalman filter approaches might do some real good.
In fact a REAL Kalman filter that trains on feedback the way Bayes
trains on feedback might produce some really interesting results as
well as weed out rules that seem to amount to little or nothing at
the present time. There was somebody here who did discuss a dynamic
scoring engine approach. I wonder how far he got with it. His initial
report sounded quite promising. And it's an ideal setting for Kalman
sort of techniques. This rule is good for condition A, C, and D but
not B...

I do really like the idea of creating a dead zone that has neither
ham or spam in it right around a score of 5 with separate peaks for
ham and spam on either side of that empty zone. It may be hard to
force that kind of selection without some fancy processing, though.

{^_^}




Re: Those Re: good obfupills spams (bayes scores)

2006-05-02 Thread jdow

From: Michael Monnerie [EMAIL PROTECTED]


67 SPAMs are 5-9.99 points,


OK, for a record with regards to spam and ham I have had four come
through between 5 and 7.99 points out of about 1600 messages in my
personal mail buckets. Two were from always-on which I signed up
for when Powell the Younger was the FCC commissioner pushing BPL.
As a ham radio operator I had a rather strong interest in opposition
to this critter. I more or less abandoned the account and let the
Tony Perkins email fall into the spam box. I finally got motivated
to remove that today. One other was from a mailing list: some dweeb
spammed the list saying he could not read some other dweeb's base64
email. It was marginal. But it being marked as spam gave me a chance
to send a private email jab back to the first dweeb about his message
being spam. That leaves one real spam and no hams in the 5.0 to 7.99
wasteland. I have five messages between 8.0 and 10 inclusive. One is
from my local congressman. I figure if I include his junk phone calls
in my phone spam complaints (to him) the email should also be spam. I
doubt I'll white list him. He and I don't agree much. I am much too
libertarian for his Republican stance. If he'd start lecturing about
people being responsible for themselves and their own actions I might
be moved to white list him. But that's neither here nor there. The
wasteland concept is working.

And during this period no real ham has gotten a BAYES_99 rule hit.
But the sample's still a little small to say anything solid about
the 0.5% theoretical false alarm ratio, yet - maybe - if I stretch
it a little.

{^_^}   - Joanne does ramble sometimes, doesn't she?




Re: Those Re: good obfupills spams (bayes scores)

2006-05-02 Thread jdow

From: jdow [EMAIL PROTECTED]


One is
from my local congressman. I figure if I include his junk phone calls
in my phone spam complaints (to him) the email should also be spam. I
doubt I'll white list him. He and I don't agree much. I am much too
libertarian for his Republican stance. If he'd start lecturing about
people being responsible for themselves and their own actions I might
be moved to white list him. But that's neither here nor there. The
wasteland concept is working.


This earns a follow-up. I checked the Bayes score on his message. I
must conclude that Bayes is pretty accurate. Since I consider virtually
anybody in office today to be a spamming gasbag, having his message hit
a perfect 1.0 Bayes score is just too perfect.

My faith in Bayes is increased appropriately.

{^_-}


RE: Those Re: good obfupills spams

2006-05-01 Thread Bowie Bailey
Matt Kettler wrote:
 
 It is perfectly reasonable to assume that most of the mail matching
 BAYES_99 also matches a large number of the stock spam rules that SA
 comes with. These highly-obvious mails are the model after which
 most SA rules are made in the first place. Thus, these mails need
 less score boost, as they already have a lot of score from other
 rules in the ruleset. 
 
 However, mails matching BAYES_95 are more likely to be trickier,
 and are likely to match fewer other rules. These messages are more
 likely to require an extra boost from BAYES_95's score than those
 which match BAYES_99.

I can't argue with this description, but I don't agree with the
conclusion on the scores.

The Bayes rules are not individual unrelated rules.  Bayes is a series
of rules indicating a range of probability that a message is spam or
ham.  You can argue over the exact scoring, but I can't see any reason
to score BAYES_99 lower than BAYES_95.  Since a BAYES_99 message is
even more likely to be spam than a BAYES_95 message, it should have at
least a slightly higher score.  It is obvious that a BAYES_99 message
is more likely to hit other rules and therefore be less reliant on a
score increase from Bayes, but this is no reason to drop the score.

I generally don't look into the rule scoring too much unless I run
into a problem, but I thought this had been fixed in the latest
couple of versions anyway.  Looking at my score file, I find this:

score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5
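
(For reference, if I read the score-set layout correctly: the four
values after each rule name are SA's four score sets - set 0 with
neither network tests nor Bayes enabled, set 1 with network tests only,
set 2 with Bayes only, and set 3 with both. The 0.0001 placeholders in
the first two columns just reflect that the BAYES_* rules cannot fire
when Bayes is disabled.)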

The scores march upwards just as expected.  And it looks like the
50-99 scores have been set by hand rather than the perceptron.

-- 
Bowie


RE: Those Re: good obfupills spams (bayes scores)

2006-05-01 Thread Bowie Bailey
jdow wrote:
 From: Bart Schaefer [EMAIL PROTECTED]
  
  On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:
   In SA 3.1.0 they did force-fix the scores of the bayes rules,
   particularly the high-end. The perceptron assigned BAYES_99 a
   score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up
   to 3.50.
   
   That does make me wonder if:
   1) When BAYES_9x FPs, it FPs in conjunction with lots of
   other rules due to the ham corpus being polluted with spam.
  
  My recollection is that there was speculation that the BAYES_9x
  rules were scored too low not because they FP'd in conjunction
  with other rules, but because against the corpus they TRUE P'd in
  conjunction with lots of other rules, and that it therefore wasn't
  necessary for the perceptron to assign a high score to BAYES_9x in
  order to push the total over the 5.0 threshold.
  
  The trouble with that is that users expect training on their
  personal spam flow to have a more significant effect on the
  scoring.  I want to train bayes to compensate for the LACK of
  other rules matching, not just to give a final nudge when a bunch
  of others already hit.
  
  I filed a bugzilla some while ago suggesting that the bayes
  percentage ought to be used to select a rule set, not to adjust
  the score as a component of a rule set.
 
 There is one other gotcha. I bet vastly different scores are
 warranted for Bayes when run with per user training and rules as
 compared to global training and rules.

Ack!  I missed the subject change on this thread prior to my last
reply.  Sorry about the duplication.

I think it is also a matter of manual training vs autolearning.  A
Bayes database that is consistently trained manually will be more
accurate and can support higher scores.

-- 
Bowie


Re: Those Re: good obfupills spams (bayes scores)

2006-05-01 Thread Matt Kettler
Bowie Bailey wrote:
 Matt Kettler wrote:
 It is perfectly reasonable to assume that most of the mail matching
 BAYES_99 also matches a large number of the stock spam rules that SA
 comes with. These highly-obvious mails are the model after which
 most SA rules are made in the first place. Thus, these mails need
 less score boost, as they already have a lot of score from other
 rules in the ruleset. 

 However, mails matching BAYES_95 are more likely to be trickier,
 and are likely to match fewer other rules. These messages are more
 likely to require an extra boost from BAYES_95's score than those
 which match BAYES_99.
 
 I can't argue with this description, but I don't agree with the
 conclusion on the scores.
 
 The Bayes rules are not individual unrelated rules.  Bayes is a series
 of rules indicating a range of probability that a message is spam or
 ham.  You can argue over the exact scoring, but I can't see any reason
 to score BAYES_99 lower than BAYES_95.  Since a BAYES_99 message is
 even more likely to be spam than a BAYES_95 message, it should have at
 least a slightly higher score. 

No, it should not. I've given a conclusive reason why it may not always be
higher. My reasoning has a solid statistical basis behind it, and it is
supported by real-world testing and real-world data.

You've given your opinion to the contrary, but no facts to support it other
than declaring the rules to be related and concluding that the score should
therefore correlate with the bayes-calculated probability of spam.

While I don't disagree with you that BAYES_99 scoring lower than BAYES_95 is
counter-intuitive, I do not believe intuition alone is a reason to defy reality.

If there are other rules with better performance (i.e. fewer FPs) that
consistently coincide with the hits of BAYES_99, those rules should soak up the
lion's share of the score. However, if there are a lot of spam messages with no
other rules hit, BAYES_99 should get a strong boost from those.

The perceptron results show that the former is largely true. BAYES_99 is mostly
redundant. To back it up, I'm going to verify it with my own maillog data.

Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages
last week. Of those, only 24 had total scores under 9.0. (with BAYES_99 scoring
3.5, it would take a message with a total score of less than 8.5 to drop below
the threshold of 5.0 if BAYES_99 were omitted entirely).

So less than 0.37% of BAYES_99's hits actually mattered on my system last week.

BAYES_95 on the other hand hit 468 messages, 20 of which scored less than 9.0.
That's 4.2% of messages with BAYES_95 hits. A considerably larger percentage.
Bring it down to 8.0 to compensate for the score difference and you still get
17 messages, which is still a much larger 3.6% of its hits.

On my system, BAYES_95 is significant in pushing mail over the spam threshold 10
times more often than BAYES_99 is.

What are your results?

These are the greps I used, based on MailScanner log formats. Should work for
spamd users, perhaps with slight modifications.

zgrep BAYES_99 maillog.1.gz | wc -l
zgrep BAYES_99 maillog.1.gz | grep -v "score=[1-9][0-9]\." | grep -v "score=9\." | wc -l




RE: Those Re: good obfupills spams (bayes scores)

2006-05-01 Thread Bowie Bailey
Matt Kettler wrote:
 Bowie Bailey wrote:
  
  The Bayes rules are not individual unrelated rules.  Bayes is a
  series of rules indicating a range of probability that a message is
  spam or ham.  You can argue over the exact scoring, but I can't see
  any reason to score BAYES_99 lower than BAYES_95.  Since a BAYES_99
  message is even more likely to be spam than a BAYES_95 message, it
  should have at least a slightly higher score.
 
 No, it should not. I've given a conclusive reason why it may not
 always be higher. My reasoning has a solid statistical basis behind
 it, and it is supported by real-world testing and real-world data.
 
 You've given your opinion to the contrary, but no facts to support it
 other than declaring the rules to be related and concluding that the
 score should therefore correlate with the bayes-calculated probability
 of spam.
 
 While I don't disagree with you that BAYES_99 scoring lower than
 BAYES_95 is counter-intuitive, I do not believe intuition alone is a
 reason to defy reality. 
 
 If there are other rules with better performance (i.e. fewer FPs) that
 consistently coincide with the hits of BAYES_99, those rules should
 soak up the lion's share of the score. However, if there are a lot of
 spam messages with no other rules hit, BAYES_99 should get a strong
 boost from those. 
 
 The perceptron results show that the former is largely true. BAYES_99
 is mostly redundant. To back it up, I'm going to verify it with my
 own maillog data. 
 
 Looking at my own current real-world maillogs, BAYES_99 matched 6,643
 messages last week. Of those, only 24 had total scores under 9.0.
 (with BAYES_99 scoring 
 3.5, it would take a message with a total score of less than 8.5 to
 drop below the threshold of 5.0 if BAYES_99 were omitted entirely).
 
 So less than 0.37% of BAYES_99's hits actually mattered on my system
 last week. 
 
 BAYES_95 on the other hand hit 468 messages, 20 of which scored less
 than 9.0. That's 4.2% of messages with BAYES_95 hits. A considerably
 larger percentage. Bring it down to 8.0 to compensate for the
 score difference and you still get 17 messages, which is still a much
 larger 3.6% of its hits. 
 
 On my system, BAYES_95 is significant in pushing mail over the spam
 threshold 10 times more often than BAYES_99 is.
 
 What are your results?
 
 These are the greps I used, based on MailScanner log formats. Should
 work for spamd users, perhaps with slight modifications.
 
 zgrep BAYES_99 maillog.1.gz | wc -l
 zgrep BAYES_99 maillog.1.gz | grep -v "score=[1-9][0-9]\." | grep -v "score=9\." | wc -l

I think we are arguing from slightly different viewpoints.

You are saying that higher scores are not needed since the lower score
is made up for by other rules.  I have 13,935 hits for BAYES_99.  412
of them are lower than 9.0.  This seems to be caused by either AWL hits
lowering the score or very few other rules hitting.  BAYES_95 hit 469
times with 18 hits lower than 9.0.  This means that, for me, BAYES_95
is significant slightly more often, percentage-wise, than BAYES_99.
But considering volume, I would say that BAYES_99 is the more useful
rule.

However, that's not what I was arguing about to begin with.  Because
of the way the Bayes algorithm works, I should be able to have more
confidence in a BAYES_99 hit than a BAYES_95 hit.  Therefore, it
should have a higher score.  Otherwise, you get the very strange
occurrence that if you train Bayes too well and the spams go from
BAYES_95 to BAYES_99, the SA score actually goes down.

The better you train your Bayes database, the more confidence it
should have in picking out the spams.  As the scoring moves from
BAYES_50 up to BAYES_99, the SA score should increase to reflect the
higher confidence level of the Bayes engine.
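
(Anyone who agrees can, of course, enforce a monotonic ladder locally.
A minimal local.cf sketch, with the 3.0/4.0 values picked arbitrarily
for illustration rather than taken from any tested score set:

score BAYES_95 3.0
score BAYES_99 4.0

A single value after the rule name overrides all four score sets at
once.)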

-- 
Bowie


Re: Those Re: good obfupills spams (bayes scores)

2006-05-01 Thread jdow

From: Bowie Bailey [EMAIL PROTECTED]


jdow wrote:

From: Bart Schaefer [EMAIL PROTECTED]
 
 On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:

  In SA 3.1.0 they did force-fix the scores of the bayes rules,
  particularly the high-end. The perceptron assigned BAYES_99 a
  score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up
  to 3.50.
  
  That does make me wonder if:

  1) When BAYES_9x FPs, it FPs in conjunction with lots of
  other rules due to the ham corpus being polluted with spam.
 
 My recollection is that there was speculation that the BAYES_9x

 rules were scored too low not because they FP'd in conjunction
 with other rules, but because against the corpus they TRUE P'd in
 conjunction with lots of other rules, and that it therefore wasn't
 necessary for the perceptron to assign a high score to BAYES_9x in
 order to push the total over the 5.0 threshold.
 
 The trouble with that is that users expect training on their

 personal spam flow to have a more significant effect on the
 scoring.  I want to train bayes to compensate for the LACK of
 other rules matching, not just to give a final nudge when a bunch
 of others already hit.
 
 I filed a bugzilla some while ago suggesting that the bayes

 percentage ought to be used to select a rule set, not to adjust
 the score as a component of a rule set.

There is one other gotcha. I bet vastly different scores are
warranted for Bayes when run with per user training and rules as
compared to global training and rules.


Ack!  I missed the subject change on this thread prior to my last
reply.  Sorry about the duplication.

I think it is also a matter of manual training vs autolearning.  A
Bayes database that is consistently trained manually will be more
accurate and can support higher scores.


That may be a factor, too, Bowie. But, as igor is experiencing, the
site Bayes faces a singular problem in that one person's ham is another
person's extreme spam. When no two people can agree on what spam is
and what ham is, a global Bayes becomes (relatively) ineffective very
quickly. This is why I included that afterthought, which probably should
have been highlighted up front.

{^_^}


Re: Those Re: good obfupills spams (bayes scores)

2006-05-01 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


Bowie Bailey wrote:

Matt Kettler wrote:

It is perfectly reasonable to assume that most of the mail matching
BAYES_99 also matches a large number of the stock spam rules that SA
comes with. These highly-obvious mails are the model after which
most SA rules are made in the first place. Thus, these mails need
less score boost, as they already have a lot of score from other
rules in the ruleset. 


However, mails matching BAYES_95 are more likely to be trickier,
and are likely to match fewer other rules. These messages are more
likely to require an extra boost from BAYES_95's score than those
which match BAYES_99.


I can't argue with this description, but I don't agree with the
conclusion on the scores.

The Bayes rules are not individual unrelated rules.  Bayes is a series
of rules indicating a range of probability that a message is spam or
ham.  You can argue over the exact scoring, but I can't see any reason
to score BAYES_99 lower than BAYES_95.  Since a BAYES_99 message is
even more likely to be spam than a BAYES_95 message, it should have at
least a slightly higher score. 


No, it should not. I've given a conclusive reason why it may not always be
higher. My reasoning has a solid statistical basis behind it, and it is
supported by real-world testing and real-world data.

You've given your opinion to the contrary, but no facts to support it other
than declaring the rules to be related and concluding that the score should
therefore correlate with the bayes-calculated probability of spam.

While I don't disagree with you that BAYES_99 scoring lower than BAYES_95 is
counter-intuitive, I do not believe intuition alone is a reason to defy reality.


Matt, as much as I respect you, which is a heck of a lot, I must insist
that your assertion is correct within a model that does not fit the real
needs of the situation, PARTICULARLY for individual Bayes databases that
are not fed carelessly. You don't want to crowd just above 5. You want
to have a score gap around five with almost all spam scoring well above
10. Now, I have managed to almost sweep that region clean; about 1 or 2%
of my spam falls between 5 and 8. Another 4% falls under 10. This makes
sweeping the spam directory for ham quite easy. (It also serves as a
wry note that some of the magazines to which I subscribe also spam me.
It's high nift that their spams are tagged and their hams are not, mostly.
When they are tagged they're still not BAYES_9x, though.)


If there are other rules with better performance (i.e. fewer FPs) that
consistently coincide with the hits of BAYES_99, those rules should soak up the
lion's share of the score. However, if there are a lot of spam messages with no
other rules hit, BAYES_99 should get a strong boost from those.


If any significant number of spams hit ONLY BAYES_99, then
BAYES_99 should either very nearly kick them over or actually kick them
over. That said I have found that clever meta rules regarding specific
sources and the BAYES scores have allowed me to widen my wasteland of
scores between 4 and 10 lately. This may be an important trick to employ.


The perceptron results show that the former is largely true. BAYES_99 is mostly
redundant. To back it up, I'm going to verify it with my own maillog data.

Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages
last week. Of those, only 24 had total scores under 9.0. (with BAYES_99 scoring
3.5, it would take a message with a total score of less than 8.5 to drop below
the threshold of 5.0 if BAYES_99 were omitted entirely).

So less than 0.37% of BAYES_99's hits actually mattered on my system last week.


I wish I had that luck. And I have over 40 rule sets in action plus a
large bunch of my own.


BAYES_95 on the other hand hit 468 messages, 20 of which scored less than 9.0.
That's 4.2% of messages with BAYES_95 hits. A considerably larger percentage.
Bring it down to 8.0 to compensate for the score difference and you still get
17 messages, which is still a much larger 3.6% of its hits.

On my system, BAYES_95 is significant in pushing mail over the spam threshold 10
times more often than BAYES_99 is.

What are your results?


I don't have a script that tells me what BAYES_99 hits on singularly. I
posted what ratio of ham and spam BAYES_99 and BAYES_00 hit on the last
10 weeks. What I do NOT see is any benefit from trying to crowd close to
5 points. This is the reason I see the model itself as being broken. When
I ran with the original BAYES scores on 3.04 the system leaked like a
sieve. As I upped the score the missed spams decreased. But every once
in a while I seem to hit a lead position on a round of innovative spams
which hit nothing but BAYES_99. Loren responds by writing rules to catch
them. I respond by increasing Bayes. I figure 5.0 is my limit, though.
Although I figure a good ratio for mismarked ham to mismarked spam is
about 0.1:1. When I get that bad I make a new meta rule or 

Re: Those Re: good obfupills spams

2006-04-30 Thread Matt Kettler
jdow wrote:
 And it is scored LESS than BAYES_95 by default. That's a clear signal
 that the theory behind the scoring system is a little skewed and needs
 some rethinking.

No.. It does not mean there's a problem with the scoring system. It
means you're trying to apply a simple linear model to something which is
inherently not linear, nor simple.  This is a VERY common misconception. 

Please bear with me for a minute as I explain some things.

This is more-or-less the same misconception as expecting rules with
higher S/O's to always score higher than those with lower S/O's.
Generally this is true, but there's more to consider that can cause the
opposite to be true.

The score of a rule in SA is not a function of the performance of that
one rule, nor should it be. The score of a SA rule is a function of what
combinations of rules it matches in conjunction with. This creates a
real world fit of a complex set of rules against real-world behavior.

This complex interaction between rules results in most of the problems
people see. People inherently expect simple linearity. However, consider
that SA scoring is a function of a several-hundred-variable equation
attempting to perform an approximation of optimal fit to a sampling of
human behavior. Why, based on that, would you ever expect the scores of two
of those hundreds of variables to be linear as a function of spam hit rate?

It is perfectly reasonable to assume that most of the mail matching
BAYES_99 also matches a large number of the stock spam rules that SA
comes with. These highly-obvious mails are the model after which most SA
rules are made in the first place. Thus, these mails need less score
boost, as they already have a lot of score from other rules in the ruleset.

However, mails matching BAYES_95 are more likely to be trickier, and
are likely to match fewer other rules. These messages are more likely to
require an extra boost from BAYES_95's score than those which match
BAYES_99.










Re: Those Re: good obfupills spams

2006-04-30 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


jdow wrote:

And it is scored LESS than BAYES_95 by default. That's a clear signal
that the theory behind the scoring system is a little skewed and needs
some rethinking.


No.. It does not mean there's a problem with the scoring system. It
means you're trying to apply a simple linear model to something which is
inherently not linear, nor simple.  This is a VERY common misconception. 


Please bear with me for a minute as I explain some things.

This is more-or-less the same misconception as expecting rules with
higher S/O's to always score higher than those with lower S/O's.
Generally this is true, but there's more to consider that can cause the
opposite to be true.

The score of a rule in SA is not a function of the performance of that
one rule, nor should it be. The score of a SA rule is a function of what
combinations of rules it matches in conjunction with. This creates a
real world fit of a complex set of rules against real-world behavior.

This complex interaction between rules results in most of the problems
people see. People inherently expect simple linearity. However, consider
that SA scoring is a function of a several-hundred-variable equation
attempting to perform an approximation of optimal fit to a sampling of
human behavior. Why, based on that, would you ever expect the scores of two
of those hundreds of variables to be linear as a function of spam hit rate?

It is perfectly reasonable to assume that most of the mail matching
BAYES_99 also matches a large number of the stock spam rules that SA
comes with. These highly-obvious mails are the model after which most SA
rules are made in the first place. Thus, these mails need less score
boost, as they already have a lot of score from other rules in the ruleset.

However, mails matching BAYES_95 are more likely to be trickier, and
are likely to match fewer other rules. These messages are more likely to
require an extra boost from BAYES_95's score than those which match
BAYES_99.


Matt, I understand the model. I believe it is the wrong model to apply.
Experience indicates this is very much the case. And I must remind you
that an ounce of actual experience is worth a neutron star worth of
theory. When I raise the score of BAYES_99 and 95 to be monotonically
increasing with 99 at or very near to 5.0 I demonstrably get far fewer
escaped spams at a cost of VERY few (low enough to be unnoticed)
caught hams. When experience disagrees with the model some extra thought
is required with regards to the model.

As far as I can see the perceptron does not handle single factors that
are exceptionally good at catching spam with exceptionally few false
alarms AND are often the ONLY marker for actual spam that is caught. This
latter is very often the case here with regards to BAYES_99. (The logged
hams caught as spam are escaped spams or else cases that are impossible
to catch correctly without complex meta rules, such as LKML or other
technical code-, patch-, and diff-bearing mailing lists that also do not
adequately filter what is being relayed through. For these lists I have actually
had to artificially rescore all the BAYES scores using meta rules. I am
fine tuning these alterations at the moment. I've had some spams escape.
My OWN number of mismarked hams has become vanishingly small. Loren does
not have these rules yet. If he wants 'em I'll give them to him quickly.)
Note the goodness of BAYES_99 here - stats including me and Loren over
80,000 messages total.

  1  BAYES_99  20156   4.88  25.08  91.61   0.07
  1  BAYES_00  46107  15.54  57.36   0.07  78.98

The BAYES_99's *I* have seen on ham are running exclusively to spams
that managed to fire a negative scoring rule for mailing lists. LKML and
FreeBSD are the two lists so affected.

Now, in the last two days I have had some ham come in as spam, not due
to BAYES_9x at all. It was a political discussion that happened to
trigger a lot of the mortgage spam rules. Cain't do much about that!
(At least not without giving Yahoo Groups an utterly unwarranted negative
score.)

Based on *MY* experience the perceptron performance model was not the
appropriate model to choose.

{^_^}


Re: Those Re: good obfupills spams

2006-04-30 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


jdow wrote:

And it is scored LESS than BAYES_95 by default. That's a clear signal
that the theory behind the scoring system is a little skewed and needs
some rethinking.


No.. It does not mean there's a problem with the scoring system. It
means you're trying to apply a simple linear model to something which is
inherently not linear, nor simple.  This is a VERY common misconception. 


I have a few more thoughts that are probably more constructive than
merely saying that the perceptron model is obviously wrong where the
rubber meets the road.

It seems to me that the observed operation of the perceptron is driving
scores towards the minimum amount over 5.0 that can be managed and still
capture most of the spam.

I've been operating here on a slightly different principle, at least
for my own rules. I work to drive scores away from 5.0, in both
directions as needed. If I see a low scoring captured spam being
always scored greater than 8 or 10 I am pleased. When I see items
in the 5 to 10 range I figure out what I can do to drive it to the
correct direction, ham or spam. (Bayes is usually my choice of
action. I usually discover another email that has a mid level Bayes
score rather than an extreme level. And I wish I could codify how I
choose to feed Bayes. I feed it almost on an intuitive level: "This
is Bayes food" or "Bayes already has a lot of this food and is
obviously a little confused for my mail mix." That's hardly a good
rule for feeding that I can pass on to people. sigh)

So rather than having the perceptron try to push towards a relatively
smooth curve of all scores, it should work to push the overall score
profile into what one wag in an SF story called a "brassiere curve",
which is wonderfully descriptive when you think of some of the 50's
and 60's fashions. {^_-} If it can create a viable valley with very
few messages scoring near 5.0, and as wide a gap between the ham
peak and the spam peak as possible, it may act better.

THAT said, I note that I use meta rules regularly to generate some
modest negative scores as well as positive scores. This has had some
good side effects on the reliability of scoring here. I've noticed that
a small few of the SARE rules, over time, decayed into being fairly
good indications of ham rather than spam. Since SARE is more agile
than the basic SA rule sets it might be good if the SARE people took
this as a tool for choke lift and separation on the ham and spam
peaks. It might be interesting to notice if the obverse of "in this
BL" is a decent indication of "not spam" and give that a modest bit
of negative score for some cases.

I just pulled RATWR10a_MESSID because it was hitting 13% of ham and
4% of spam, for example. Perhaps I should have given it a very small
negative score instead. I note right now that SPF_PASS seems to hit
50% (!) of ham and only 4% of spam. Perhaps it, too, should have a
slight negative score to help increase the span between the ham peak
and the spam peak.
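
(Both adjustments are one-line local.cf overrides; the -0.5 below is a
made-up illustration, not a measured value:

score RATWR10a_MESSID 0
score SPF_PASS -0.5

Setting a rule's score to 0 disables it outright, which is gentler than
removing the rule file.)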

It does seem clear to me that the objective is not to create minimum
score to mark as spam so much as to create as large a separation between
typical ham and spam scores as possible. The more reliable rules should
have higher negative and positive scores as appropriate.

And of course, the final caveat, is that I am running a two person
install of SpamAssassin with per user rules and scores with two fairly
intelligent (although some people question that about me) people running
their own user rules and Bayes. I also do not use automatic anything.
I cannot get over the idea that automatic whitelist and automatic learning
are not necessarily stable concepts UNTIL you have a very reliable BAYES
setup and set of rules from manual training. I have that and still cannot
convince myself to fix what isn't broken.

{^_^}   Joanne


Re: Those Re: good obfupills spams

2006-04-29 Thread Matt Kettler
jdow wrote:



 BAYES_99, by definition, has a 1% false positive rate.

 That is what Bayes thinks. I think it is closer to something between
 0.5% and 0.1% false positive. I have mine trained down lethally fine
 at this point, it appears.

Ok.. Fine, let's take 0.1% FP rate, 10x better than theoretical, but
still realistic at some sites.. Even still.. Is that low enough to be
worth assigning 5.0 points to?

No.



Re: Those Re: good obfupills spams

2006-04-29 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


jdow wrote:





BAYES_99, by definition, has a 1% false positive rate.


That is what Bayes thinks. I think it is closer to something between
0.5% and 0.1% false positive. I have mine trained down lethally fine
at this point, it appears.


Ok.. Fine, let's take 0.1% FP rate, 10x better than theoretical, but
still realistic at some sites.. Even still.. Is that low enough to be
worth assigning 5.0 points to?

No.


So far, however, it has been worth 5.0 points. I've had it (actually)
false positive maybe once in the last month. I've had SA mismark some
BAYES_99 spam, however. The spam had other characteristics that earned
a slight negative score.

(I've since developed some meta rules that are reducing this. If the
email is from a mailing list I know, I give a modest negative score.
Then if the Bayes is high or very high I award some positive points.
High plus mailing list is about 2 points with mailing list being -1.5.
Very high adds another 2 points. That second two points MAY have to be
fine tuned upwards.)
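
(In rule form the scheme might look something like the sketch below;
the rule names, the List-Id pattern, and the exact points are guesses
made for illustration, not the actual rules in use:

header LOCAL_KNOWN_LIST      List-Id =~ /linux-kernel|freebsd/i
score  LOCAL_KNOWN_LIST      -1.5
meta   LOCAL_LIST_BAYES_HI   (LOCAL_KNOWN_LIST && BAYES_95)
score  LOCAL_LIST_BAYES_HI   2.0
meta   LOCAL_LIST_BAYES_VHI  (LOCAL_KNOWN_LIST && BAYES_99)
score  LOCAL_LIST_BAYES_VHI  4.0

Since BAYES_95 and BAYES_99 cover adjacent, non-overlapping probability
bands, the two metas never fire together.)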

{^_^}


Re: Those Re: good obfupills spams

2006-04-29 Thread Loren Wilton
 This is my first post after having lurked some. So, I'm getting these
 same RE: good spams but they're hitting eight rules and typically
 scoring between 30 and 40. I'm really unsophisticated compared to you
guys, and it begs the question -- what am I doing wrong? All I use is a
 tweaked user_prefs wherein I have gradually raised the scores on
 standard rules found in spam that slips through over a period of
 time. These particular spams are over the top on bayesian (1.0), have
 multiple database hits, forged rcvd_helo and so forth. Bayesian alone
 flags them for me. I'm trying to understand the reason you would not
 want to have these type of rules set high enough? I must be way over
optimized -- what am I not getting?

The danger with tweaking standard rule scores you probably already know: you
are at least theoretically likely to get more false positives, because the
score set was optimized for the original scores.

Of course, everyone tweaks a few scores at least.  After all, that is why
they are tweakable.  As long as you watch your spam bucket for FPs you can go
pretty high on things.  Looking at today's spam I only see one of these, but
it scored around 30.  I have a bunch of the Re: news kind that all scored
35-39.

Loren



Re: Those Re: good obfupills spams

2006-04-29 Thread jdow

From: Loren Wilton [EMAIL PROTECTED]


This is my first post after having lurked some. So, I'm getting these
same RE: good spams but they're hitting eight rules and typically
scoring between 30 and 40. I'm really unsophisticated compared to you
guys, and it begs the question -- what am I doing wrong? All I use is a
tweaked user_prefs wherein I have gradually raised the scores on
standard rules found in spam that slips through over a period of
time. These particular spams are over the top on bayesian (1.0), have
multiple database hits, forged rcvd_helo and so forth. Bayesian alone
flags them for me. I'm trying to understand the reason you would not
want to have these type of rules set high enough? I must be way over
optimized -- what am I not getting?


The danger with tweaking standard rule scores you probably already know: you
are at least theoretically likely to get more false positives, because the
score set was optimized for the original scores.

Of course, everyone tweaks a few scores at least.  After all, that is why
they are tweakable.  As long as you watch your spam bucket for FPs you can go
pretty high on things.  Looking at today's spam I only see one of these, but
it scored around 30.  I have a bunch of the Re: news kind that all scored
35-39.

   Loren


And most of those which are not black lists are from 88_FVGT_body.cf.

{^_^}Joanne 



Re: Those Re: good obfupills spams

2006-04-29 Thread List Mail User
...

Matt Kettler replied:

John Tice wrote:

 Greetings,
 This is my first post after having lurked some. So, I'm getting these
 same RE: good spams but they're hitting eight rules and typically
 scoring between 30 and 40. I'm really unsophisticated compared to you
 guys, and it begs the question -- what am I doing wrong? All I use is a
 tweaked user_prefs wherein I have gradually raised the scores on
 standard rules found in spam that slips through over a period of time.
 These particular spams are over the top on bayesian (1.0), have
 multiple database hits, forged rcvd_helo and so forth. Bayesian alone
 flags them for me. I'm trying to understand the reason you would not
 want to have these type of rules set high enough? I must be way over
 optimized -- what am I not getting? 


BAYES_99, by definition, has a 1% false positive rate.


Matt,

If we were to presume a uniform distribution between an estimate of
99% and 100%, then the FP rate would be .5%, not 1%.  And for large sites
(i.e. tens of thousands of messages a day or more), this may be what occurs;
But what I see and what I assume many other small sites see is a very much
non-uniform distribution;  From the last 30 hours, the average estimate (re.
the value reported in the bayes=xxx clause) for spam hitting the BAYES_99
rule is .41898013269, with about two thirds of them reporting bayes=1 and
a lowest value of bayes=0.998721756590216.

While SA is quite robust largely because of the design feature that
no single reason/cause/rule should by itself mark a message as spam, I have
to guess that the FP rate that the majority of users see for BAYES_99 is far
below 1%.  From the estimators reported above, I would expect that I would
have seen a .003% FP rate for the last day plus a little, if only I received
100,000 or so spam messages to have been able to see it :).

I don't change the scoring from the defaults, but if people were to
want to, maybe they could change the rules (or add a rule) for BAYES_99_99
which would take only scores higher than bayes=.9999 and which (again with
a uniform distribution) would have an expected FP rate of .005% - then re-score
that just closer to (but still less than) the spam threshold, or add a point
or fraction thereof to raise the score to just under the spam threshold
(adding a new rule would avoid having to edit distributed files and thus
would probably be the better method).
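
(A sketch of such an add-on rule, modeled on the stock BAYES_99
definition and its check_bayes eval test; the 1.0 score is just one
possible "point or fraction thereof":

body     BAYES_99_99  eval:check_bayes('0.9999', '1.00')
describe BAYES_99_99  Bayes spam probability is 99.99 to 100%
score    BAYES_99_99  1.0

Because stock BAYES_99 spans 0.99-1.00, both rules fire on such a
message, so with BAYES_99 at 3.5 the Bayes contribution becomes 4.5 -
still just under a 5.0 threshold.)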

Anyway, to better address the OP's questions:  The system is more
robust if, instead of changing the weighting of existing rules (assuming that
they were correctly established to begin with), you add more possible inputs
(and preferably independent ones - i.e. where the FPs between rules have a
low correlation).  Simply increasing scores will improve your spam capture
rate, just as decreasing the spam threshold will - but both methods will add
to the likelihood of false positives;  Look into the distributed documentation
to see the expected FP rates at different spam threshold levels for numbers
to drive this point home (and changing specific rules' scores is just like
changing the threshold, but in a non-uniform fashion - unless you actually
measure the values for your own site's mail and recompute numbers that are
a better estimate for local traffic).

Paul Shupak
[EMAIL PROTECTED]


Re: Those Re: good obfupills spams

2006-04-29 Thread Matt Kettler
List Mail User wrote:
 ...
 

 Matt Kettler replied:

   
 John Tice wrote:
 
 Greetings,
 This is my first post after having lurked some. So, I'm getting these
 same RE: good spams but they're hitting eight rules and typically
 scoring between 30 and 40. I'm really unsophisticated compared to you
 guys, and it begs the question -- what am I doing wrong? All I use is a
 tweaked user_prefs wherein I have gradually raised the scores on
 standard rules found in spam that slips through over a period of time.
 These particular spams are over the top on bayesian (1.0), have
 multiple database hits, forged rcvd_helo and so forth. Bayesian alone
 flags them for me. I'm trying to understand the reason you would not
 want to have these type of rules set high enough? I must be way over
 optimized -- what am I not getting? 
   
 BAYES_99, by definition, has a 1% false positive rate.

 

   Matt,

  If we were to presume a uniform distribution between an estimate of
 99% and 100%, then the FP rate would be .5%, not 1%. 
You're right Paul, my bad..

But again, I don't care if it's 0.01%. The question here is: is jacking
up the score of BAYES_99 to be greater than required_hits a good idea?
The answer is No, because BAYES_99 is NOT a 100% accurate test. By
definition it has a non-zero FP rate.

  And for large sites
 (i.e. tens of thousands of messages a day or more), this may be what occurs;
 But what I see and what I assume many other small sites see is a very much
 non-uniform distribution;  From the last 30 hours, the average estimate (re.
 the value reported in the bayes=xxx clause) for spam hitting the BAYES_99
 rule is .41898013269 with about two thirds of them reporting bayes=1 and
 a lowest value of bayes=0.998721756590216.
   
Yes, that's to be expected with Chi-Squared combining.
   While SA is quite robust largely because of the design feature that
 no single reason/cause/rule should by itself mark a message as spam, I have
 to guess that the FP rate that the majority of users see for BAYES_99 is far
 below 1%.  From the estimators reported above, I would expect that I would
 have seen a .003% FP rate for the last day plus a little, if only I received
 100,000 or so spam messages to have been able to see it :).
   
True, but it's still not nearly zero. Even in the corpus testing, which
is run by the best of the best in SA administration and maintenance,
BAYES_99 matched 0.0396% of ham, or 21 out of 53,091 hams. (Based on
set-3 of SA 3.1.0)

Given we are dealing with a user who doesn't even understand why you might
not want this set high enough, I would not expect the level of
sophistication in bayes maintenance to be high.

Besides.. If you want to make a mathematics-based argument against me,
start by explaining how the perceptron is mathematically flawed. It
assigned the original score based on real-world data, not our vast
oversimplifications. You should have good reason to question its design
before second-guessing its scoring based on speculation such as this.

   I don't change the scoring from the defaults, but if people were to
 want to, maybe they could change the rules (or add a rule) for BAYES_99_99
 which would take only scores higher than bayes=.9999 and which (again with
 a uniform distribution) would have an expected FP rate of .005% - then re-score
 that just closer to (but still less than) the spam threshold, 

I'd agree.. However, the OP has already made BAYES_99 > required_hits.
Bad idea. Period.



Re: Those Re: good obfupills spams

2006-04-29 Thread John Tice


Thank you all for the comments. My personal experience is that  
Bayes_99 is amazingly reliable -- close to 100% for me. I formerly had  
it set to 4.5 so that bayes_99 plus one other hit would flag it, but  
then I started getting some spam that were not hit by any other rule,  
yet bayes correctly identified them. It seems more effective to write  
some negative scoring ham rules specific to my important content  
rather than to take less than full advantage of the high accuracy of  
bayes. And, the spams in question in this thread are hitting multiple  
rules, so should be catchable without having bayes_99 set over the top.
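
Such ham rules are, mechanically, just ordinary body rules with
negative scores. A minimal sketch, where the phrase and the -2.0 are
placeholders for whatever the important content actually looks like:

body  LOCAL_MY_CONTENT  /\b(quarterly widget report|project blue sky)\b/i
score LOCAL_MY_CONTENT  -2.0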


I suppose all these judgments must take into account one's  
preferences, degree of aversion to FPs, and the diversity of content  
you're working with. Hopefully I will improve accuracy by  
writing/adding custom rules and be able to back off the scoring of  
standard rules, but I have been fairly successful (by my own  
definition) at tweaking standard rules with minimal FPs. At times  
when I do get an FP I take a look at it and think "this one just  
deserves to get filtered." I'm willing to accept a certain amount,  
or a certain type, in order to be aggressive against spam. Before I  
only had access to user_prefs, but now that I have a server with  
root access it's a brand new ball game. The mechanics are easy  
enough, but I need to work on the broader strategies. Any  
particularly good reading to be recommended?


John







On Apr 29, 2006, at 8:12 AM, List Mail User wrote:


...

Matt Kettler replied:

John Tice wrote:

Greetings,
This is my first post after having lurked some. So, I'm getting these
same RE: good spams but they're hitting eight rules and typically
scoring between 30 and 40. I'm really unsophisticated compared to you
guys, and it begs the question -- what am I doing wrong? All I use is a
tweaked user_prefs wherein I have gradually raised the scores on
standard rules found in spam that slips through over a period of time.
These particular spams are over the top on bayesian (1.0), have
multiple database hits, forged rcvd_helo and so forth. Bayesian alone
flags them for me. I'm trying to understand the reason you would not
want to have these type of rules set high enough? I must be way over
optimized -- what am I not getting?


BAYES_99, by definition, has a 1% false positive rate.


Matt,

If we were to presume a uniform distribution between an estimate of
99% and 100%, then the FP rate would be .5%, not 1%.  And for large sites
(i.e. tens of thousands of messages a day or more), this may be what occurs;
But what I see and what I assume many other small sites see is a very much
non-uniform distribution;  From the last 30 hours, the average estimate (re.
the value reported in the bayes=xxx clause) for spam hitting the BAYES_99
rule is .41898013269, with about two thirds of them reporting bayes=1 and
a lowest value of bayes=0.998721756590216.

While SA is quite robust largely because of the design feature that
no single reason/cause/rule should by itself mark a message as spam, I have
to guess that the FP rate that the majority of users see for BAYES_99 is far
below 1%.  From the estimators reported above, I would expect that I would
have seen a .003% FP rate for the last day plus a little, if only I received
100,000 or so spam messages to have been able to see it :).

I don't change the scoring from the defaults, but if people were to
want to, maybe they could change the rules (or add a rule) for BAYES_99_99
which would take only scores higher than bayes=.9999 and which (again with
a uniform distribution) would have an expected FP rate of .005% - then re-score
that just closer to (but still less than) the spam threshold, or add a point
or fraction thereof to raise the score to just under the spam threshold
(adding a new rule would avoid having to edit distributed files and thus
would probably be the better method).

Anyway, to better address the OP's questions:  The system is more
robust if, instead of changing the weighting of existing rules (assuming that
they were correctly established to begin with), you add more possible inputs
(and preferably independent ones - i.e. where the FPs between rules have a
low correlation).  Simply increasing scores will improve your spam capture
rate, just as decreasing the spam threshold will - but both methods will add
to the likelihood of false positives;  Look into the distributed documentation
to see the expected FP rates at different spam threshold levels for numbers
to drive this point home (and changing specific rules' scores is just like
changing the threshold, but in a non-uniform fashion - unless you actually
measure the values for your own site's mail and recompute numbers that are
a better estimate for local traffic).

Paul Shupak
[EMAIL PROTECTED]






Re: Those Re: good obfupills spams

2006-04-29 Thread Bart Schaefer

On 4/29/06, List Mail User [EMAIL PROTECTED] wrote:


While SA is quite robust largely because of the design feature that
no single reason/cause/rule should by itself mark a message as spam, I have
to guess that the FP rate that the majority of users see for BAYES_99 is far
below 1%.



Anyway, to better address the OP's questions:  The system is more
robust if instead of changing the weighting of existing rules (assuming that
they were correctly established to begin with), you add more possible inputs


Exactly.  For example, I find that anything in the subset consisting
of messages that don't mention my email address anywhere in the To/Cc
headers and also score above BAYES_70 has close to 100% likelihood
of being spam.  However, since I also get quite a lot of mail that
doesn't fall into that subset, I can't simply increase the scores for
the BAYES rules.

In this case I use procmail to examine the headers after SA has scored
the message, but I've been considering creating a meta-rule of some
kind.  Trouble is, SA doesn't know what my email address means (it'd
need to be a list of addresses), and I'm reluctant to turn on
allow_user_rules.
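
A site-config approximation, for what it's worth: SA's ToCc
pseudo-header covers both headers in one rule, and since there is no
BAYES_70 rule in the stock set, the adjacent bands stand in. The
address and the 2.5 score below are placeholders:

header __LOCAL_TO_OR_CC_ME    ToCc =~ /\bme@example\.com\b/i
meta   LOCAL_BAYES_NOT_TO_ME  (!__LOCAL_TO_OR_CC_ME && (BAYES_80 || BAYES_95 || BAYES_99))
score  LOCAL_BAYES_NOT_TO_ME  2.5

A real version would need the full list of addresses folded into the
regex, which is exactly the maintenance problem mentioned above.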


Re: Those Re: good obfupills spams

2006-04-29 Thread Bart Schaefer

On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:

Besides.. If you want to make a mathematics based argument against me,
start by explaining how the perceptron mathematically is flawed. It
assigned the original score based on real-world data.


Did it?  I thought the BAYES_* scores have been fixed values for a
while now, to force the perceptron to adapt the other scores to fit.


Re: Those Re: good obfupills spams (bayes scores)

2006-04-29 Thread Matt Kettler
Bart Schaefer wrote:
 On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:
 Besides.. If you want to make a mathematics based argument against me,
 start by explaining how the perceptron mathematically is flawed. It
 assigned the original score based on real-world data.

 Did it?  I thought the BAYES_* scores have been fixed values for a
 while now, to force the perceptron to adapt the other scores to fit.

Actually, you're right..I'm shocked and floored, but you're right.

 In SA 3.1.0 they did force-fix the scores of the bayes rules,
particularly the high-end. The perceptron assigned BAYES_99 a score of
1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.

That does make me wonder if:
1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
due to the ham corpus being polluted with spam. This forces the
perceptron to attempt to compensate.  (Pollution always is a problem
since nobody is perfect, but it occurs to differing degrees).
   -or-
2) The perceptron is out of whack. (I highly doubt this because the
perceptron generated the ones for 3.0.x and they were fine)
  -or-
3) The Real-world FPs of BAYES_99 really do tend to also be cascades
with other rules in the 3.1.x ruleset, and the perceptron is correctly
capping the score. This could differ from 3.0.x due to change in rules,
or change in ham patterns over time.
  -or-
4) one of the corpus submitters has a poorly trained bayes db.
(possible, but I doubt it)

Looking at statistics-set3 for 3.0.x and 3.1.x there was a slight
increase in ham-hits for BAYES_99 and a slight decrease in spam hits.
3.0.x:
OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
43.515    89.3888  0.0335  1.000  0.83  1.89   BAYES_99
3.1.x:
OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
60.712    86.7351  0.0396  1.000  0.90  3.50   BAYES_99

Also to consider is set3 of 3.0.x was much closer to a 50/50 mix of
spam/nonspam (48.7/51.3) than 3.1.0 was (nearly 70/30)



Re: Those Re: good obfupills spams (bayes scores)

2006-04-29 Thread Bart Schaefer

On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:

 In SA 3.1.0 they did force-fix the scores of the bayes rules,
particularly the high-end. The perceptron assigned BAYES_99 a score of
1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.

That does make me wonder if:
1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
due to the ham corpus being polluted with spam.


My recollection is that there was speculation that the BAYES_9x rules
were scored too low not because they FP'd in conjunction with other
rules, but because against the corpus they TRUE P'd in conjunction
with lots of other rules, and that it therefore wasn't necessary for
the perceptron to assign a high score to BAYES_9x in order to push the
total over the 5.0 threshold.

The trouble with that is that users expect training on their personal
spam flow to have a more significant effect on the scoring.  I want to
train bayes to compensate for the LACK of other rules matching, not
just to give a final nudge when a bunch of others already hit.

I filed a bugzilla some while ago suggesting that the bayes percentage
ought to be used to select a rule set, not to adjust the score as a
component of a rule set.
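
Stock SpamAssassin cannot switch rule sets on the Bayes percentage, but meta
rules can approximate the idea; a sketch under that assumption, with
hypothetical rule names and pattern:

# Hypothetical: a weak content rule that only counts when Bayes is already
# suspicious, rather than raising the BAYES_* scores across the board.
body     __LOCAL_PILLS      /\bv[i1l]agra\b/i
meta     LOCAL_PILLS_BAYES  (__LOCAL_PILLS && (BAYES_80 || BAYES_95 || BAYES_99))
describe LOCAL_PILLS_BAYES  Pill wording plus a high Bayes band
score    LOCAL_PILLS_BAYES  1.5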


Re: Those Re: good obfupills spams

2006-04-29 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]

List Mail User wrote:

Matt Kettler replied:

John Tice wrote:


Greetings,
This is my first post after having lurked some. So, I'm getting these
same RE: good spams but they're hitting eight rules and typically
scoring between 30 and 40. I'm really unsophisticated compared to you
guys, and it begs the question––what am I doing wrong? All I use is a
tweaked user_prefs wherein I have gradually raised the scores on
standard rules found in spam that slips through over a period of time.
These particular spams are over the top on bayesian (1.0), have
multiple database hits, forged rcvd_helo and so forth. Bayesian alone
flags them for me. I'm trying to understand the reason you would not
want to have these type of rules set high enough? I must be way over
optimized––what am I not getting?


BAYES_99, by definition, has a 1% false positive rate.



If we were to presume a uniform distribution between an estimate of
99% and 100%, then the FP rate would be .5%, not 1%.

You're right Paul, my bad..

But again, I don't care if it's 0.01%. The question here is: is jacking
up the score of BAYES_99 to be greater than required_hits a good idea?
The answer is No, because BAYES_99 is NOT a 100% accurate test. By
definition it does have a non-zero FP rate.


I run AT 5.0. When I see my first false alarm solely from BAYES_99
I will reduce it slightly. I know what theory says. I also know that
BAYES_99 alone captures more spam than it has ever captured ham for
false imprisonment.


 And for large sites
(i.e. tens of thousands of messages a day or more), this may be what occurs;
But what I see and what I assume many other small sites see is a very much
non-uniform distribution;  From the last 30 hours, the average estimate (re.
the value reported in the bayes=xxx clause) for spam hitting the BAYES_99
rule is .41898013269 with about two thirds of them reporting bayes=1 and
a lowest value of bayes=0.998721756590216.


Yes, that's to be expected with Chi-Squared combining.

While SA is quite robust largely because of the design feature that
no single reason/cause/rule should by itself mark a message as spam, I have
to guess that the FP rate that the majority of users see for BAYES_99 is far
below 1%.  From the estimators reported above, I would expect that I would
have seen a .003% FP rate for the last day plus a little, if only I received
100,000 or so spam messages to have been able to see it:).


True, but it's still not nearly zero. Even in the corpus testing, which
is run by the best of the best in SA administration and maintenance,
BAYES_99 matched 0.0396% of ham, or 21 out of 53,091 hams. (Based on
set-3 of SA 3.1.0)


And it is scored LESS than BAYES_95 by default. That's a clear signal
that the theory behind the scoring system is a little skewed and needs
some rethinking.


Given we are dealing with a user who doesn't even understand why you might
not want this set high enough, I would expect the level of
sophistication in bayes maintenance to be low.

Besides.. If you want to make a mathematics based argument against me,
start by explaining how the perceptron mathematically is flawed. It
assigned the original score based on real-world data. Not our vast over
simplifications. You should have good reason to question its design
before second-guessing its scoring based on speculation such as this.


When it can give BAYES_99 a score LOWER than BAYES_95 it clearly has
a conceptual problem. (It also indicates that automatic Bayes filter
training has its own conceptual flaws.)


I don't change the scoring from the defaults, but if people were to
want to, maybe they could change the rules (or add a rule) for BAYES_99_99
which would take only scores higher than bayes=.9999 and which (again with
a uniform distribution) have an expected FP rate of .005% - then re-score
that just closer to (but still less than) the spam threshold,


I'd agree.. However, the OP has already made BAYES_99 > required_hits.
Bad idea. Period.


5.0 is, admittedly, marginal. 6 or 7 is not a good idea. Not enough rules
exist that will pull it back down. (Thinking on that I suspect there are
some SARE rules that should lower the score slightly when they are not
hit.)

{^_^} 
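
As a concrete sketch of that BAYES_99_99 idea, built on the same check_bayes
eval the stock 3.1.x bayes rules use (the cutoff and score are illustrative;
note the band overlaps BAYES_99, so the two scores add):

body     BAYES_99_99  eval:check_bayes('0.9999', '1.00')
describe BAYES_99_99  Bayes spam probability is 99.99 to 100%
tflags   BAYES_99_99  learn
# Stacks with BAYES_99 (3.5 by default): 3.5 + 1.4 = 4.9, just under the
# default threshold of 5.0, as suggested above.
score    BAYES_99_99  1.4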



Re: Those Re: good obfupills spams (bayes scores)

2006-04-29 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


Bart Schaefer wrote:

On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:

Besides.. If you want to make a mathematics based argument against me,
start by explaining how the perceptron mathematically is flawed. It
assigned the original score based on real-world data.


Did it?  I thought the BAYES_* scores have been fixed values for a
while now, to force the perceptron to adapt the other scores to fit.


Actually, you're right..I'm shocked and floored, but you're right.

In SA 3.1.0 they did force-fix the scores of the bayes rules,
particularly the high-end. The perceptron assigned BAYES_99 a score of
1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.

That does make me wonder if:
   1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
due to the ham corpus being polluted with spam. This forces the
perceptron to attempt to compensate.  (Pollution always is a problem
since nobody is perfect, but it occurs to differing degrees).
  -or-
   2) The perceptron is out-of whack. (I highly doubt this because the
perceptron generated the ones for 3.0.x and they were fine)
 -or-
   3) The Real-world FPs of BAYES_99 really do tend to also be cascades
with other rules in the 3.1.x ruleset, and the perceptron is correctly
capping the score. This could differ from 3.0.x due to change in rules,
or change in ham patterns over time.
 -or-
   4) one of the corpus submitters has a poorly trained bayes db.
(possible, but I doubt it)

Looking at statistics-set3 for 3.0.x and 3.1.x there was a slight
increase in ham-hits for BAYES_99 and a slight decrease in spam hits.
3.0.x:
OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
43.515    89.3888  0.0335  1.000  0.83  1.89   BAYES_99
3.1.x:
OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
60.712    86.7351  0.0396  1.000  0.90  3.50   BAYES_99

Also to consider is set3 of 3.0.x was much closer to a 50/50 mix of
spam/nonspam (48.7/51.3) than 3.1.0 was (nearly 70/30)


What happens comes from the basic reality that Bayes and the other
rules are not orthogonal sets. So many other rules hit 95 and 99 that
the perceptron artificially reduced the goodness rating for these rules.

It needs some serious skewing to catch situations where 95 or 99 hit and
very few other rules hit. Those are the times the accuracy of Bayes is
needed the most. I've found, here, that 5.0 is a suitable score. I
suspect if I were more realistic 4.9 would be closer. But I still do
remember learning the score bias and being floored by it when I noticed
99 on some spams that leaked through with ONLY the 99 hit. I am speaking
of dozens of spams hit that way.

So far over several years I've found a few special cases that warrant
negative rules. That seems to be pulling the 99 rule's false alarm
rate down to where I can't see it. (I have, however, been tempted to generate
a BAYES_99p5 rule and a BAYES_99p9 rule to fine tune the scores up around
4.9 and 5.0.)

{^_
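
The negative special-case rules jdow mentions are just ordinary rules with
negative scores; a hypothetical example (header pattern and list name are
invented for illustration):

# Hypothetical: rescue a known-good newsletter that Bayes keeps pushing
# into the 99 band.
header   LOCAL_GOOD_LIST  List-Id =~ /news\.example\.com/i
describe LOCAL_GOOD_LIST  Posted through a mailing list we trust
score    LOCAL_GOOD_LIST  -2.5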


Re: Those Re: good obfupills spams (bayes scores)

2006-04-29 Thread jdow

From: Bart Schaefer [EMAIL PROTECTED]

On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote:

 In SA 3.1.0 they did force-fix the scores of the bayes rules,
particularly the high-end. The perceptron assigned BAYES_99 a score of
1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.

That does make me wonder if:
1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
due to the ham corpus being polluted with spam.


My recollection is that there was speculation that the BAYES_9x rules
were scored too low not because they FP'd in conjunction with other
rules, but because against the corpus they TRUE P'd in conjunction
with lots of other rules, and that it therefore wasn't necessary for
the perceptron to assign a high score to BAYES_9x in order to push the
total over the 5.0 threshold.

The trouble with that is that users expect training on their personal
spam flow to have a more significant effect on the scoring.  I want to
train bayes to compensate for the LACK of other rules matching, not
just to give a final nudge when a bunch of others already hit.

I filed a bugzilla some while ago suggesting that the bayes percentage
ought to be used to select a rule set, not to adjust the score as a
component of a rule set.

<jdow> There is one other gotcha. I bet vastly different scores
are warranted for Bayes when run with per user training and rules
as compared to global training and rules.

{^_^}


Re: Those Re: good obfupills spams

2006-04-28 Thread qqqq
|
| They usually hit RCVD_IN_BL_SPAMCOP_NET,URIBL_SBL but those alone
| aren't scored high enough to classify as spam, and I'm reluctant to
| crank them up just for this.  However, the number of spams getting
| through SA has tripled in the last four days or so, from around 14 for
| every thousand trapped, to around 40.
|
| I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far
| they aren't having any useful effect.  Other suggestions?

I would make a subject "Re: good " rule that scores just high enough to push 
it to the spam level.
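
As a sketch, such a rule might read as follows (pattern and score are
illustrative; see Bart's reply elsewhere in the thread on why a bare subject
match should not carry the full spam weight by itself):

# Hypothetical subject rule for the "Re: good ..." pill-spam run.
header   LOCAL_RE_GOOD  Subject =~ /^Re: good [a-z]/i
describe LOCAL_RE_GOOD  Subject matches the "Re: good ..." pill-spam pattern
score    LOCAL_RE_GOOD  1.8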





Re: Those Re: good obfupills spams

2006-04-28 Thread Stuart Johnston

Bart Schaefer wrote:

The largest number of spam messages currently getting through SA at my
site are short text-only spams with subject "Re: good " followed by an
obfuscated drug name (so badly mangled as to be unrecognizable in many
cases).  The body contains a gappy-text list of several other kinds of
equally unreadable pharmaceuticals, a single URL which changes daily
if not more often, and then several random words and a short excerpt
from a novel.

They usually hit RCVD_IN_BL_SPAMCOP_NET,URIBL_SBL but those alone
aren't scored high enough to classify as spam, and I'm reluctant to
crank them up just for this.  However, the number of spams getting
through SA has tripled in the last four days or so, from around 14 for
every thousand trapped, to around 40.

I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far
they aren't having any useful effect.  Other suggestions?


The ReplaceTags plugin can be very useful for creating rules to match 
these.  Let's say you get a message with text that looks like:


S b P u A z M

where the lower-case letters vary.  A traditional rule might look like:

/S [a-z] P [a-z] A [a-z] M/

Which is really not too bad.  However, ReplaceTags allows you to create a 
shorthand.  Something like:


replace_tag WS ( [a-z] )

And your rule becomes:

/S<WS>P<WS>A<WS>M/

For this to work, you'll also need to add your rule name to a 
replace_rules line.  Using parentheses in your regex will create wasted 
captures so you'll probably want to use a different method to mark off 
the whitespace.  You also might want to add a negative lookahead 
although in this case you probably wouldn't need it.
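
Putting those pieces together, a minimal self-contained version might look
like this (rule name and score are illustrative; ReplaceTags must be enabled
with a loadplugin line in a .pre file if it is not already):

# (?: ) is a non-capturing group, avoiding the wasted captures noted above;
# <WS> works because replace_start/replace_end default to "<" and ">".
replace_tag   WS  (?: [a-z] )
body          LOCAL_OBFU_SPAM  /S<WS>P<WS>A<WS>M/
describe      LOCAL_OBFU_SPAM  Obfuscated "SPAM" with single letters between
replace_rules LOCAL_OBFU_SPAM
score         LOCAL_OBFU_SPAM  1.0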


For more on ReplaceTags: 
http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_Plugin_ReplaceTags.html


-Stuart


Re: Those Re: good obfupills spams

2006-04-28 Thread Bart Schaefer

On 4/28/06,  [EMAIL PROTECTED] wrote:


I would make a subject "Re: good " rule that scores just high enough to push 
it to the spam level.


They're only scoring about 3.3, and I'm reluctant to make "Re: good "
worth 2 points all by itself.  That'd be worse than increasing the
spamcop score.

A meta rule, though ...
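
A sketch of that meta rule, gating the weak subject pattern on the blacklist
hits mentioned above (names and score are illustrative):

# Hypothetical meta: the subject pattern only scores when a URI/relay
# blacklist also fired, so "Re: good" alone stays harmless.
header   __LOCAL_RE_GOOD   Subject =~ /^Re: good [a-z]/i
meta     LOCAL_RE_GOOD_BL  (__LOCAL_RE_GOOD && (RCVD_IN_BL_SPAMCOP_NET || URIBL_SBL))
describe LOCAL_RE_GOOD_BL  "Re: good" subject plus a blacklist hit
score    LOCAL_RE_GOOD_BL  2.0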


Re: Those Re: good obfupills spams

2006-04-28 Thread List Mail User
...

Bart Schaefer wrote:
The largest number of spam messages currently getting through SA at my
site are short text-only spams with subject "Re: good " followed by an
obfuscated drug name (so badly mangled as to be unrecognizable in many
cases).  The body contains a gappy-text list of several other kinds of
equally unreadable pharmaceuticals, a single URL which changes daily
if not more often, and then several random words and a short excerpt
from a novel.

They usually hit RCVD_IN_BL_SPAMCOP_NET,URIBL_SBL but those alone
aren't scored high enough to classify as spam, and I'm reluctant to
crank them up just for this.  However, the number of spams getting
through SA has tripled in the last four days or so, from around 14 for
every thousand trapped, to around 40.

I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far
they aren't having any useful effect.  Other suggestions?


These few rules can help a lot (potentially with some possible FPs
though).  And as always, train your BAYES with the ones that get through
and enable the digest tests (i.e. DCC, Pyzor and Razor).

uridnsbl  URI_COMPLETEWHOIS  combined-HIB.dnsiplists.completewhois.com.  A
body      URI_COMPLETEWHOIS  eval:check_uridnsbl('URI_COMPLETEWHOIS')
describe  URI_COMPLETEWHOIS  URI in combined-HIB.dnsiplists.completewhois.com
tflags    URI_COMPLETEWHOIS  net
score     URI_COMPLETEWHOIS  1.25

uridnsbl  URI_IN_SORBS_DNS_SPAM  spam.dnsbl.sorbs.net.  A
body      URI_IN_SORBS_DNS_SPAM  eval:check_uridnsbl('URI_IN_SORBS_DNS_SPAM')
describe  URI_IN_SORBS_DNS_SPAM  URI in spam.dnsbl.sorbs.net
tflags    URI_IN_SORBS_DNS_SPAM  net
score     URI_IN_SORBS_DNS_SPAM  1.125

meta      URI_M_SBL_COMWHOIS  (URI_COMPLETEWHOIS && URIBL_SBL)
describe  URI_M_SBL_COMWHOIS  Both SBL and COMPLETEWHOIS
score     URI_M_SBL_COMWHOIS  1.375

meta      URI_M_SORBS_SPAM_SBL  (URI_IN_SORBS_DNS_SPAM && URIBL_SBL)
describe  URI_M_SORBS_SPAM_SBL  Both SORBS SPAM and SBL
score     URI_M_SORBS_SPAM_SBL  0.5

meta      URI_M_SORBS_SPAM_CWHO  (URI_IN_SORBS_DNS_SPAM && URI_COMPLETEWHOIS)
describe  URI_M_SORBS_SPAM_CWHO  Both SORBS SPAM and CompleteWhois
score     URI_M_SORBS_SPAM_CWHO  0.833

These rules help to catch brand new domains at the same IP as
previous spam domains (i.e. they are IP based BLs).  If you have any
religious problems with SORBS, leave those out.  About 92% of what I
see hitting the completewhois rule also hits the meta-rule, and over 9 months,
I've never had an FP from the meta rule (which means my scoring is likely
out of whack - too high for the BL tests, and too low for the meta rules).

Also, as always, watch out for line-wrap and be sure to lint after
adding them to any local configuration files.

These add two DNS lookups, but will catch about half of Leo's pill
spam (adding several points for most of them).

Paul Shupak
[EMAIL PROTECTED]


Re: Those Re: good obfupills spams (uridnsbl's, A records vs NS records)

2006-04-28 Thread Matt Kettler
List Mail User wrote:

 
   These few rules can help a lot (potentially with some possible FPs
 though).  And as always, train your BAYES with the ones that get through
 and enable the digest tests (i.e. DCC, Pyzor and Razor).
 
 uridnsbl  URI_COMPLETEWHOIS  combined-HIB.dnsiplists.completewhois.com.  A
 body      URI_COMPLETEWHOIS  eval:check_uridnsbl('URI_COMPLETEWHOIS')
 describe  URI_COMPLETEWHOIS  URI in combined-HIB.dnsiplists.completewhois.com
 tflags    URI_COMPLETEWHOIS  net
 score     URI_COMPLETEWHOIS  1.25

snip
 
   These rules help to catch brand new domains at the same IP as
 previous spam domains (i.e. they are IP based BLs). 

Neat stuff Paul.. I'll have to try it out.


That said, technically, doesn't this really look up the IP address by fetching
the NS record, not the A record of the URI? (this would catch domains hosted at
the same nameserver, not domains hosted at the same server IP address)

Or has SA changed and it looks up both NS and A for uridnsbl?

I know previously there was a strong argument against looking up the A record,
as it provided an opportunity for spammers to poison email with extra URIs that
nobody would normally click on or lookup. These poison URIs could be used to
trigger DNS attacks, or simply generate slow responses to force a timeout.

NS records on the other hand are generally not handled by the spammer's own DNS
servers, but are returned by the TLD's servers.

ie: the NS record for evi-inc.com is stored on my authoritative DNS server, but
it's only there for completeness. Nobody normally queries it from there except
my own server. Most folks find out the NS list from the servers for .com (ie:
a.gtld-servers.net). This makes it impractical to use poison URIs if SA is
only looking up NS records.









Re: Those Re: good obfupills spams

2006-04-28 Thread John Tice


Greetings,
This is my first post after having lurked some. So, I'm getting these  
same RE: good spams but they're hitting eight rules and typically  
scoring between 30 and 40. I'm really unsophisticated compared to you  
guys, and it begs the question––what am I doing wrong? All I use is a  
tweaked user_prefs wherein I have gradually raised the scores on  
standard rules found in spam that slips through over a period of  
time. These particular spams are over the top on bayesian (1.0), have  
multiple database hits, forged rcvd_helo and so forth. Bayesian alone  
flags them for me. I'm trying to understand the reason you would not  
want to have these type of rules set high enough? I must be way over  
optimized––what am I not getting?


TIA,
John



On Apr 28, 2006, at 5:36 PM, List Mail User wrote:


Bart Schaefer wrote:
The largest number of spam messages currently getting through SA  
at my
site are short text-only spams with subject "Re: good " followed  
by an
obfuscated drug name (so badly mangled as to be unrecognizable in  
many
cases).  The body contains a gappy-text list of several other  
kinds of

equally unreadable pharmaceuticals, a single URL which changes daily
if not more often, and then several random words and a short excerpt
from a novel.

They usually hit RCVD_IN_BL_SPAMCOP_NET,URIBL_SBL but those alone
aren't scored high enough to classify as spam, and I'm reluctant to
crank them up just for this.  However, the number of spams getting
through SA has tripled in the last four days or so, from around 14  
for

every thousand trapped, to around 40.

I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far
they aren't having any useful effect.  Other suggestions?


Re: Those Re: good obfupills spams

2006-04-28 Thread Matt Kettler
John Tice wrote:

 Greetings,
 This is my first post after having lurked some. So, I'm getting these
 same RE: good spams but they're hitting eight rules and typically
 scoring between 30 and 40. I'm really unsophisticated compared to you
 guys, and it begs the question––what am I doing wrong? All I use is a
 tweaked user_prefs wherein I have gradually raised the scores on
 standard rules found in spam that slips through over a period of time.
 These particular spams are over the top on bayesian (1.0), have
 multiple database hits, forged rcvd_helo and so forth. Bayesian alone
 flags them for me. I'm trying to understand the reason you would not
 want to have these type of rules set high enough? I must be way over
 optimized––what am I not getting? 


BAYES_99, by definition, has a 1% false positive rate.


Re: Those Re: good obfupills spams (uridnsbl's, A records vs NS records)

2006-04-28 Thread List Mail User
Neat stuff Paul.. I'll have to try it out.


That said, technically, doesn't this really look up the IP address by fetching
the NS record, not the A record of the URI? (this would catch domains hosted at
the same nameserver, not domains hosted at the same server IP address)

Or has SA changed and it looks up both NS and A for uridnsbl?

I know previously there was a strong argument against looking up the A record,
as it provided an opportunity for spammers to poison email with extra URIs that
nobody would normally click on or lookup. These poison URIs could be used to
trigger DNS attacks, or simply generate slow responses to force a timeout.

NS records on the other hand are generally not handled by the spammer's own DNS
servers, but are returned by the TLD's servers.

ie: the NS record for evi-inc.com is stored on my authoritative DNS server, but
it's only there for completeness. Nobody normally queries it from there except
my own server. Most folks find out the NS list from the servers for .com (ie:
a.gtld-servers.net). This makes it impractical to use poison URIs if SA is
only looking up NS records.


Matt,

While I'd like to see two classes of rules, and both types of BLs
used for both types of lookup (preferably with different scores - since
my testing shows very different FP and FN rates for 'A' and 'NS' checks),
you are completely correct:  IP based BLs are only used for the 'NS' checks
and RHS based BLs are only used for targeted domain checks (and not for the
domain of the URI's NSs).  Currently nothing is used to directly check the
IP of the spam site (i.e. the 'A' RR), but since in many cases this happens
to be the same as the NS' IP, the IP based BLs often are checking it (though
almost by accident).

I personally think that poisoning spam with extra URIs is already
seen quite a bit, and the issue of DNS timeouts is almost a non-issue, since
you would be no worse off than before.  Already we see stock pump-and-dump and
419 spams with large amounts of poison URIs in them.  Ultimately the spammer
wants as short a message as he can get by with to maximize the use of his
own bandwidth (or the stolen bandwidth he has access to).

What makes these tests much more efficient than you might expect is
that many very-large scale spammers (think ROKSO top-ten) tend to use the
same hosts/IPs for both the web hosting and the DNS server.  Also they tend
to reuse IPs so that last week's spam web server is this week's spam DNS
server.  This means that hosts that hit SORBS spam-traps are often name
servers for current spam runs using brand new domain names that haven't
made SURBL or URIBL lists yet (or sometimes, if you have the misfortune of
being at the start of a run, haven't even hit the digests yet).

I find (after already significant MTA filtering) that these few
rules hit about 10% to 25% of the spam I get.  The SORBS spam list alone
hits almost 25% of spam, but also hits about .85% of ham (but much of that
is email that many people would consider spam).  The completewhois list hits
about 12% of spam, but again, ~.7% of ham.  The meta rules hit slightly more
than the product of the hit ratios of the individual rules (i.e. including
the SBL) for spam (except the completewhois/SBL meta which hits 92% of the
original completewhois hits - i.e. mostly Chinese and Korean IPs, but some
from all parts of the world), and have had no ham hits over the past two or
three months (and only one or two ever);  This implies that they are indeed
independent, with different FP sources and heavily biased toward spam to
begin with.  They do disproportionately catch certain spammers, so they can
be thought of as similar to the SARE Specific rule set.  In particular they
work extremely well against certain classes of pill and mortgage spam.

Paul Shupak
[EMAIL PROTECTED]


Re: Those Re: good obfupills spams

2006-04-28 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


John Tice wrote:


Greetings,
This is my first post after having lurked some. So, I'm getting these
same RE: good spams but they're hitting eight rules and typically
scoring between 30 and 40. I'm really unsophisticated compared to you
guys, and it begs the question––what am I doing wrong? All I use is a
tweaked user_prefs wherein I have gradually raised the scores on
standard rules found in spam that slips through over a period of time.
These particular spams are over the top on bayesian (1.0), have
multiple database hits, forged rcvd_helo and so forth. Bayesian alone
flags them for me. I'm trying to understand the reason you would not
want to have these type of rules set high enough? I must be way over
optimized––what am I not getting?



BAYES_99, by definition, has a 1% false positive rate.


That is what Bayes thinks. I think it is closer to something between
0.5% and 0.1% false positive. I have mine trained down lethally fine
at this point, it appears.
{^_-}