Re: Those Re: good obfupills spams
jdow wrote: And the point I made is to keep the region right around 5.0 as swept clean of ambiguous cases as it's possible to maintain. It MAY be that the reliability of a rule should govern its score upon use. And scores should have a sprinkling of negative scores as well as mostly positive scores. It seems like Kalman filter approaches might do some real good.

What about replacing Bayes with Support Vector Machines? Has anyone played with this?

In fact a REAL Kalman filter that trains on feedback the way Bayes trains on feedback might produce some really interesting results as well as weed out rules that seem to amount to little or nothing at the present time. There was somebody here who did discuss a dynamic scoring engine approach. I wonder how far he got with it. His initial report sounded quite promising. And it's an ideal setting for Kalman-style techniques: this rule is good for conditions A, C, and D but not B... I do really like the idea of creating a dead zone that has neither ham nor spam in it right around a score of 5, with separate peaks for ham and spam on either side of that empty zone. It may be hard to force that kind of selection without some fancy processing, though.

Why not use two different filters:
- SA (with neither Bayes nor AWL)
- an adaptive filter (bogofilter has unsure zones)
and take the decision based on either or both, depending on the configuration of each. With a conservative setup of both, you can decide it's spam if either filter says it is (you'll get more FNs, but few FPs). With an aggressive setup, you can use AND. With other setups, you can make more complex decisions. An advantage of this is that you can split this as a site-wide filter (SA) and a per-user filter.
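The either/or decision logic proposed above can be sketched in a few lines. The 5.0 threshold matches SA's default; the verdict strings, mode names, and function are illustrative assumptions, not the actual interface of SA or bogofilter:

```python
# Sketch of combining SA (score-based) with a bogofilter-style filter
# (tristate ham/unsure/spam verdict).  All names here are illustrative.

def combined_verdict(sa_score, bogo_verdict, mode="conservative"):
    """Combine a SpamAssassin score with an adaptive filter's verdict."""
    sa_says_spam = sa_score >= 5.0
    bogo_says_spam = bogo_verdict == "spam"   # bogofilter: ham/unsure/spam
    if mode == "conservative":
        # Each filter tuned conservatively: either one alone may condemn (OR)
        return sa_says_spam or bogo_says_spam
    else:
        # Aggressive per-filter setups: require agreement (AND)
        return sa_says_spam and bogo_says_spam

print(combined_verdict(6.2, "unsure"))              # True  (SA alone suffices)
print(combined_verdict(3.1, "spam", "aggressive"))  # False (no agreement)
```

The site-wide/per-user split falls out naturally: SA runs once at the MTA, and the adaptive filter runs per mailbox with that user's training.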
Re: Those Re: good obfupills spams
On Sunday, 30 April 2006 18:40 Matt Kettler wrote: However, mails matching BAYES_95 are more likely to be trickier, and are likely to match fewer other rules. These messages are more likely to require an extra boost from BAYES_95's score than those which match BAYES_99. Like Jane wrote, I don't believe writing rules to just reach over 5.0 for SPAM should be the goal. For the German ruleset I maintain, I always try to push SPAM far beyond any mark, without risking FPs. If there's some sexually explicit sentence that can really only be SPAM, I'll give it up to 4 points. Most porn SPAM gets around 20-30 points now. That's good, so I'm on the safe side when text variations hit fewer rules. I hope to have a good stats tool soon to be able to see graphically whether BAYES_99 is secure. From what I see whenever I check e-mails, it hits very sure SPAM and is worth 4-5 points. That might be because my main language is German, and most SPAM is English, though. Jane made a good statement about writing rules to make a peak around 5.0, to clearly indicate SPAM or HAM. Sounds reasonable, but I didn't test it, because I don't happen to have any FPs. regards, zmi -- // Michael Monnerie, Ing.BSc - http://it-management.at // Tel: 0660/4156531 .network.your.ideas. // PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import // Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE // Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: Those Re: good obfupills spams (bayes scores)
On Monday, 1 May 2006 17:51 Matt Kettler wrote: Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages last week. Of those, only 24 had total scores under 9.0. (with BAYES_99 scoring 3.5, it would take a message with a total score of less than 8.5 to drop below the threshold of 5.0 if BAYES_99 were omitted entirely). I've looked at a snapshot of 424 spams from the last few days, with a total of 8519 points, making about 20 points per SPAM on average. 67 SPAMs scored 5-9.99 points, 62 scored 10-14.99 points, and 294 scored 15 points or more. So it's those 67 SPAMs that should worry me most - some of them are barely over 5 (twice 5.06 points), and I would like them to score higher, because that's more on the safe side. Unfortunately, I have no way to check which rules were hit; amavisd-new doesn't log that. regards, zmi
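The bucket tally above is easy to reproduce mechanically once scores are extracted from a log. A minimal sketch (the score list is invented; in practice it would be parsed from mail headers or logs):

```python
# Tally spam scores into the ranges used in the post above.
def bucket_scores(scores):
    buckets = {"5-9.99": 0, "10-14.99": 0, "15+": 0}
    for s in scores:
        if s < 5:
            continue                    # not tagged as spam
        elif s < 10:
            buckets["5-9.99"] += 1
        elif s < 15:
            buckets["10-14.99"] += 1
        else:
            buckets["15+"] += 1
    return buckets

sample = [5.06, 5.06, 12.3, 27.9, 18.4]
print(bucket_scores(sample))   # {'5-9.99': 2, '10-14.99': 1, '15+': 2}
```

The 5-9.99 bucket is the one worth watching: those are the messages whose verdict any single rule score could flip.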
Re: Those Re: good obfupills spams (bayes scores)
Incidentally, the FAQ answer for HowScoresAreAssigned on the SA wiki is out of date.
Re: Those Re: good obfupills spams
From: Michael Monnerie [EMAIL PROTECTED] Jane made a good statement about writing rules to make a peak around 5.0, to clearly indicate SPAM or HAM. Sounds reasonable, but I didn't test it, because I don't happen to have any FPs. Actually, it's Joanne, not Jane. {^_-} And the point I made is to keep the region right around 5.0 as swept clean of ambiguous cases as it's possible to maintain. It MAY be that the reliability of a rule should govern its score upon use. And scores should have a sprinkling of negative scores as well as mostly positive scores. It seems like Kalman filter approaches might do some real good. In fact a REAL Kalman filter that trains on feedback the way Bayes trains on feedback might produce some really interesting results as well as weed out rules that seem to amount to little or nothing at the present time. There was somebody here who did discuss a dynamic scoring engine approach. I wonder how far he got with it. His initial report sounded quite promising. And it's an ideal setting for Kalman-style techniques: this rule is good for conditions A, C, and D but not B... I do really like the idea of creating a dead zone that has neither ham nor spam in it right around a score of 5, with separate peaks for ham and spam on either side of that empty zone. It may be hard to force that kind of selection without some fancy processing, though. {^_^}
Re: Those Re: good obfupills spams (bayes scores)
From: Michael Monnerie [EMAIL PROTECTED] 67 SPAMs are 5-9.99 points, OK, for the record with regards to spam and ham, I have had four come through between 5 and 7.99 points out of about 1600 messages in my personal mail buckets. Two were from AlwaysOn, which I signed up for when Powell the Younger was the FCC commissioner pushing BPL. As a ham radio operator I had a rather strong interest in opposition to this critter. I more or less abandoned the account and let the Tony Perkins email fall into the spam box. I finally got motivated to remove that today. One other was from a mailing list where some dweeb spammed the list saying he could not read some other dweeb's base64 email. It was marginal. But it being marked as spam gave me a chance to send a private email jab back to the first dweeb about his message being spam. That leaves one real spam and no hams in the 5.0 to 7.99 wasteland. I have five messages between 8.0 and 10 inclusive. One is from my local congressman. I figure if I include his junk phone calls in my phone spam complaints (to him) the email should also be spam. I doubt I'll white list him. He and I don't agree much. I am much too libertarian for his Republican stance. If he'd start lecturing about people being responsible for themselves and their own actions I might be moved to white list him. But that's neither here nor there. The wasteland concept is working. And during this period no real ham has gotten a BAYES_99 rule hit. But the sample's still a little small to say anything solid about the 0.5% theoretical false alarm ratio yet - maybe - if I stretch it a little. {^_^} - Joanne does ramble sometimes, doesn't she?
Re: Those Re: good obfupills spams (bayes scores)
From: jdow [EMAIL PROTECTED] One is from my local congressman. I figure if I include his junk phone calls in my phone spam complaints (to him) the email should also be spam. I doubt I'll white list him. He and I don't agree much. I am much too libertarian for his Republican stance. If he'd start lecturing about people being responsible for themselves and their own actions I might be moved to white list him. But that's neither here nor there. The wasteland concept is working. This earns a follow-up. I checked the Bayes score on his message. I must conclude that Bayes is pretty accurate. Since I consider virtually anybody in office today to be a spamming gasbag, having his message hit a perfect 1.00 Bayes score is just too perfect. My faith in Bayes is increased appropriately. {^_-}
RE: Those Re: good obfupills spams
Matt Kettler wrote: It is perfectly reasonable to assume that most of the mail matching BAYES_99 also matches a large number of the stock spam rules that SA comes with. These highly-obvious mails are the model after which most SA rules are made in the first place. Thus, these mails need less score boost, as they already have a lot of score from other rules in the ruleset. However, mails matching BAYES_95 are more likely to be trickier, and are likely to match fewer other rules. These messages are more likely to require an extra boost from BAYES_95's score than those which match BAYES_99. I can't argue with this description, but I don't agree with the conclusion on the scores. The Bayes rules are not individual unrelated rules. Bayes is a series of rules indicating a range of probability that a message is spam or ham. You can argue over the exact scoring, but I can't see any reason to score BAYES_99 lower than BAYES_95. Since a BAYES_99 message is even more likely to be spam than a BAYES_95 message, it should have at least a slightly higher score. It is obvious that a BAYES_99 message is more likely to hit other rules and therefore be less reliant on a score increase from Bayes, but this is no reason to drop the score. I generally don't look into the rule scoring too much unless I run into a problem, but I thought this had been fixed in the latest couple of versions anyway. Looking at my score file, I find this:

score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5

The scores march upwards just as expected. And it looks like the 50-99 scores have been set by hand rather than by the perceptron. -- Bowie
RE: Those Re: good obfupills spams (bayes scores)
jdow wrote: From: Bart Schaefer [EMAIL PROTECTED] On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high-end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if: 1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam. My recollection is that there was speculation that the BAYES_9x rules were scored too low not because they FP'd in conjunction with other rules, but because against the corpus they TRUE P'd in conjunction with lots of other rules, and that it therefore wasn't necessary for the perceptron to assign a high score to BAYES_9x in order to push the total over the 5.0 threshold. The trouble with that is that users expect training on their personal spam flow to have a more significant effect on the scoring. I want to train bayes to compensate for the LACK of other rules matching, not just to give a final nudge when a bunch of others already hit. I filed a bugzilla some while ago suggesting that the bayes percentage ought to be used to select a rule set, not to adjust the score as a component of a rule set. There is one other gotcha. I bet vastly different scores are warranted for Bayes when run with per user training and rules as compared to global training and rules. Ack! I missed the subject change on this thread prior to my last reply. Sorry about the duplication. I think it is also a matter of manual training vs autolearning. A Bayes database that is consistently trained manually will be more accurate and can support higher scores. -- Bowie
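The bugzilla suggestion above - using the Bayes percentage to select a rule set rather than to adjust one score within a rule set - could be sketched as follows. The rule names, score values, and the 0.95 cutoff are all invented for illustration:

```python
# Hypothetical sketch: pick a whole score set based on the Bayes
# probability, instead of treating Bayes as one more score component.
# When Bayes is confident, the other rules can run with leaner scores;
# when it is not, they must carry more of the weight.
RULESETS = {
    "bayes_confident": {"SOME_RULE": 1.0, "OTHER_RULE": 2.5},
    "bayes_unsure":    {"SOME_RULE": 2.0, "OTHER_RULE": 4.0},
}

def pick_ruleset(bayes_prob, cutoff=0.95):
    """Select the score set to apply to this message."""
    key = "bayes_confident" if bayes_prob >= cutoff else "bayes_unsure"
    return RULESETS[key]
```

This also addresses the per-user point: a well-trained personal Bayes database would steer most mail into the lean score set, letting training visibly change behavior rather than just nudging totals.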
Re: Those Re: good obfupills spams (bayes scores)
Bowie Bailey wrote: Matt Kettler wrote: It is perfectly reasonable to assume that most of the mail matching BAYES_99 also matches a large number of the stock spam rules that SA comes with. These highly-obvious mails are the model after which most SA rules are made in the first place. Thus, these mails need less score boost, as they already have a lot of score from other rules in the ruleset. However, mails matching BAYES_95 are more likely to be trickier, and are likely to match fewer other rules. These messages are more likely to require an extra boost from BAYES_95's score than those which match BAYES_99. I can't argue with this description, but I don't agree with the conclusion on the scores. The Bayes rules are not individual unrelated rules. Bayes is a series of rules indicating a range of probability that a message is spam or ham. You can argue over the exact scoring, but I can't see any reason to score BAYES_99 lower than BAYES_95. Since a BAYES_99 message is even more likely to be spam than a BAYES_95 message, it should have at least a slightly higher score. No, it should not. I've given a conclusive reason why it may not always be higher. My reason has a solid statistical reason behind it. This reasoning is supported by real-world testing and real-world data. You've given your opinion to the contrary, but no facts to support it other than declaring the rules to be related, and therefore the score should correlate with the bayes-calculated probability of spam. While I don't disagree with you that BAYES_99 scoring lower than BAYES_95 is counter-intuitive, I do not believe intuition alone is a reason to defy reality. If there are other rules with better performance (ie: fewer FPs) that consistently coincide with the hits of BAYES_99, those rules should soak up the lion's share of the score. However, if there are a lot of spam messages with no other rules hit, BAYES_99 should get a strong boost from those.
The perceptron results show that the former is largely true. BAYES_99 is mostly redundant. To back it up, I'm going to verify it with my own maillog data. Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages last week. Of those, only 24 had total scores under 9.0. (with BAYES_99 scoring 3.5, it would take a message with a total score of less than 8.5 to drop below the threshold of 5.0 if BAYES_99 were omitted entirely). So less than 0.37% of BAYES_99's hits actually mattered on my system last week. BAYES_95 on the other hand hit 468 messages, 20 of which scored less than 9.0. That's 4.2% of messages with BAYES_95 hits. A considerably larger percentage. Bring it down to 8.0 to compensate for the score difference and you still get 17 messages, which is still a much larger 3.6% of its hits. On my system, BAYES_95 is significant in pushing mail over the spam threshold 10 times more often than BAYES_99 is. What are your results? These are the greps I used, based on MailScanner log formats. They should work for spamd users, perhaps with slight modifications.

zgrep BAYES_99 maillog.1.gz | wc -l
zgrep BAYES_99 maillog.1.gz | grep -v score=[1-9][0-9]\. | grep -v score=9\. | wc -l
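The same measurement the greps perform can be done in one pass. The "score=" pattern below assumes a MailScanner-style log line and is only a sketch; adjust the regex for spamd or amavisd formats:

```python
# For each log line that hit BAYES_99, count how often the total score
# fell under 9.0 -- i.e. where the 3.5-point rule score could actually
# have decided the verdict.  Log format is an assumption.
import re

SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")

def bayes99_significance(log_lines):
    hits = mattered = 0
    for line in log_lines:
        if "BAYES_99" not in line:
            continue
        m = SCORE_RE.search(line)
        if not m:
            continue
        hits += 1
        if float(m.group(1)) < 9.0:
            mattered += 1   # would drop under 5.0 without BAYES_99
    return hits, mattered

log = ["spam score=23.1 tests=BAYES_99,URIBL",
       "spam score=8.2 tests=BAYES_99",
       "ham score=1.3 tests=NONE"]
print(bayes99_significance(log))   # (2, 1)
```

Unlike the grep pipeline, this handles negative and fractional scores uniformly and can be pointed at any rule name.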
RE: Those Re: good obfupills spams (bayes scores)
Matt Kettler wrote: Bowie Bailey wrote: The Bayes rules are not individual unrelated rules. Bayes is a series of rules indicating a range of probability that a message is spam or ham. You can argue over the exact scoring, but I can't see any reason to score BAYES_99 lower than BAYES_95. Since a BAYES_99 message is even more likely to be spam than a BAYES_95 message, it should have at least a slightly higher score. No, it should not. I've given a conclusive reason why it may not always be higher. My reason has a solid statistical reason behind it. This reasoning is supported by real-world testing and real-world data. You've given your opinion to the contrary, but no facts to support it other than declaring the rules to be related, and therefore the score should correlate with the bayes-calculated probability of spam. While I don't disagree with you that BAYES_99 scoring lower than BAYES_95 is counter-intuitive, I do not believe intuition alone is a reason to defy reality. If there are other rules with better performance (ie: fewer FPs) that consistently coincide with the hits of BAYES_99, those rules should soak up the lion's share of the score. However, if there are a lot of spam messages with no other rules hit, BAYES_99 should get a strong boost from those. The perceptron results show that the former is largely true. BAYES_99 is mostly redundant. To back it up, I'm going to verify it with my own maillog data. Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages last week. Of those, only 24 had total scores under 9.0. (with BAYES_99 scoring 3.5, it would take a message with a total score of less than 8.5 to drop below the threshold of 5.0 if BAYES_99 were omitted entirely). So less than 0.37% of BAYES_99's hits actually mattered on my system last week. BAYES_95 on the other hand hit 468 messages, 20 of which scored less than 9.0. That's 4.2% of messages with BAYES_95 hits. A considerably larger percentage.
Bring it down to 8.0 to compensate for the score difference and you still get 17 messages, which is still a much larger 3.6% of its hits. On my system, BAYES_95 is significant in pushing mail over the spam threshold 10 times more often than BAYES_99 is. What are your results? These are the greps I used, based on MailScanner log formats. They should work for spamd users, perhaps with slight modifications.

zgrep BAYES_99 maillog.1.gz | wc -l
zgrep BAYES_99 maillog.1.gz | grep -v score=[1-9][0-9]\. | grep -v score=9\. | wc -l

I think we are arguing from slightly different viewpoints. You are saying that higher scores are not needed since the lower score is made up for by other rules. I have 13,935 hits for BAYES_99. 412 of them are lower than 9.0. This seems to be caused by either AWL hits lowering the score or very few other rules hitting. BAYES_95 hit 469 times with 18 hits lower than 9.0. This means that, for me, BAYES_95 is significant slightly more often, percentage-wise, than BAYES_99. But considering volume, I would say that BAYES_99 is the more useful rule. However, that's not what I was arguing about to begin with. Because of the way the Bayes algorithm works, I should be able to have more confidence in a BAYES_99 hit than a BAYES_95 hit. Therefore, it should have a higher score. Otherwise, you get the very strange occurrence that if you train Bayes too well and the spams go from BAYES_95 to BAYES_99, the SA score actually goes down. The better you train your Bayes database, the more confidence it should have in picking out the spams. As the scoring moves from BAYES_50 up to BAYES_99, the SA score should increase to reflect the higher confidence level of the Bayes engine. -- Bowie
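Bowie's complaint is ultimately a monotonicity requirement: as the Bayes bucket rises, its score should not fall. A sketch (not an SA tool) that checks this against "score" lines like the ones quoted earlier in the thread, assuming the fifth whitespace-separated field is the score set used when Bayes is enabled without network tests:

```python
# Parse "score BAYES_xx a b c d" lines and check that the assumed
# Bayes-enabled scores rise with the bucket number.
def bayes_scores_monotonic(score_lines):
    buckets = []
    for line in score_lines:
        parts = line.split()
        if len(parts) >= 5 and parts[0] == "score" and parts[1].startswith("BAYES_"):
            # parts[4]: assumed Bayes-enabled, no-net score set
            buckets.append((int(parts[1][6:]), float(parts[4])))
    buckets.sort()
    return all(a[1] <= b[1] for a, b in zip(buckets, buckets[1:]))
```

Run against the 3.1 scores quoted earlier (BAYES_00 at -2.312 up through BAYES_99 at 3.5), the whole ladder passes, which is Bowie's observation that the hand-set defaults already march upwards.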
Re: Those Re: good obfupills spams (bayes scores)
From: Bowie Bailey [EMAIL PROTECTED] jdow wrote: From: Bart Schaefer [EMAIL PROTECTED] On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high-end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if: 1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam. My recollection is that there was speculation that the BAYES_9x rules were scored too low not because they FP'd in conjunction with other rules, but because against the corpus they TRUE P'd in conjunction with lots of other rules, and that it therefore wasn't necessary for the perceptron to assign a high score to BAYES_9x in order to push the total over the 5.0 threshold. The trouble with that is that users expect training on their personal spam flow to have a more significant effect on the scoring. I want to train bayes to compensate for the LACK of other rules matching, not just to give a final nudge when a bunch of others already hit. I filed a bugzilla some while ago suggesting that the bayes percentage ought to be used to select a rule set, not to adjust the score as a component of a rule set. There is one other gotcha. I bet vastly different scores are warranted for Bayes when run with per-user training and rules as compared to global training and rules. Ack! I missed the subject change on this thread prior to my last reply. Sorry about the duplication. I think it is also a matter of manual training vs autolearning. A Bayes database that is consistently trained manually will be more accurate and can support higher scores. That may be a factor, too, Bowie. But, as Igor is experiencing, the site Bayes faces a singular problem in that one person's ham is another person's extreme spam.
When no two people can agree on what spam is and what ham is, a global Bayes becomes (relatively) ineffective very quickly. This is why I included that afterthought, which probably should have been highlighted up front. {^_^}
Re: Those Re: good obfupills spams (bayes scores)
From: Matt Kettler [EMAIL PROTECTED] Bowie Bailey wrote: Matt Kettler wrote: It is perfectly reasonable to assume that most of the mail matching BAYES_99 also matches a large number of the stock spam rules that SA comes with. These highly-obvious mails are the model after which most SA rules are made in the first place. Thus, these mails need less score boost, as they already have a lot of score from other rules in the ruleset. However, mails matching BAYES_95 are more likely to be trickier, and are likely to match fewer other rules. These messages are more likely to require an extra boost from BAYES_95's score than those which match BAYES_99. I can't argue with this description, but I don't agree with the conclusion on the scores. The Bayes rules are not individual unrelated rules. Bayes is a series of rules indicating a range of probability that a message is spam or ham. You can argue over the exact scoring, but I can't see any reason to score BAYES_99 lower than BAYES_95. Since a BAYES_99 message is even more likely to be spam than a BAYES_95 message, it should have at least a slightly higher score. No, it should not. I've given a conclusive reason why it may not always be higher. My reason has a solid statistical reason behind it. This reasoning is supported by real-world testing and real-world data. You've given your opinion to the contrary, but no facts to support it other than declaring the rules to be related, and therefore the score should correlate with the bayes-calculated probability of spam. While I don't disagree with you that BAYES_99 scoring lower than BAYES_95 is counter-intuitive, I do not believe intuition alone is a reason to defy reality. Matt, as much as I respect you, which is a heck of a lot, I must insist that your assertion is correct within a model that does not fit the real needs of the situation, PARTICULARLY for individual Bayes databases that are not fed carelessly. You don't want to crowd just above 5.
You want to have a score gap around five with almost all spam scoring well above 10. Now, I have managed to almost sweep that region clean; about 1 or 2% of my spam falls between 5 and 8. Another 4% falls under 10. This makes sweeping the spam directory for ham quite easy. (It also serves as a wry note that some of the magazines to which I subscribe also spam me. It's nifty that their spams are tagged and their hams are not, mostly. When they are tagged they're still not BAYES_9x, though.) If there are other rules with better performance (ie: fewer FPs) that consistently coincide with the hits of BAYES_99, those rules should soak up the lion's share of the score. However, if there are a lot of spam messages with no other rules hit, BAYES_99 should get a strong boost from those. If any significant number of spams hit ONLY BAYES_99, then BAYES_99 should either very nearly kick them over or actually kick them over. That said, I have found that clever meta rules regarding specific sources and the BAYES scores have allowed me to widen my wasteland of scores between 4 and 10 lately. This may be an important trick to employ. The perceptron results show that the former is largely true. BAYES_99 is mostly redundant. To back it up, I'm going to verify it with my own maillog data. Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages last week. Of those, only 24 had total scores under 9.0. (with BAYES_99 scoring 3.5, it would take a message with a total score of less than 8.5 to drop below the threshold of 5.0 if BAYES_99 were omitted entirely). So less than 0.37% of BAYES_99's hits actually mattered on my system last week. I wish I had that luck. And I have over 40 rule sets in action plus a large bunch of my own. BAYES_95 on the other hand hit 468 messages, 20 of which scored less than 9.0. That's 4.2% of messages with BAYES_95 hits. A considerably larger percentage.
Bring it down to 8.0 to compensate for the score difference and you still get 17 messages, which is still a much larger 3.6% of its hits. On my system, BAYES_95 is significant in pushing mail over the spam threshold 10 times more often than BAYES_99 is. What are your results? I don't have a script that tells me what BAYES_99 hits on singularly. I posted what ratio of ham and spam BAYES_99 and BAYES_00 hit on the last 10 weeks. What I do NOT see is any benefit from trying to crowd close to 5 points. This is the reason I see the model itself as being broken. When I ran with the original BAYES scores on 3.04 the system leaked like a sieve. As I upped the score the missed spams decreased. But every once in a while I seem to hit a lead position on a round of innovative spams which hit nothing but BAYES_99. Loren responds by writing rules to catch them. I respond by increasing Bayes. I figure 5.0 is my limit, though. Although I figure a good ratio for mismarked ham to mismarked spam is about 0.1:1. When I get that bad I make a new meta rule or
Re: Those Re: good obfupills spams
jdow wrote: And it is scored LESS than BAYES_95 by default. That's a clear signal that the theory behind the scoring system is a little skewed and needs some rethinking. No.. It does not mean there's a problem with the scoring system. It means you're trying to apply a simple linear model to something which is inherently not linear, nor simple. This is a VERY common misconception. Please bear with me for a minute as I explain some things. This is more-or-less the same misconception as expecting rules with higher S/O's to always score higher than those with lower S/O's. Generally this is true, but there's more to consider that can cause the opposite to be true. The score of a rule in SA is not a function of the performance of that one rule, nor should it be. The score of an SA rule is a function of what combinations of rules it matches in conjunction with. This creates a real-world fit of a complex set of rules against real-world behavior. This complex interaction between rules results in most of the problems people see. People inherently expect simple linearity. However, consider that SA scoring is a function of a several-hundred-variable equation attempting to perform an approximation of optimal fit to a sampling of human behavior. Why, based on that, would you ever expect the scores of two of those hundreds of variables to be linear as a function of spam hit rate? It is perfectly reasonable to assume that most of the mail matching BAYES_99 also matches a large number of the stock spam rules that SA comes with. These highly-obvious mails are the model after which most SA rules are made in the first place. Thus, these mails need less score boost, as they already have a lot of score from other rules in the ruleset. However, mails matching BAYES_95 are more likely to be trickier, and are likely to match fewer other rules. These messages are more likely to require an extra boost from BAYES_95's score than those which match BAYES_99.
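Matt's redundancy argument can be demonstrated with a toy fit. In the synthetic corpus below (entirely invented), rule B99 always fires alongside three other rules, while B95 fires alone; a plain least-squares gradient fit toward a target spam score then gives B95 the larger weight, even though B99 is nominally the stronger spam indicator:

```python
# Toy global score fit: minimize (sum of firing weights - target)^2
# per message by gradient descent.  Not the SA perceptron, just an
# illustration of how correlated rules split the available score.
def fit(messages, target=6.0, lr=0.01, steps=5000):
    names = sorted({r for rules in messages for r in rules})
    w = {n: 0.0 for n in names}
    for _ in range(steps):
        for rules in messages:
            err = sum(w[r] for r in rules) - target
            for r in rules:
                w[r] -= lr * err
    return w

# Type 1 spam: B99 plus three obvious rules.  Type 2 spam: B95 alone.
corpus = [["B99", "R1", "R2", "R3"]] * 10 + [["B95"]] * 10
w = fit(corpus)
print(w["B95"] > w["B99"])   # True: B95 must carry the load alone
```

B95 converges near the full 6-point target while B99 shares it four ways (about 1.5 each) -- the nonlinearity Matt describes, reproduced in miniature.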
Re: Those Re: good obfupills spams
From: Matt Kettler [EMAIL PROTECTED] jdow wrote: And it is scored LESS than BAYES_95 by default. That's a clear signal that the theory behind the scoring system is a little skewed and needs some rethinking. No.. It does not mean there's a problem with the scoring system. It means you're trying to apply a simple linear model to something which is inherently not linear, nor simple. This is a VERY common misconception. Please bear with me for a minute as I explain some things. This is more-or-less the same misconception as expecting rules with higher S/O's to always score higher than those with lower S/O's. Generally this is true, but there's more to consider that can cause the opposite to be true. The score of a rule in SA is not a function of the performance of that one rule, nor should it be. The score of an SA rule is a function of what combinations of rules it matches in conjunction with. This creates a real-world fit of a complex set of rules against real-world behavior. This complex interaction between rules results in most of the problems people see. People inherently expect simple linearity. However, consider that SA scoring is a function of a several-hundred-variable equation attempting to perform an approximation of optimal fit to a sampling of human behavior. Why, based on that, would you ever expect the scores of two of those hundreds of variables to be linear as a function of spam hit rate? It is perfectly reasonable to assume that most of the mail matching BAYES_99 also matches a large number of the stock spam rules that SA comes with. These highly-obvious mails are the model after which most SA rules are made in the first place. Thus, these mails need less score boost, as they already have a lot of score from other rules in the ruleset. However, mails matching BAYES_95 are more likely to be trickier, and are likely to match fewer other rules. These messages are more likely to require an extra boost from BAYES_95's score than those which match BAYES_99.
Matt, I understand the model. I believe it is the wrong model to apply. Experience indicates this is very much the case. And I must remind you that an ounce of actual experience is worth a neutron star's worth of theory. When I raise the scores of BAYES_99 and 95 to be monotonically increasing, with 99 at or very near to 5.0, I demonstrably get far fewer escaped spams at a cost of VERY few (low enough to be unnoticed) caught hams. When experience disagrees with the model, some extra thought is required with regards to the model. As far as I can see the perceptron does not handle single factors that are exceptionally good at catching spam with exceptionally few false alarms AND are often the ONLY marker for actual spam that is caught. This latter is very often the case here with regards to BAYES_99. (The logged hams caught as spam are escaped spams or else cases that are impossible to catch correctly without complex meta rules, such as LKML or other technical code-, patch-, and diff-bearing mailing lists that also do not adequately filter what is relayed through them. For these lists I have actually had to artificially rescore all the BAYES scores using meta rules. I am fine-tuning these alterations at the moment. I've had some spams escape. My OWN number of mismarked hams has become vanishingly small. Loren does not have these rules yet. If he wants 'em I'll give them to him quickly.) Note the goodness of BAYES_99 here - stats including me and Loren over 80,000 messages total:

1  BAYES_99  20156   4.88  25.08  91.61   0.07
1  BAYES_00  46107  15.54  57.36   0.07  78.98

The BAYES_99's *I* have seen on ham are running exclusively to spams that managed to fire a negative-scoring rule for mailing lists. LKML and FreeBSD are the two lists so affected. Now, in the last two days I have had some ham come in as spam, not due to BAYES_9x at all. It was a political discussion that happened to trigger a lot of the mortgage spam rules. Cain't do much about that!
(At least not without giving Yahoo Groups an utterly unwarranted negative score.) Based on *MY* experience the perceptron performance model was not the appropriate model to choose. {^_^}
Re: Those Re: good obfupills spams
From: Matt Kettler [EMAIL PROTECTED]

jdow wrote: And it is scored LESS than BAYES_95 by default. That's a clear signal that the theory behind the scoring system is a little skewed and needs some rethinking.

No.. It does not mean there's a problem with the scoring system. It means you're trying to apply a simple linear model to something which is inherently not linear, nor simple. This is a VERY common misconception.

I have a few more thoughts that are probably more constructive than merely saying that the perceptron model is obviously wrong where the rubber meets the road. It seems to me that the observed operation of the perceptron is driving scores towards the minimum amount over 5.0 that can be managed and still capture most of the spam. I've been operating here on a slightly different principle, at least for my own rules: I work to drive scores away from 5.0, in both directions as needed. If I see captured spam consistently scoring greater than 8 or 10, I am pleased. When I see items in the 5 to 10 range I figure out what I can do to drive them in the correct direction, ham or spam. (Bayes is usually my choice of action. I usually discover another email that has a mid-level Bayes score rather than an extreme level. And I wish I could codify how I choose to feed Bayes. I feed it almost on an intuitive level: "This is Bayes food" or "Bayes already has a lot of this food and is obviously a little confused for my mail mix." That's hardly a good feeding rule that I can pass on to people. sigh)

So rather than having the perceptron push towards a relatively smooth curve of all scores, it should work to push the overall score profile into what one wag in an SF story called a "brassiere curve", which is wonderfully descriptive when you think of some of the '50s and '60s fashions. {^_-} If it can create a viable valley with very few messages scoring near 5.0, and as wide a separation between the ham peak and the spam peak as possible, it may act better.
THAT said, I note that I use meta rules regularly to generate some modest negative scores as well as positive scores. This has had some good side effects on the reliability of scoring here. I've noticed that a small few of the SARE rules, over time, decayed into being fairly good indications of ham rather than spam. Since SARE is more agile than the basic SA rule sets, it might be good if the SARE people took this as a tool for, ahem, lift and separation of the ham and spam peaks. It might be interesting to notice whether the obverse of a "listed in this BL" rule is a decent indication of not-spam, and give that a modest bit of negative score for some cases. I just pulled RATWR10a_MESSID because it was hitting 13% of ham and 4% of spam, for example. Perhaps I should have given it a very small negative score instead. I note right now that SPF_PASS seems to hit 50% (!) of ham and only 4% of spam. Perhaps it, too, should have a slight negative score to help increase the span between the ham peak and the spam peak.

It does seem clear to me that the objective is not so much to create the minimum score needed to mark spam as to create as large a separation between typical ham and spam scores as possible. The more reliable rules should have higher negative and positive scores as appropriate.

And of course, the final caveat is that I am running a two-person install of SpamAssassin, with per-user rules and scores, with two fairly intelligent (although some people question that about me) people running their own user rules and Bayes. I also do not use automatic anything. I cannot get over the idea that automatic whitelist and automatic learning are not necessarily stable concepts UNTIL you have a very reliable Bayes setup and set of rules from manual training. I have that and still cannot convince myself to fix what isn't broken.

{^_^} Joanne
Re: Those Re: good obfupills spams
jdow wrote: BAYES_99, by definition, has a 1% false positive rate. That is what Bayes thinks. I think it is closer to something between 0.5% and 0.1% false positive. I have mine trained down lethally fine at this point, it appears. Ok.. Fine, let's take 0.1% FP rate, 10x better than theoretical, but still realistic at some sites.. Even still.. Is that low enough to be worth assigning 5.0 points to? No.
Re: Those Re: good obfupills spams
From: Matt Kettler [EMAIL PROTECTED]

jdow wrote: BAYES_99, by definition, has a 1% false positive rate. That is what Bayes thinks. I think it is closer to something between 0.5% and 0.1% false positive. I have mine trained down lethally fine at this point, it appears.

Ok.. Fine, let's take 0.1% FP rate, 10x better than theoretical, but still realistic at some sites.. Even still.. Is that low enough to be worth assigning 5.0 points to? No.

So far, however, it has been worth 5.0 points. I've had it (actually) false positive maybe once in the last month. I've had SA mismark some BAYES_99 spam, however: the spam had other characteristics that earned a slight negative score. (I've since developed some meta rules that are reducing this. If the email is from a mailing list I know, I give a modest negative score. Then if the Bayes is high or very high I award some positive points. High plus mailing list is about 2 points, with mailing list being -1.5. Very high adds another 2 points. That second two points MAY have to be fine-tuned upwards.)

{^_^}
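A minimal sketch of the meta-rule scheme jdow describes above. The list-matching pattern, rule names, and exact scores are illustrative guesses, not her actual rules:

```
# Hypothetical sketch of "negative score for known lists, clawed back when
# Bayes is high" - names, pattern, and scores are illustrative.
header __FROM_KNOWN_LIST  List-Id =~ /(?:lkml|freebsd)/i

meta   KNOWN_LIST         __FROM_KNOWN_LIST
score  KNOWN_LIST         -1.5

# High Bayes on a known list: give back about 2 points.
meta   LIST_BAYES_HIGH    __FROM_KNOWN_LIST && (BAYES_95 || BAYES_99)
score  LIST_BAYES_HIGH    2.0

# Very high Bayes adds another 2 on top.
meta   LIST_BAYES_VHIGH   __FROM_KNOWN_LIST && BAYES_99
score  LIST_BAYES_VHIGH   2.0
```

The double-underscore rule carries no score of its own; only the meta rules contribute points.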
Re: Those Re: good obfupills spams
This is my first post after having lurked some. So, I'm getting these same "Re: good" spams, but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question: what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these types of rules set high enough? I must be way over-optimized: what am I not getting?

The danger with tweaking standard rule scores you probably already know: you are at least theoretically likely to get more false positives, because the score set was optimized for the original scores. Of course, everyone tweaks a few scores at least. After all, that is why they are tweakable. As long as you watch your spam bucket for FPs you can go pretty high on things. Looking at today's spam I only see one of these, but it scored around 30. I have a bunch of the "Re: news" kind that all scored 35-39.

Loren
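The user_prefs tweaking John describes is just score overrides. A hedged example, using rules already named in this thread; the values are illustrative, not his:

```
# user_prefs - raise the weight of stock rules that keep hitting leaked spam.
# Values are illustrative; the stock scores were optimized as a set, so watch
# for false positives after raising any of them.
score RCVD_IN_BL_SPAMCOP_NET  3.0
score URIBL_SBL               3.0
```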
Re: Those Re: good obfupills spams
From: Loren Wilton [EMAIL PROTECTED]

This is my first post after having lurked some. So, I'm getting these same "Re: good" spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question: what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these types of rules set high enough? I must be way over-optimized: what am I not getting?

The danger with tweaking standard rule scores you probably already know: you are at least theoretically likely to get more false positives, because the score set was optimized for the original scores. Of course, everyone tweaks a few scores at least. After all, that is why they are tweakable. As long as you watch your spam bucket for FPs you can go pretty high on things. Looking at today's spam I only see one of these, but it scored around 30. I have a bunch of the "Re: news" kind that all scored 35-39.

Loren

And most of those which are not black lists are from 88_FVGT_body.cf.

{^_^} Joanne
Re: Those Re: good obfupills spams
... Matt Kettler replied: John Tice wrote: Greetings, This is my first post after having lurked some. So, I'm getting these same "Re: good" spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question: what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these types of rules set high enough? I must be way over-optimized: what am I not getting? BAYES_99, by definition, has a 1% false positive rate.

Matt, If we were to presume a uniform distribution between an estimate of 99% and 100%, then the FP rate would be .5%, not 1%. And for large sites (i.e. tens of thousands of messages a day or more), this may be what occurs; But what I see, and what I assume many other small sites see, is a very much non-uniform distribution; From the last 30 hours, the average estimate (re. the value reported in the bayes=xxx clause) for spam hitting the BAYES_99 rule is .41898013269, with about two thirds of them reporting bayes=1 and a lowest value of bayes=0.998721756590216.

While SA is quite robust largely because of the design feature that no single reason/cause/rule should by itself mark a message as spam, I have to guess that the FP rate that the majority of users see for BAYES_99 is far below 1%. From the estimators reported above, I would expect that I would have seen a .003% FP rate for the last day plus a little, if only I received 100,000 or so spam messages to have been able to see it :).
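Paul's .5% figure can be checked with a quick expected-value calculation, under his stated assumption that the Bayes estimate is uniformly distributed on [0.99, 1]:

```latex
% A message with Bayes estimate p is ham (a false positive for BAYES_99)
% with probability (1 - p). Averaging over p uniform on [0.99, 1]:
\mathbb{E}[\mathrm{FP}]
  = \frac{1}{1 - 0.99}\int_{0.99}^{1} (1 - p)\,dp
  = \frac{1}{0.01}\cdot\frac{(0.01)^{2}}{2}
  = 0.005 = 0.5\%
```

As he notes, the assumption fails on small sites, where the estimates cluster near 1 and the real FP rate is far lower.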
I don't change the scoring from the defaults, but if people were to want to, maybe they could change the rules (or add a rule) for BAYES_99_99, which would take only scores higher than bayes=. and which (again with a uniform distribution) would have an expected FP rate of .005% - then re-score that just closer to (but still less than) the spam threshold, or add a point or fraction thereof to raise the score to just under the spam threshold (adding a new rule would avoid having to edit distributed files and thus would probably be the better method).

Anyway, to better address the OP's questions: The system is more robust if, instead of changing the weighting of existing rules (assuming that they were correctly established to begin with), you add more possible inputs (and preferably independent ones - i.e. where the FPs between rules have a low correlation). Simply increasing scores will improve your spam capture rate, just as decreasing the spam threshold will - but both methods will add to the likelihood of false positives. Look into the distributed documentation to see the expected FP rates at different spam threshold levels for numbers to drive this point home (and changing specific rules' scores is just like changing the threshold, but in a non-uniform fashion - unless you actually measure the values for your own site's mail and recompute numbers that are a better estimate for local traffic).

Paul Shupak [EMAIL PROTECTED]
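Paul's BAYES_99_99 suggestion could be sketched as a local add-on rule using the same eval test the stock BAYES_* rules use. He left the exact cutoff elided; the 0.9999 cutoff and 1.0-point score below are illustrative:

```
# Hypothetical add-on rule; cutoff and score are illustrative values.
# check_bayes is the eval test behind the stock BAYES_* tier rules.
body     BAYES_99_99  eval:check_bayes('0.9999', '1.00')
tflags   BAYES_99_99  learn
describe BAYES_99_99  Bayes spam probability is 99.99 to 100%
score    BAYES_99_99  1.0
```

Because it is a separate rule, the extra point stacks on top of BAYES_99 without editing the distributed files, which is the advantage Paul points out.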
Re: Those Re: good obfupills spams
List Mail User wrote: ... Matt Kettler replied: John Tice wrote: Greetings, This is my first post after having lurked some. So, I'm getting these same "Re: good" spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question: what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these types of rules set high enough? I must be way over-optimized: what am I not getting? BAYES_99, by definition, has a 1% false positive rate.

Matt, If we were to presume a uniform distribution between an estimate of 99% and 100%, then the FP rate would be .5%, not 1%.

You're right Paul, my bad.. But again, I don't care if it's 0.01%. The question here is: is jacking up the score of BAYES_99 to be greater than required_hits a good idea? The answer is No, because BAYES_99 is NOT a 100% accurate test. By definition it does have a non-zero FP rate.

And for large sites (i.e. tens of thousands of messages a day or more), this may be what occurs; But what I see and what I assume many other small sites see is a very much non-uniform distribution; From the last 30 hours, the average estimate (re. the value reported in the bayes=xxx clause) for spam hitting the BAYES_99 rule is .41898013269 with about two thirds of them reporting bayes=1 and a lowest value of bayes=0.998721756590216.

Yes, that's to be expected with Chi-Squared combining.

While SA is quite robust largely because of the design feature that no single reason/cause/rule should by itself mark a message as spam, I have to guess that the FP rate that the majority of users see for BAYES_99 is far below 1%.
From the estimators reported above, I would expect that I would have seen a .003% FP rate for the last day plus a little, if only I received 100,000 or so spam messages to have been able to see it :).

True, but it's still not nearly zero. Even in the corpus testing, which is run by the best of the best in SA administration and maintenance, BAYES_99 matched 0.0396% of ham, or 21 out of 53,091 hams. (Based on set-3 of SA 3.1.0.) Given we are dealing with a user who doesn't even understand why you might not want this set high enough, I would not expect much sophistication in bayes maintenance.

Besides.. If you want to make a mathematics-based argument against me, start by explaining how the perceptron is mathematically flawed. It assigned the original score based on real-world data, not our vast over-simplifications. You should have good reason to question its design before second-guessing its scoring based on speculation such as this.

I don't change the scoring from the defaults, but if people were to want to, maybe they could change the rules (or add a rule) for BAYES_99_99, which would take only scores higher than bayes=. and which (again with a uniform distribution) have an expected FP rate of .005% - then re-score that just closer to (but still less than) the spam threshold,

I'd agree.. However, the OP has already made BAYES_99 > required_hits. Bad idea. Period.
Re: Those Re: good obfupills spams
Thank you all for the comments. My personal experience is that Bayes_99 is amazingly reliable: close to 100% for me. I formerly had it set to 4.5 so that bayes_99 plus one other hit would flag it, but then I started getting some spam that were not hit by any other rule, yet bayes correctly identified them. It seems more effective to write some negative-scoring ham rules specific to my important content rather than to take less than full advantage of the high accuracy of bayes. And, the spams in question in this thread are hitting multiple rules, so they should be catchable without having bayes_99 set over the top.

I suppose all these judgments must take into account one's preferences, degree of aversion to FPs, and the diversity of content you're working with. Hopefully I will improve accuracy by writing/adding custom rules and be able to back off the scoring of standard rules, but I have been fairly successful (by my own definition) at tweaking standard rules with minimal FPs. At times when I do get an FP I take a look at it and think "this one just deserves to get filtered." I'm willing to accept a certain amount, or a certain type, in order to be aggressive against spam. Before I only had access to user_prefs, but now that I have a server with root access it's a brand new ball game. The mechanics are easy enough, but I need to work on the broader strategies. Any particularly good reading to be recommended?

John

On Apr 29, 2006, at 8:12 AM, List Mail User wrote: ...
Re: Those Re: good obfupills spams
On 4/29/06, List Mail User [EMAIL PROTECTED] wrote: While SA is quite robust largely because of the design feature that no single reason/cause/rule should by itself mark a message as spam, I have to guess that the FP rate that the majority of users see for BAYES_99 is far below 1%. Anyway, to better address the OP's questions: The system is more robust if instead of changing the weighting of existing rules (assuming that they were correctly established to begin with), you add more possible inputs

Exactly. For example, I find that anything in the subset consisting of messages that don't mention my email address anywhere in the To/Cc headers and also score above BAYES_70 has close to a 100% likelihood of being spam. However, since I also get quite a lot of mail that doesn't fall into that subset, I can't simply increase the scores for the BAYES rules. In this case I use procmail to examine the headers after SA has scored the message, but I've been considering creating a meta-rule of some kind. Trouble is, SA doesn't know what my email address means (it'd need to be a list of addresses), and I'm reluctant to turn on allow_user_rules.
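Bart's procmail post-check could be sketched as SA rules, if site-wide config (or allow_user_rules, which he is reluctant to enable) permits. The address is a placeholder, the score is illustrative, and BAYES_80 stands in for whichever Bayes tier fits ("ToCc" is SA's combined To/Cc pseudo-header):

```
# Placeholder address and score; extend the regex for multiple addresses.
header   __TO_OR_CC_ME    ToCc =~ /bart\@example\.com/i
meta     BAYES_NOT_TO_ME  !__TO_OR_CC_ME && BAYES_80
describe BAYES_NOT_TO_ME  Bayesy message not addressed to me
score    BAYES_NOT_TO_ME  2.0
```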
Re: Those Re: good obfupills spams
On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: Besides.. If you want to make a mathematics based argument against me, start by explaining how the perceptron mathematically is flawed. It assigned the original score based on real-world data. Did it? I thought the BAYES_* scores have been fixed values for a while now, to force the perceptron to adapt the other scores to fit.
Re: Those Re: good obfupills spams (bayes scores)
Bart Schaefer wrote: On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: Besides.. If you want to make a mathematics-based argument against me, start by explaining how the perceptron is mathematically flawed. It assigned the original score based on real-world data. Did it? I thought the BAYES_* scores have been fixed values for a while now, to force the perceptron to adapt the other scores to fit.

Actually, you're right.. I'm shocked and floored, but you're right. In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if:

1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam. This forces the perceptron to attempt to compensate. (Pollution always is a problem since nobody is perfect, but it occurs to differing degrees.) -or-
2) The perceptron is out of whack. (I highly doubt this because the perceptron generated the scores for 3.0.x and they were fine.) -or-
3) The real-world FPs of BAYES_99 really do tend to be cascades with other rules in the 3.1.x ruleset, and the perceptron is correctly capping the score. This could differ from 3.0.x due to changes in rules, or changes in ham patterns over time. -or-
4) One of the corpus submitters has a poorly trained bayes db. (Possible, but I doubt it.)

Looking at statistics-set3 for 3.0.x and 3.1.x there was a slight increase in ham hits for BAYES_99 and a slight decrease in spam hits:

3.0.x: OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
       43.515    89.3888  0.0335  1.000  0.83  1.89   BAYES_99
3.1.x: OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
       60.712    86.7351  0.0396  1.000  0.90  3.50   BAYES_99

Also to consider: set3 of 3.0.x was much closer to a 50/50 mix of spam/nonspam (48.7/51.3) than 3.1.0 was (nearly 70/30).
Re: Those Re: good obfupills spams (bayes scores)
On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high-end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if: 1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam. My recollection is that there was speculation that the BAYES_9x rules were scored too low not because they FP'd in conjunction with other rules, but because against the corpus they TRUE P'd in conjunction with lots of other rules, and that it therefore wasn't necessary for the perceptron to assign a high score to BAYES_9x in order to push the total over the 5.0 threshold. The trouble with that is that users expect training on their personal spam flow to have a more significant effect on the scoring. I want to train bayes to compensate for the LACK of other rules matching, not just to give a final nudge when a bunch of others already hit. I filed a bugzilla some while ago suggesting that the bayes percentage ought to be used to select a rule set, not to adjust the score as a component of a rule set.
Re: Those Re: good obfupills spams
From: Matt Kettler [EMAIL PROTECTED]

List Mail User wrote: Matt Kettler replied: John Tice wrote: Greetings, This is my first post after having lurked some. So, I'm getting these same "Re: good" spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question: what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these types of rules set high enough? I must be way over-optimized: what am I not getting? BAYES_99, by definition, has a 1% false positive rate.

If we were to presume a uniform distribution between an estimate of 99% and 100%, then the FP rate would be .5%, not 1%.

You're right Paul, my bad.. But again, I don't care if it's 0.01%. The question here is: is jacking up the score of BAYES_99 to be greater than required_hits a good idea? The answer is No, because BAYES_99 is NOT a 100% accurate test. By definition it does have a non-zero FP rate.

I run AT 5.0. When I see my first false alarm solely from BAYES_99 I will reduce it slightly. I know what theory says. I also know that BAYES_99 alone captures more spam than it has ever captured ham for false imprisonment.

And for large sites (i.e. tens of thousands of messages a day or more), this may be what occurs; But what I see and what I assume many other small sites see is a very much non-uniform distribution; From the last 30 hours, the average estimate (re. the value reported in the bayes=xxx clause) for spam hitting the BAYES_99 rule is .41898013269 with about two thirds of them reporting bayes=1 and a lowest value of bayes=0.998721756590216.

Yes, that's to be expected with Chi-Squared combining.
While SA is quite robust largely because of the design feature that no single reason/cause/rule should by itself mark a message as spam, I have to guess that the FP rate that the majority of users see for BAYES_99 is far below 1%. From the estimators reported above, I would expect that I would have seen a .003% FP rate for the last day plus a little, if only I received 100,000 or so spam messages to have been able to see it :).

True, but it's still not nearly zero. Even in the corpus testing, which is run by the best of the best in SA administration and maintenance, BAYES_99 matched 0.0396% of ham, or 21 out of 53,091 hams. (Based on set-3 of SA 3.1.0.)

And it is scored LESS than BAYES_95 by default. That's a clear signal that the theory behind the scoring system is a little skewed and needs some rethinking.

Given we are dealing with a user who doesn't even understand why you might not want this set high enough, I would not expect much sophistication in bayes maintenance. Besides.. If you want to make a mathematics-based argument against me, start by explaining how the perceptron is mathematically flawed. It assigned the original score based on real-world data, not our vast over-simplifications. You should have good reason to question its design before second-guessing its scoring based on speculation such as this.

When it can give BAYES_99 a score LOWER than BAYES_95 it clearly has a conceptual problem. (It also indicates that automatic Bayes filter training has its own conceptual flaws.)

I don't change the scoring from the defaults, but if people were to want to, maybe they could change the rules (or add a rule) for BAYES_99_99, which would take only scores higher than bayes=. and which (again with a uniform distribution) have an expected FP rate of .005% - then re-score that just closer to (but still less than) the spam threshold,

I'd agree.. However, the OP has already made BAYES_99 > required_hits. Bad idea. Period.

5.0 is, admittedly, marginal.
6 or 7 is not a good idea. Not enough rules exist that will pull it back down. (Thinking on that I suspect there are some SARE rules that should lower the score slightly when they are not hit.) {^_^}
Re: Those Re: good obfupills spams (bayes scores)
From: Matt Kettler [EMAIL PROTECTED] Bart Schaefer wrote: On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: Besides.. If you want to make a mathematics based argument against me, start by explaining how the perceptron mathematically is flawed. It assigned the original score based on real-world data. Did it? I thought the BAYES_* scores have been fixed values for a while now, to force the perceptron to adapt the other scores to fit. Actually, you're right..I'm shocked and floored, but you're right. In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high-end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if: 1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam. This forces the perceptron to attempt to compensate. (Pollution always is a problem since nobody is perfect, but it occurs to differing degrees). -or- 2) The perceptron is out-of whack. (I highly doubt this because the perceptron generated the ones for 3.0.x and they were fine) -or- 3) The Real-world FPs of BAYES_99 really do tend to also be cascades with other rules in the 3.1.x ruleset, and the perceptron is correctly capping the score. This could differ from 3.0.x due to change in rules, or change in ham patterns over time. -or- 4) one of the corpus submitters has a poorly trained bayes db. (possible, but I doubt it) Looking at statistics-set3 for 3.0.x and 3.1.x there was a slight increase in ham-hits for BAYES_99 and a slight decrease in spam hits. 
3.0.x: OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
       43.515    89.3888  0.0335  1.000  0.83  1.89   BAYES_99
3.1.x: OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
       60.712    86.7351  0.0396  1.000  0.90  3.50   BAYES_99

Also to consider: set3 of 3.0.x was much closer to a 50/50 mix of spam/nonspam (48.7/51.3) than 3.1.0 was (nearly 70/30).

What happens comes from the basic reality that Bayes and the other rules are not orthogonal sets. So many other rules hit 95 and 99 that the perceptron artificially reduced the goodness rating for these rules. It needs some serious skewing to catch situations where 95 or 99 hits and very few other rules hit. Those are the times the accuracy of Bayes is needed the most. I've found, here, that 5.0 is a suitable score. I suspect if I were more realistic 4.9 would be closer. But I still do remember learning of the score bias and being floored by it when I noticed 99 on some spams that leaked through with ONLY the 99 hit. I am speaking of dozens of spams hit that way. So far over several years I've found a few special cases that warrant negative rules. That seems to be pulling the 99 rule's false alarm rate down to where I can't see it. (I have, however, been tempted to generate a BAYES_99p5 rule and a BAYES_99p9 rule to fine-tune the scores up around 4.9 and 5.0.) {^_^}
Re: Those Re: good obfupills spams (bayes scores)
From: Bart Schaefer [EMAIL PROTECTED]

On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high-end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if: 1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam.

My recollection is that there was speculation that the BAYES_9x rules were scored too low not because they FP'd in conjunction with other rules, but because against the corpus they TRUE P'd in conjunction with lots of other rules, and that it therefore wasn't necessary for the perceptron to assign a high score to BAYES_9x in order to push the total over the 5.0 threshold. The trouble with that is that users expect training on their personal spam flow to have a more significant effect on the scoring. I want to train bayes to compensate for the LACK of other rules matching, not just to give a final nudge when a bunch of others already hit. I filed a bugzilla some while ago suggesting that the bayes percentage ought to be used to select a rule set, not to adjust the score as a component of a rule set.

jdow: There is one other gotcha. I bet vastly different scores are warranted for Bayes when run with per-user training and rules as compared to global training and rules. {^_^}
Re: Those Re: good obfupills spams
| They usually hit RCVD_IN_BL_SPAMCOP_NET, URIBL_SBL but those alone
| aren't scored high enough to classify as spam, and I'm reluctant to
| crank them up just for this. However, the number of spams getting
| through SA has tripled in the last four days or so, from around 14 for
| every thousand trapped, to around 40.
|
| I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far
| they aren't having any useful effect. Other suggestions?

I would make a subject "Re: good" rule that scores just high enough to push it to the spam level.
Re: Those Re: good obfupills spams
Bart Schaefer wrote: The largest number of spam messages currently getting through SA at my site are short text-only spams with subject "Re: good" followed by an obfuscated drug name (so badly mangled as to be unrecognizable in many cases). The body contains a gappy-text list of several other kinds of equally unreadable pharmaceuticals, a single URL which changes daily if not more often, and then several random words and a short excerpt from a novel. They usually hit RCVD_IN_BL_SPAMCOP_NET, URIBL_SBL but those alone aren't scored high enough to classify as spam, and I'm reluctant to crank them up just for this. However, the number of spams getting through SA has tripled in the last four days or so, from around 14 for every thousand trapped, to around 40. I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far they aren't having any useful effect. Other suggestions?

The ReplaceTags plugin can be very useful for creating rules to match these. Let's say you get a message with text that looks like:

  S b P u A z M

where the lower-case letters vary. A traditional rule might look like:

  /S [a-z] P [a-z] A [a-z] M/

which is really not too bad. However, ReplaceTags allows you to create shorthand. Something like:

  replace_tag WS ( [a-z] )

And your rule becomes:

  /S<WS>P<WS>A<WS>M/

For this to work, you'll also need to add your rule name to a replace_rules line. Using parentheses in your regex will create wasted captures, so you'll probably want to use a different method to mark off the whitespace. You also might want to add a negative lookahead, although in this case you probably wouldn't need it. For more on ReplaceTags:

http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_Plugin_ReplaceTags.html

-Stuart
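Putting Stuart's pieces together, a complete config might look like the following. This is a hypothetical sketch: the rule name LOCAL_OBFU_SPAM and its score are invented, and it uses a non-capturing group (?: ... ) as one "different method" of marking the gaps to avoid the wasted captures he mentions:

```
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags

# define the gap shorthand; (?: ) avoids a wasted capture
replace_tag   WS  (?: [a-z] )

body          LOCAL_OBFU_SPAM  /S<WS>P<WS>A<WS>M/
describe      LOCAL_OBFU_SPAM  Gappy obfuscated text in body
score         LOCAL_OBFU_SPAM  2.0

# tag expansion only happens for rules listed here
replace_rules LOCAL_OBFU_SPAM
```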
Re: Those Re: good obfupills spams
On 4/28/06, [EMAIL PROTECTED] wrote: I would make a subject Re: good rule that scores just high enough to push it to the spam level. They're only scoring about 3.3, and I'm reluctant to make Re: good worth 2 points all by itself. That'd be worse than increasing the spamcop score. A meta rule, though ...
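One hedged sketch of that meta rule: a non-scoring subject subrule (the double-underscore prefix keeps it from scoring on its own) combined with the blacklist hits already mentioned. The rule names and the score are hypothetical:

```
header   __LOCAL_RE_GOOD   Subject =~ /^Re: good\b/i
meta     LOCAL_RE_GOOD_BL  (__LOCAL_RE_GOOD && (RCVD_IN_BL_SPAMCOP_NET || URIBL_SBL))
describe LOCAL_RE_GOOD_BL  "Re: good" subject plus a blacklist hit
score    LOCAL_RE_GOOD_BL  2.0
```

This way "Re: good" alone contributes nothing; the extra points only land when a blacklist already agrees the message is suspect.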
Re: Those Re: good obfupills spams
... Bart Schaefer wrote: The largest number of spam messages currently getting through SA at my site are short text-only spams with subject "Re: good" followed by an obfuscated drug name (so badly mangled as to be unrecognizable in many cases). The body contains a gappy-text list of several other kinds of equally unreadable pharmaceuticals, a single URL which changes daily if not more often, and then several random words and a short excerpt from a novel. They usually hit RCVD_IN_BL_SPAMCOP_NET, URIBL_SBL but those alone aren't scored high enough to classify as spam, and I'm reluctant to crank them up just for this. However, the number of spams getting through SA has tripled in the last four days or so, from around 14 for every thousand trapped, to around 40. I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far they aren't having any useful effect. Other suggestions?

These few rules can help a lot (potentially with some possible FPs, though). And as always, train your BAYES with the ones that get through, and enable the digest tests (i.e. DCC, Pyzor and Razor).

  uridnsbl  URI_COMPLETEWHOIS  combined-HIB.dnsiplists.completewhois.com.  A
  body      URI_COMPLETEWHOIS  eval:check_uridnsbl('URI_COMPLETEWHOIS')
  describe  URI_COMPLETEWHOIS  URI in combined-HIB.dnsiplists.completewhois.com
  tflags    URI_COMPLETEWHOIS  net
  score     URI_COMPLETEWHOIS  1.25

  uridnsbl  URI_IN_SORBS_DNS_SPAM  spam.dnsbl.sorbs.net.  A
  body      URI_IN_SORBS_DNS_SPAM  eval:check_uridnsbl('URI_IN_SORBS_DNS_SPAM')
  describe  URI_IN_SORBS_DNS_SPAM  URI in spam.dnsbl.sorbs.net
  tflags    URI_IN_SORBS_DNS_SPAM  net
  score     URI_IN_SORBS_DNS_SPAM  1.125

  meta      URI_M_SBL_COMWHOIS  (URI_COMPLETEWHOIS && URIBL_SBL)
  describe  URI_M_SBL_COMWHOIS  Both SBL and COMPLETEWHOIS
  score     URI_M_SBL_COMWHOIS  1.375

  meta      URI_M_SORBS_SPAM_SBL  (URI_IN_SORBS_DNS_SPAM && URIBL_SBL)
  describe  URI_M_SORBS_SPAM_SBL  Both SORBS SPAM and SBL
  score     URI_M_SORBS_SPAM_SBL  0.5

  meta      URI_M_SORBS_SPAM_CWHO  (URI_IN_SORBS_DNS_SPAM && URI_COMPLETEWHOIS)
  describe  URI_M_SORBS_SPAM_CWHO  Both SORBS SPAM and CompleteWhois
  score     URI_M_SORBS_SPAM_CWHO  0.833

These rules help to catch brand new domains at the same IP as previous spam domains (i.e. they are IP-based BLs). If you have any religious problems with SORBS, leave those out. About 92% of what I see hit the completewhois rule also hits the meta rule, and over 9 months I've never had an FP from the meta rule (which means my scoring is likely out of whack - too high for the BL tests, and too low for the meta rules). Also, as always, watch out for line wrap and be sure to lint after adding them to any local configuration files. These add two DNS lookups, but will catch about half of Leo's pill spam (adding several points for most of them).

Paul Shupak
[EMAIL PROTECTED]
Re: Those Re: good obfupills spams (uridnsbl's, A records vs NS records)
List Mail User wrote: These few rules can help a lot (potentially with some possible FPs though). And as always, train your BAYES with the ones that get through and enable the digest tests (i.e. DCC, Pyzor and Razor).

  uridnsbl  URI_COMPLETEWHOIS  combined-HIB.dnsiplists.completewhois.com.  A
  body      URI_COMPLETEWHOIS  eval:check_uridnsbl('URI_COMPLETEWHOIS')
  describe  URI_COMPLETEWHOIS  URI in combined-HIB.dnsiplists.completewhois.com
  tflags    URI_COMPLETEWHOIS  net
  score     URI_COMPLETEWHOIS  1.25

[snip]

These rules help to catch brand new domains at the same IP as previous spam domains (i.e. they are IP based BLs).

Neat stuff Paul.. I'll have to try it out. That said, technically, doesn't this really look up the IP address by fetching the NS record, not the A record of the URI? (This would catch domains hosted at the same nameserver, not domains hosted at the same server IP address.) Or has SA changed so that it looks up both NS and A for uridnsbl?

I know previously there was a strong argument against looking up the A record, as it provided an opportunity for spammers to poison email with extra URIs that nobody would normally click on or look up. These poison URIs could be used to trigger DNS attacks, or simply generate slow responses to force a timeout. NS records, on the other hand, are generally not handled by the spammer's own DNS servers, but are returned by the TLD's servers. ie: the NS record for evi-inc.com is stored on my authoritative DNS server, but it's only there for completeness. Nobody normally queries it from there except my own server. Most folks find out the NS list from the servers for .com (ie: a.gtld-servers.net). This makes it impractical to perform poison URIs if SA is only looking up NS records.
Re: Those Re: good obfupills spams
Greetings,

This is my first post after having lurked some. So, I'm getting these same "RE: good" spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question -- what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these types of rules set high enough? I must be way over-optimized -- what am I not getting?

TIA,
John

On Apr 28, 2006, at 5:36 PM, List Mail User wrote: Bart Schaefer wrote: The largest number of spam messages currently getting through SA at my site are short text-only spams with subject "Re: good" followed by an obfuscated drug name (so badly mangled as to be unrecognizable in many cases). The body contains a gappy-text list of several other kinds of equally unreadable pharmaceuticals, a single URL which changes daily if not more often, and then several random words and a short excerpt from a novel. They usually hit RCVD_IN_BL_SPAMCOP_NET, URIBL_SBL but those alone aren't scored high enough to classify as spam, and I'm reluctant to crank them up just for this. However, the number of spams getting through SA has tripled in the last four days or so, from around 14 for every thousand trapped, to around 40. I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far they aren't having any useful effect. Other suggestions?
Re: Those Re: good obfupills spams
John Tice wrote: Greetings, This is my first post after having lurked some. So, I'm getting these same "RE: good" spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question -- what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these type of rules set high enough? I must be way over-optimized -- what am I not getting?

BAYES_99, by definition, has a 1% false positive rate.
Re: Those Re: good obfupills spams (uridnsbl's, A records vs NS records)
Neat stuff Paul.. I'll have to try it out. That said, technically, doesn't this really look up the IP address by fetching the NS record, not the A record of the URI? (this would catch domains hosted at the same nameserver, not domains hosted at the same server IP address) Or has SA changed and it looks up both NS and A for uridnsbl? I know previously there was a strong argument against looking up the A record, as it provided an opportunity for spammers to poison email with extra URIs that nobody would normally click on or lookup. These poison URIs could be used to trigger DNS attacks, or simply generate slow responses to force a timeout. NS records on the other hand are generally not handled by the spammer's own DNS servers, but are returned by the TLD's servers. ie: the NS record for evi-inc.com is stored on my authoritative DNS server, but it's only there for completeness. Nobody normally queries it from there except my own server. Most folks find out the NS list from the servers for .com (ie: a.gtld-servers.net). This makes it impractical to perform poison URIs if SA is only looking up NS records.

Matt,

While I'd like to see two classes of rules, and both types of BLs used for both types of lookup (preferably with different scores - since my testing shows very different FP and FN rates for 'A' and 'NS' checks), you are completely correct: IP-based BLs are only used for the 'NS' checks and RHS-based BLs are only used for targeted domain checks (and not for the domain of the URI's NSs). Currently nothing is used to directly check the IP of the spam site (i.e. the 'A' RR), but since in many cases this happens to be the same as the NS's IP, the IP-based BLs often are checking it (though almost by accident). I personally think that poisoning spam with extra URIs is already seen quite a bit, and the issue of DNS timeouts is almost a non-issue, since you would be no worse off than before.
Already we see stock pump-and-dump and 419 spams with large amounts of poison URIs in them. Ultimately the spammer wants as short a message as he can get by with, to maximize the use of his own bandwidth (or the stolen bandwidth he has access to). What makes these tests much more efficient than you might expect is that many very-large-scale spammers (think ROKSO top-ten) tend to use the same hosts/IPs for both the web hosting and the DNS server. Also, they tend to reuse IPs, so that last week's spam web server is this week's spam DNS server. This means that hosts that hit SORBS spam-traps are often name servers for current spam runs using brand new domain names that haven't made SURBL or URIBL lists yet (or sometimes, if you have the misfortune of being at the start of a run, haven't even hit the digests yet).

I find (after already significant MTA filtering) that these few rules hit about 10% to 25% of the spam I get. The SORBS spam list alone hits almost 25% of spam, but also hits about .85% of ham (though much of that is email that many people would consider spam). The completewhois list hits about 12% of spam, but again, ~.7% of ham. The meta rules hit slightly more than the product of the hit ratios of the individual rules (i.e. including the SBL) for spam (except the completewhois/SBL meta, which hits 92% of the original completewhois hits - i.e. mostly Chinese and Korean IPs, but some from all parts of the world), and have had no ham hits over the past two or three months (and only one or two ever). This implies that they are indeed independent, with different FP sources, and heavily biased toward spam to begin with. They do disproportionately catch certain spammers, so they can be thought of as similar to the SARE Specific rule set. In particular they work extremely well against certain classes of pill and mortgage spam.

Paul Shupak
[EMAIL PROTECTED]
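The two lookup styles Matt and Paul are contrasting map onto two different URIDNSBL plugin directives; a minimal sketch, with placeholder zone and rule names (not real lists):

```
# IP-based list: SA resolves the NS hosts of the URI's domain and checks
# their IPs against the zone (the 'NS' path discussed above)
uridnsbl  X_URI_IPLIST   iplist.example.net.   A
body      X_URI_IPLIST   eval:check_uridnsbl('X_URI_IPLIST')
tflags    X_URI_IPLIST   net

# RHS-based list: the domain name itself is looked up in the zone
urirhsbl  X_URI_RHSLIST  rhslist.example.net.  A
body      X_URI_RHSLIST  eval:check_uridnsbl('X_URI_RHSLIST')
tflags    X_URI_RHSLIST  net
```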
Re: Those Re: good obfupills spams
From: Matt Kettler [EMAIL PROTECTED] John Tice wrote: Greetings, This is my first post after having lurked some. So, I'm getting these same RE: good spams but they're hitting eight rules and typically scoring between 30 and 40. I'm really unsophisticated compared to you guys, and it begs the question––what am I doing wrong? All I use is a tweaked user_prefs wherein I have gradually raised the scores on standard rules found in spam that slips through over a period of time. These particular spams are over the top on bayesian (1.0), have multiple database hits, forged rcvd_helo and so forth. Bayesian alone flags them for me. I'm trying to understand the reason you would not want to have these type of rules set high enough? I must be way over optimized––what am I not getting? BAYES_99, by definition, has a 1% false positive rate. That is what Bayes thinks. I think it is closer to something between 0.5% and 0.1% false positive. I have mine trained down lethally fine at this point, it appears. {^_-}