Re: BAYES_999 strange behavior
On 2/20/2014 2:09 PM, Daniel Staal wrote:
> --As of February 20, 2014 1:56:18 PM -0500, Kevin A. McGrail is alleged to have said:
>> People have hard-coded BAYES_999 entries as well. I recommend forwarding the announcement from John to the other mailing lists where you are aware of these discussions.
> --As for the rest, it is mine.
>
> I intend to, as soon as I'm sure what's going to happen. ;) I just don't want people who've fixed their scores to be penalized. I know that doesn't help people who copied your block re-defining the rules entirely, but nothing really helps them. (Besides telling them not to do that unless they know what they are doing.)

As of about 10:30 EST tonight, I expect that versions 3.3.x will be able to use sa-update to receive an update that includes BAYES_99 as it used to exist, plus BAYES_999, which overlaps with BAYES_99 and adds 0.2 to the score.

By about 4 AM tomorrow, version 3.4.1 will have an update, though likely no one can access that update. Tomorrow morning by about 10 AM, I will update 3.4.0 manually to receive the 3.4.1 update.

So, as of about one hour past the times above for the version in use (to allow for DNS TTL and mirror updates), I recommend that people run sa-update and remove any manual edits for rules named BAYES_99 or BAYES_999.

If they have manual scoring for these, they will want to review those scores for their own installation. BAYES_99 scores in the 3.75 range and BAYES_999 will score in the 0.25 range. Anything outside of those scores should be set with an understanding of your own Bayesian database.

They can confirm they received the correct update if the rule score for BAYES_999 changes to 0.2. For example, on a default-path 3.4.0 installation,

    grep BAYES_999 /var/lib/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf

gives

    score BAYES_999 0 0 4.0 3.7

Tomorrow, this should change to 0.2.

regards,
KAM
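The grep check KAM describes can also be scripted. The following is a minimal Python sketch and not part of SpamAssassin: `find_scores` is a hypothetical helper that scans the text of a scores file for uncommented `score` lines for one rule. It assumes the usual layout of whitespace-separated values after the rule name (one per score set) and assumes that a later line overrides an earlier one. The sample text is illustrative only; the real values come from your own 50_scores.cf.

```python
from typing import List, Optional


def find_scores(cf_text: str, rule: str) -> Optional[List[float]]:
    """Return the values from the last uncommented 'score <rule> ...' line."""
    found = None
    for line in cf_text.splitlines():
        # Drop anything after '#', so commented-out score lines are ignored.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "score" and fields[1] == rule:
            found = [float(value) for value in fields[2:]]
    return found


# Illustrative file contents (hypothetical), not copied from a real update.
sample = """\
# score BAYES_999 0 0 4.0 3.7
score BAYES_99 0 0 3.8 3.5
score BAYES_999 0 0 0.2 0.2
"""

print(find_scores(sample, "BAYES_999"))
```

The rule name is compared exactly, so a search for BAYES_99 does not pick up the BAYES_999 line, which a plain substring grep would.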
Re: BAYES_999 strange behavior
--As of February 20, 2014 1:56:18 PM -0500, Kevin A. McGrail is alleged to have said:
> People have hard-coded BAYES_999 entries as well. I recommend forwarding the announcement from John to the other mailing lists where you are aware of these discussions.
--As for the rest, it is mine.

I intend to, as soon as I'm sure what's going to happen. ;) I just don't want people who've fixed their scores to be penalized. I know that doesn't help people who copied your block re-defining the rules entirely, but nothing really helps them. (Besides telling them not to do that unless they know what they are doing.)

Daniel T. Staal

---
This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law.
---
Re: BAYES_999 strange behavior
On 2/20/2014 1:31 PM, John Hardin wrote:
> On Thu, 20 Feb 2014, Daniel Staal wrote:
>> --As of February 20, 2014 9:23:56 AM -0800, John Hardin is alleged to have said:
>>> BAYES_99 is being reverted to its original definition and BAYES_999 is being converted to an overlapping additive rule that adds some more points to BAYES_99 for the very top end of Bayes score.
>>>
>>> If you have locally set a high score for BAYES_999 you may want to reduce or remove that override. (Then again, BAYES_99 + BAYES_999 scoring 10+ isn't really *that* much of a problem unless your Bayes database is off the rails... :) )
>>>
>>> This should go out within the next couple of rule updates.
>> --As for the rest, it is mine.
>>
>> Just as a note: This discussion went quite a bit further than this mailing list, since the rule leak affected anyone using a recent version of SpamAssassin. I know for certain it reached NANOG, for example.
>>
>> Given that there are likely people who've rescored the BAYES_999 rule and will not see this decision, would it be possible to release it as a *different* rule? (And retire BAYES_999 entirely.) Name it BAYES_99_9 or something, so that previous quick-fixes don't affect people negatively?
>>
>> A surprise change to over-score messages quickly following a surprise change to under-score messages just hits me wrong. It'd be nice if we could avoid causing more problems.
>>
>> Daniel T. Staal
>
> Wow. Ok. Kevin: how about the BAYES_100 suggestion?

People have hard-coded BAYES_999 entries as well. I recommend forwarding the announcement from John to the other mailing lists where you are aware of these discussions.

Regards,
KAM
Re: BAYES_999 strange behavior
On Thu, 20 Feb 2014, Daniel Staal wrote:
> --As of February 20, 2014 9:23:56 AM -0800, John Hardin is alleged to have said:
>> BAYES_99 is being reverted to its original definition and BAYES_999 is being converted to an overlapping additive rule that adds some more points to BAYES_99 for the very top end of Bayes score.
>>
>> If you have locally set a high score for BAYES_999 you may want to reduce or remove that override. (Then again, BAYES_99 + BAYES_999 scoring 10+ isn't really *that* much of a problem unless your Bayes database is off the rails... :) )
>>
>> This should go out within the next couple of rule updates.
> --As for the rest, it is mine.
>
> Just as a note: This discussion went quite a bit further than this mailing list, since the rule leak affected anyone using a recent version of SpamAssassin. I know for certain it reached NANOG, for example.
>
> Given that there are likely people who've rescored the BAYES_999 rule and will not see this decision, would it be possible to release it as a *different* rule? (And retire BAYES_999 entirely.) Name it BAYES_99_9 or something, so that previous quick-fixes don't affect people negatively?
>
> A surprise change to over-score messages quickly following a surprise change to under-score messages just hits me wrong. It'd be nice if we could avoid causing more problems.
>
> Daniel T. Staal

Wow. Ok. Kevin: how about the BAYES_100 suggestion?

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
It is not the place of government to make right every tragedy and woe that befalls every resident of the nation.
---
2 days until George Washington's 282nd Birthday
Re: BAYES_999 strange behavior
--As of February 20, 2014 9:23:56 AM -0800, John Hardin is alleged to have said:
> BAYES_99 is being reverted to its original definition and BAYES_999 is being converted to an overlapping additive rule that adds some more points to BAYES_99 for the very top end of Bayes score.
>
> If you have locally set a high score for BAYES_999 you may want to reduce or remove that override. (Then again, BAYES_99 + BAYES_999 scoring 10+ isn't really *that* much of a problem unless your Bayes database is off the rails... :) )
>
> This should go out within the next couple of rule updates.
--As for the rest, it is mine.

Just as a note: This discussion went quite a bit further than this mailing list, since the rule leak affected anyone using a recent version of SpamAssassin. I know for certain it reached NANOG, for example.

Given that there are likely people who've rescored the BAYES_999 rule and will not see this decision, would it be possible to release it as a *different* rule? (And retire BAYES_999 entirely.) Name it BAYES_99_9 or something, so that previous quick-fixes don't affect people negatively?

A surprise change to over-score messages quickly following a surprise change to under-score messages just hits me wrong. It'd be nice if we could avoid causing more problems.

Daniel T. Staal

---
This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law.
---
Re: BAYES_999 strange behavior
On 2/20/14 11:23 AM, "John Hardin" wrote:
> BAYES_99 is being reverted to its original definition and BAYES_999 is being converted to an overlapping additive rule that adds some more points to BAYES_99 for the very top end of Bayes score.
>
> This should go out within the next couple of rule updates.

Can we get a posting to this list when that rule update happens?

-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"...Life is not a journey to the grave with the intention of arriving safely in one pretty and well-preserved piece, but to slide across the finish line broadside, thoroughly used up, worn out, leaking oil, and shouting GERONIMO!!!" -- Bill McKenna
Re: BAYES_999 strange behavior
On Tue, 18 Feb 2014, Dave Pooser wrote:
>> BAYES_99 used to hit for emails that the naive Bayesian classifier identified as 99% to 100% spam.
>>
>> BAYES_99 is now split into two rules to give it finer gradient on scores for different percentages:
>>
>>     BAYES_99    99% to 99.9%
>>     BAYES_999   99.9% to 100%
>
> It would make my life a lot easier if instead BAYES_999 were an additional rule.

Status update:

BAYES_99 is being reverted to its original definition and BAYES_999 is being converted to an overlapping additive rule that adds some more points to BAYES_99 for the very top end of Bayes score.

If you have locally set a high score for BAYES_999 you may want to reduce or remove that override. (Then again, BAYES_99 + BAYES_999 scoring 10+ isn't really *that* much of a problem unless your Bayes database is off the rails... :) )

This should go out within the next couple of rule updates.

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Gun Control enables genocide while doing little to reduce crime.
---
2 days until George Washington's 282nd Birthday
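The difference between the split rules and the overlapping additive rule, and why a local high score for BAYES_999 matters after the change, can be sketched in a few lines of Python. This is a toy model, not SpamAssassin code: `rules_hit` and `bayes_points` are hypothetical helpers, the range boundaries are assumed to be exclusive at the low end and inclusive at the high end, and the scores (3.5 for BAYES_99, 0.2 for BAYES_999, and a local override of 5.0) are illustrative values.

```python
def rules_hit(prob, definitions):
    """Names of the rules whose (low, high] range contains prob (assumed boundaries)."""
    return [name for name, (low, high) in definitions.items() if low < prob <= high]


def bayes_points(prob, definitions, scores):
    """Total points contributed by the Bayes rules that hit."""
    return sum(scores[name] for name in rules_hit(prob, definitions))


# Split definitions (each message hits at most one rule) versus the
# overlapping definitions described in the status update.
SPLIT = {"BAYES_99": (0.99, 0.999), "BAYES_999": (0.999, 1.00)}
OVERLAPPING = {"BAYES_99": (0.99, 1.00), "BAYES_999": (0.999, 1.00)}

stock = {"BAYES_99": 3.5, "BAYES_999": 0.2}      # illustrative values
override = {"BAYES_99": 3.5, "BAYES_999": 5.0}   # hypothetical local quick-fix

prob = 0.9995  # a message at the very top end of the Bayes range
print(rules_hit(prob, SPLIT), rules_hit(prob, OVERLAPPING))
print(bayes_points(prob, OVERLAPPING, stock))
print(bayes_points(prob, OVERLAPPING, override))
```

With the overlapping definitions both rules fire on the same message, so a score that was chosen for BAYES_999 while it stood alone is now added on top of BAYES_99. That is the case the status update asks people to review.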
Re: BAYES_999 strange behavior
On 2/19/2014 9:37 AM, Bowie Bailey wrote:
> On 2/18/2014 8:49 PM, Kevin A. McGrail wrote:
>> On 2/18/2014 6:05 PM, Amir Caspi wrote:
>>> On Feb 18, 2014, at 3:58 PM, John Hardin wrote:
>>>> Is there some reason the Bayes scores can't/shouldn't be static?
>>>
>>> Indeed, I am wondering why Bayes would be auto-scored at all. By definition, Bayes high scores should match only on spam, low scores should match only on ham. That's not perfect, of course, but it is basically by definition of how Bayes learns. Given that, it seems to me that the Bayes scores should be static, and my experience suggests that 99 or 999 should be scored pretty heavily. (I'd say 00 should be scored negatively heavily, but I get enough FNs with 00 that I don't like that idea... though it probably means my DB is borked or my ham is full of spammy tokens.)
>>
>> Actually it's a bit the opposite, especially if using autolearn, where scoring too high on the 99% end can cause low-percentage corpora to swing heavily towards the high score too rapidly.
>
> Bayes scores are not included when determining what to autolearn, so changing the Bayes scores should have no effect on autolearning. Or am I missing something?

I would have to look at the permutations of bayes_auto_learn_on_error, bayes_auto_learn_threshold_spam and the tflag autolearn_force to answer that question, but my memory is that this is a self-perpetuating cycle that I've seen on live servers when testing scoring.

regards,
KAM
Re: BAYES_999 strange behavior
On 2/18/2014 8:49 PM, Kevin A. McGrail wrote:
> On 2/18/2014 6:05 PM, Amir Caspi wrote:
>> On Feb 18, 2014, at 3:58 PM, John Hardin wrote:
>>> Is there some reason the Bayes scores can't/shouldn't be static?
>>
>> Indeed, I am wondering why Bayes would be auto-scored at all. By definition, Bayes high scores should match only on spam, low scores should match only on ham. That's not perfect, of course, but it is basically by definition of how Bayes learns. Given that, it seems to me that the Bayes scores should be static, and my experience suggests that 99 or 999 should be scored pretty heavily. (I'd say 00 should be scored negatively heavily, but I get enough FNs with 00 that I don't like that idea... though it probably means my DB is borked or my ham is full of spammy tokens.)
>
> Actually it's a bit the opposite, especially if using autolearn, where scoring too high on the 99% end can cause low-percentage corpora to swing heavily towards the high score too rapidly.

Bayes scores are not included when determining what to autolearn, so changing the Bayes scores should have no effect on autolearning. Or am I missing something?

-- 
Bowie
Re: BAYES_999 strange behavior
On 2/18/14 8:52 PM, "Kevin A. McGrail" wrote:
> I am not disagreeing it would have been an interesting approach but the rules were promoted accidentally to begin with. I'm just doing triage to get things functional right now

Totally understand, and I didn't mean to whinge. The bright side is it's given me an impetus to redesign my meta rules with an abstraction layer in between stock rules and my meta rules, so I'll be better positioned to take advantage of new rules in future.

-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"...Life is not a journey to the grave with the intention of arriving safely in one pretty and well-preserved piece, but to slide across the finish line broadside, thoroughly used up, worn out, leaking oil, and shouting GERONIMO!!!" -- Bill McKenna
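One way to see what such an abstraction layer buys is to model it. The Python sketch below is a toy model and not SpamAssassin code: `very_bad` is a hypothetical stand-in for a helper meta rule defined as (BAYES_99 || BAYES_999), and the range boundaries are assumed to be exclusive at the low end and inclusive at the high end. It checks that the helper fires for the same Bayes probabilities whether the stock rules are split or overlapping, so meta rules built on the helper would not need rewriting when the stock definitions change.

```python
def rules_hit(prob, definitions):
    """Names of the rules whose (low, high] range contains prob (assumed boundaries)."""
    return {name for name, (low, high) in definitions.items() if low < prob <= high}


def very_bad(prob, definitions):
    """Toy version of a helper meta rule: BAYES_99 || BAYES_999."""
    hits = rules_hit(prob, definitions)
    return "BAYES_99" in hits or "BAYES_999" in hits


SPLIT = {"BAYES_99": (0.99, 0.999), "BAYES_999": (0.999, 1.00)}
OVERLAPPING = {"BAYES_99": (0.99, 1.00), "BAYES_999": (0.999, 1.00)}

# Probe values on both sides of every boundary.
probes = [0.5, 0.99, 0.995, 0.999, 0.9995, 1.0]
differences = [p for p in probes if very_bad(p, SPLIT) != very_bad(p, OVERLAPPING)]
print(differences)
```

The sketch only speaks to logical behavior. It says nothing about the performance of nested meta rules inside SpamAssassin itself.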
Re: BAYES_999 strange behavior
On 2/18/2014 5:54 PM, Dave Pooser wrote:
>> I use several meta rules that include BAYES_99 and now I'm having to go rewrite those rules to include (BAYES_99 || BAYES_999).
>
> Which raises the question-- is there a performance hit for making meta rules include other meta rules? That is: is
>
>     meta _DP_BAYES_VBAD (BAYES_99 || BAYES_999)
>     meta DP_FRM_INFO_BAYES_VBD DP_FRM_INFO && _DP_BAYES_VBAD
>
> any worse from a performance standpoint than
>
>     meta DP_FRM_INFO_BAYES_VBD DP_FRM_INFO && (BAYES_99 || BAYES_999)
>
> under normal conditions?

I'd have to do a timing run to prove it, but I would guess the difference is infinitesimal.
Re: BAYES_999 strange behavior
On 2/18/2014 5:44 PM, Dave Pooser wrote:
>> BAYES_99 used to hit for emails that the naive Bayesian classifier identified as 99% to 100% spam.
>>
>> BAYES_99 is now split into two rules to give it finer gradient on scores for different percentages:
>>
>>     BAYES_99    99% to 99.9%
>>     BAYES_999   99.9% to 100%
>
> It would make my life a lot easier if instead BAYES_999 were an additional rule. I use several meta rules that include BAYES_99 and now I'm having to go rewrite those rules to include (BAYES_99 || BAYES_999). Granted, it's probably a lesson that I need to find a better way to generate rules programmatically, but I'd rather tinker with that sometime when $DAYJOB is not requiring 12-hour days from me.

I am not disagreeing it would have been an interesting approach, but the rules were promoted accidentally to begin with. I'm just doing triage to get things functional right now, so I'm damned if we do and damned if we don't.

Probably best you plan for the new change for now, sorry to say.

Regards,
KAM
Re: BAYES_999 strange behavior
On 2/18/2014 6:05 PM, Amir Caspi wrote:
> On Feb 18, 2014, at 3:58 PM, John Hardin wrote:
>> Is there some reason the Bayes scores can't/shouldn't be static?
>
> Indeed, I am wondering why Bayes would be auto-scored at all. By definition, Bayes high scores should match only on spam, low scores should match only on ham. That's not perfect, of course, but it is basically by definition of how Bayes learns. Given that, it seems to me that the Bayes scores should be static, and my experience suggests that 99 or 999 should be scored pretty heavily. (I'd say 00 should be scored negatively heavily, but I get enough FNs with 00 that I don't like that idea... though it probably means my DB is borked or my ham is full of spammy tokens.)

Actually it's a bit the opposite, especially if using autolearn, where scoring too high on the 99% end can cause low-percentage corpora to swing heavily towards the high score too rapidly.

Regards,
KAM
Re: BAYES_999 strange behavior
On 2/18/2014 5:58 PM, John Hardin wrote:
> On Tue, 18 Feb 2014, Dave Pooser wrote:
>>> BAYES_99 used to hit for emails that the naive Bayesian classifier identified as 99% to 100% spam.
>>>
>>> BAYES_99 is now split into two rules to give it finer gradient on scores for different percentages:
>>>
>>>     BAYES_99    99% to 99.9%
>>>     BAYES_999   99.9% to 100%
>>
>> It would make my life a lot easier if instead BAYES_999 were an additional rule.
>
> I agree, but doing that makes the auto-scoring a bit problematic.
>
> Is there some reason the Bayes scores can't/shouldn't be static?

Well, having learned the hard way, it doesn't appear that the perceptron can score the new rules properly, so they sort of have to be static ;-)
Re: BAYES_999 strange behavior
On Feb 18, 2014, at 3:58 PM, John Hardin wrote:
> Is there some reason the Bayes scores can't/shouldn't be static?

Indeed, I am wondering why Bayes would be auto-scored at all. By definition, Bayes high scores should match only on spam, low scores should match only on ham. That's not perfect, of course, but it is basically by definition of how Bayes learns. Given that, it seems to me that the Bayes scores should be static, and my experience suggests that 99 or 999 should be scored pretty heavily. (I'd say 00 should be scored negatively heavily, but I get enough FNs with 00 that I don't like that idea... though it probably means my DB is borked or my ham is full of spammy tokens.)

> -- 
> John Hardin KA7OHZ http://www.impsec.org/~jhardin/
> jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
> ---
> When people get used to preferential treatment,
> equal treatment seems like discrimination. -- Thomas Sowell
> ---
> 4 days until George Washington's 282nd Birthday
Re: BAYES_999 strange behavior
On Tue, 18 Feb 2014, Dave Pooser wrote:
>> BAYES_99 used to hit for emails that the naive Bayesian classifier identified as 99% to 100% spam.
>>
>> BAYES_99 is now split into two rules to give it finer gradient on scores for different percentages:
>>
>>     BAYES_99    99% to 99.9%
>>     BAYES_999   99.9% to 100%
>
> It would make my life a lot easier if instead BAYES_999 were an additional rule.

I agree, but doing that makes the auto-scoring a bit problematic.

Is there some reason the Bayes scores can't/shouldn't be static?

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
When people get used to preferential treatment,
equal treatment seems like discrimination. -- Thomas Sowell
---
4 days until George Washington's 282nd Birthday
Re: BAYES_999 strange behavior
> It would make my life a lot easier if instead BAYES_999 were an additional rule.

That is, if BAYES_999 fired *in addition to* BAYES_99.

> I use several meta rules that include BAYES_99 and now I'm having to go rewrite those rules to include (BAYES_99 || BAYES_999).

Which raises the question-- is there a performance hit for making meta rules include other meta rules? That is: is

    meta _DP_BAYES_VBAD (BAYES_99 || BAYES_999)
    meta DP_FRM_INFO_BAYES_VBD DP_FRM_INFO && _DP_BAYES_VBAD

any worse from a performance standpoint than

    meta DP_FRM_INFO_BAYES_VBD DP_FRM_INFO && (BAYES_99 || BAYES_999)

under normal conditions?

-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"...Life is not a journey to the grave with the intention of arriving safely in one pretty and well-preserved piece, but to slide across the finish line broadside, thoroughly used up, worn out, leaking oil, and shouting GERONIMO!!!" -- Bill McKenna
Re: BAYES_999 strange behavior
> BAYES_99 used to hit for emails that the naive Bayesian classifier identified as 99% to 100% spam.
>
> BAYES_99 is now split into two rules to give it finer gradient on scores for different percentages:
>
>     BAYES_99    99% to 99.9%
>     BAYES_999   99.9% to 100%

It would make my life a lot easier if instead BAYES_999 were an additional rule. I use several meta rules that include BAYES_99 and now I'm having to go rewrite those rules to include (BAYES_99 || BAYES_999). Granted, it's probably a lesson that I need to find a better way to generate rules programmatically, but I'd rather tinker with that sometime when $DAYJOB is not requiring 12-hour days from me.

-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"...Life is not a journey to the grave with the intention of arriving safely in one pretty and well-preserved piece, but to slide across the finish line broadside, thoroughly used up, worn out, leaking oil, and shouting GERONIMO!!!" -- Bill McKenna
Re: BAYES_999 strange behavior
On 2/17/2014 4:12 PM, Ian Zimmerman wrote:
> On Mon, 17 Feb 2014 16:05:23 -0500 "Kevin A. McGrail" wrote:
>
> Kevin> BAYES_999 is just a finer gradient on BAYES_99 allowing for a
> Kevin> higher score on the top .001% of Bayes hits.
>
> Thanks for your reply. Could you explain in a bit more detail what "gradient on top" (of another rule) means? It doesn't mean the score is meant to be additive with the base rule, does it? 'Cause these spams _do not_ trigger any of the bayes rules _except_ for BAYES_999. That's why they score too low to be caught.

Sure. BAYES_99 used to hit for emails that the naive Bayesian classifier identified as 99% to 100% spam.

BAYES_99 is now split into two rules to give it finer gradient on scores for different percentages:

    BAYES_99    99% to 99.9%
    BAYES_999   99.9% to 100%

That split was theoretically being tested, but it got auto-promoted without a proper score, so its score defaulted to 1.

regards,
KAM
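The effect of that default score of 1 can be worked through with the numbers Ian posted at the start of the thread. The Python sketch below is illustrative arithmetic only: it uses just the three hits visible in his truncated X-Spam-Tests header (other rules may also have hit), assumes SpamAssassin's default required_score of 5.0, and assumes the 3.7 value from the temporary `score BAYES_999` line KAM suggested is the one that would apply to Ian's setup.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold; a site may set its own


def total(hits):
    """Sum the per-rule scores of one message."""
    return sum(hits.values())


# The three hits visible in the (truncated) header from the original report.
as_delivered = {"BAYES_999": 1.0, "DOS_OE_TO_MX": 2.523, "HTML_MESSAGE": 0.001}

# Same message, with BAYES_999 rescored to 3.7 (assumed applicable score set).
rescored = dict(as_delivered, BAYES_999=3.7)

for label, hits in (("as delivered", as_delivered), ("rescored", rescored)):
    score = total(hits)
    verdict = "spam" if score >= REQUIRED_SCORE else "not spam"
    print(f"{label}: {score:.3f} -> {verdict}")
```

Under these assumptions the message falls short of the threshold only because the one Bayes rule it hit carried the default score, which matches the false negatives described in the original report.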
Re: BAYES_999 strange behavior
On Mon, 17 Feb 2014 16:05:23 -0500 "Kevin A. McGrail" wrote:

Kevin> BAYES_999 is just a finer gradient on BAYES_99 allowing for a
Kevin> higher score on the top .001% of Bayes hits.

Thanks for your reply. Could you explain in a bit more detail what "gradient on top" (of another rule) means? It doesn't mean the score is meant to be additive with the base rule, does it? 'Cause these spams _do not_ trigger any of the bayes rules _except_ for BAYES_999. That's why they score too low to be caught.

-- 
Please *no* private copies of mailing list or newsgroup messages.
gpg public key: 2048R/984A8AE4
fingerprint: 7953 ADA1 0E8E AB57 FB79 FFD2 360A 88B2 984A 8AE4
Funny pic: http://bit.ly/ZNE2MX
Re: BAYES_999 strange behavior
On 2/17/2014 3:59 PM, Ian Zimmerman wrote:
> Hello. This is the first time SA is giving me enough trouble that I need to ask for help. I hope I get this right.
>
> I observed a marked increase in false negatives in the last few weeks.

There have definitely been some increases in the past few weeks but, as you'll see below, I think BAYES_99/999 is not the culprit except very recently.

> Only today I had enough sense to look at the detailed scores. And, all the escaped spams have hit the BAYES_999 rule. I grepped the site configuration directory:

The BAYES_999 rule changed in the last day or three. I was expecting the ruleqa engine to score it appropriately and it didn't.

BAYES_999 is just a finer gradient on BAYES_99 allowing for a higher score on the top .001% of Bayes hits.

It'll be fixed with the next rule update but you might want these temporarily:

    body BAYES_99 eval:check_bayes('0.99', '0.999')
    body BAYES_999 eval:check_bayes('0.999', '1.00')
    score BAYES_99 0 0 3.8 3.5
    score BAYES_999 0 0 4.0 3.7

regards,
KAM
BAYES_999 strange behavior
Hello. This is the first time SA is giving me enough trouble that I need to ask for help. I hope I get this right.

I observed a marked increase in false negatives in the last few weeks. Only today I had enough sense to look at the detailed scores. And, all the escaped spams have hit the BAYES_999 rule. I grepped the site configuration directory:

    [3+0]~$ fgrep -h BAYES_999 /var/lib/spamassassin/3.003002/updates_spamassassin_org/*.cf
    ##{ BAYES_999 ifplugin Mail::SpamAssassin::Plugin::Bayes
    body BAYES_999 eval:check_bayes('0.999', '1.00')
    tflags BAYES_999 learn,publish
    describe BAYES_999 Bayes spam probability is 99.9 to 100%
    # score BAYES_999 0 0 4.8 4.5
    ##} BAYES_999 ifplugin Mail::SpamAssassin::Plugin::Bayes

so it seems this is the "highest spamminess" rule, and the score in the config file reflects that. But the message header is:

    X-Spam-Tests: BAYES_999=1,DOS_OE_TO_MX=2.523,HTML_MESSAGE=0.001,

The score for BAYES_999 is 1 in all cases :(

Where does the 1 come from??? Certainly not from my user_prefs; I go to great lengths not to change any scores. And the factory configuration doesn't even seem to have this rule:

    [4+0]~$ fgrep -h BAYES_999 /usr/share/spamassassin/*.cf
    [5+0]~$

I am baffled. Is this a bug?

My configuration:

    version 3.3.2
    daily sa-update run stores updates in /var/lib/spamassassin/
    spamd + spamc --headers

-- 
Please *no* private copies of mailing list or newsgroup messages.
gpg public key: 2048R/984A8AE4
fingerprint: 7953 ADA1 0E8E AB57 FB79 FFD2 360A 88B2 984A 8AE4
Funny pic: http://bit.ly/ZNE2MX