Re: scores too low - neural network problem?
> What is the output of this on your messages?
>
>     spamassassin -tD 2>&1 | pager
>
> What value does it show for BAYES_99 in the content analysis section? If it says something other than 4.07 then it confirms that you are not running with values from column three (network tests off). It sounds instead like you are running with network tests enabled. Are network tests enabled in the debugging output?

Thank you, this was correct. I thought I had disabled the network tests, but I hadn't. I've disabled them now, and the scoring has returned to what I thought it should be.

Regards,
Andrew.
Re: scores too low - neural network problem?
> > I understand that the individual test scores are fed through a neural network to derive the final score. So it seems that this network has started to behave badly.
>
> You misunderstand. The neural network (or whatever they're using these days - it at least used to be a genetic algorithm) is used to assign the default scores, not to adjust the scores after the fact.

Thank you, you're right. I had misunderstood that.

> More likely one of two things is happening: that header was added by another system running SpamAssassin, or you aren't running with the configuration you think you are.

You're right. I thought I had disabled the network tests, but I hadn't, so I wasn't getting the scores I thought I was. I disabled the network tests, and the problem is solved now.

Regards,
Andrew.
scores too low - neural network problem?
I'm running spamc/spamd 3.0.2 in Debian. I have Bayesian tests turned on, and network tests off.

Lately a lot of spam has been getting through to my mailbox. SA's false negative rate used to be about 1%; now it's about 50%.

Looking at the headers for the spam that's getting through, I see that the Bayesian filter is working correctly: almost all of the spam is tagged as BAYES_95 or BAYES_99. My score threshold is 5, the BAYES_99 test alone (using its default value) is worth 4.07, and a few other tests are usually positive as well. Yet, the total score is around 2.5. Here's a sample from today:

    X-Spam-Status: No, score=2.7 required=5.0 tests=BAYES_99,HTML_20_30,
        HTML_FONT_INVISIBLE,HTML_IMAGE_ONLY_24,HTML_MESSAGE autolearn=no
        version=3.0.2

The scores from the tests listed here should add up to about 5.3, but as you can see, the total is only 2.7. So this one gets through.

I understand that the individual test scores are fed through a neural network to derive the final score. So it seems that this network has started to behave badly.

Can anyone shed any light on this? Is it a well-known problem? What's the preferred way to address it? Remove all of SA's learned information and retrain the network?

Thanks,
Andrew.
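For anyone checking a header like the sample above, here is a minimal Python sketch that pulls the reported score, the threshold, and the test names out of an X-Spam-Status value. The helper name `parse_spam_status` and its regular expression are illustrative assumptions, not part of SpamAssassin; the header text is the sample from the message, with the folded lines joined.

```python
import re

# Sample X-Spam-Status value from the message above (folded lines joined).
HEADER = (
    "No, score=2.7 required=5.0 tests=BAYES_99,HTML_20_30, "
    "HTML_FONT_INVISIBLE,HTML_IMAGE_ONLY_24,HTML_MESSAGE autolearn=no "
    "version=3.0.2"
)


def parse_spam_status(value):
    """Hypothetical helper: return (score, required, [test names])."""
    match = re.search(
        r"score=(-?[\d.]+)\s+required=(-?[\d.]+)\s+tests=(.*?)\s+autolearn=",
        value,
        re.S,
    )
    if match is None:
        raise ValueError("unrecognised X-Spam-Status value")
    # Test names are comma-separated; folding may leave stray whitespace.
    tests = [name.strip() for name in match.group(3).split(",") if name.strip()]
    return float(match.group(1)), float(match.group(2)), tests


score, required, tests = parse_spam_status(HEADER)
print(score, required, tests)
```

This only reports what the header claims. It does not look up the per-test scores, which depend on which score set the running configuration selects.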
Re: scores too low - neural network problem?
Andrew Schulman wrote:
> I'm running spamc/spamd 3.0.2 in Debian. I have Bayesian tests turned on, and network tests off.

I am running a similar system, but with network tests turned on. The network tests such as SURBL[1] are huge factors in increasing spam classification accuracy for me.

> almost all of the spam is tagged as BAYES_95 or BAYES_99. My score threshold is 5, the BAYES_99 test alone (using its default value) is worth 4.07, and a few other tests are usually positive as well. Yet, the total score is around 2.5.

Of course, as you are aware, there are four scores. The first score is used when both Bayes and network tests are disabled (score set 0). The second score is used when Bayes is disabled, but network tests are enabled (score set 1). The third score is used when Bayes is enabled and network tests are disabled (score set 2). The fourth score is used when Bayes is enabled and network tests are enabled (score set 3).

The default for BAYES_99 in SA-3.0.2 is:

    score BAYES_99 0 0 4.070 1.886

I got confused by this exact thing while debugging a problem of mine a while ago. I thought I was using one column but was really getting data from another.

What is the output of this on your messages?

    spamassassin -tD 2>&1 | pager

What value does it show for BAYES_99 in the content analysis section? If it says something other than 4.07 then it confirms that you are not running with values from column three (network tests off). It sounds instead like you are running with network tests enabled. Are network tests enabled in the debugging output?

> I understand that the individual test scores are fed through a neural network to derive the final score. So it seems that this network has started to behave badly.

Because you are getting the BAYES_99 tag I am sure the Bayes engine is working properly. You are seeing a scoring difference instead.

> Can anyone shed any light on this? Is it a well-known problem? What's the preferred way to address it? Remove all of SA's learned information and retrain the network?

Don't retrain! I am convinced by your evidence that you are actually running with network tests enabled. Compare the result with the following. Does this give you the results you were looking for?

    spamassassin -L -tD 2>&1 | pager

Bob

[1] http://www.surbl.org/
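The four score sets described above can be sketched in a few lines of Python. This is only an illustration of the column selection, not SpamAssassin's own code: the function names `score_set_index` and `active_score` are made up for the example, and the parser handles only the four-value form of a `score` line as quoted in this message.

```python
def score_set_index(bayes_enabled, network_enabled):
    """Map the two switches to a score set number.

    Set 0: neither; set 1: network only; set 2: Bayes only; set 3: both.
    """
    return (2 if bayes_enabled else 0) + (1 if network_enabled else 0)


def active_score(score_line, bayes_enabled, network_enabled):
    """Hypothetical helper: pick the value in effect from a four-value score line."""
    parts = score_line.split()
    if len(parts) != 6 or parts[0] != "score":
        raise ValueError("expected: score NAME v0 v1 v2 v3")
    values = [float(p) for p in parts[2:]]
    return parts[1], values[score_set_index(bayes_enabled, network_enabled)]


RULE = "score BAYES_99 0 0 4.070 1.886"

# Bayes on, network tests off: the third column applies.
print(active_score(RULE, True, False))   # ('BAYES_99', 4.07)
# Bayes on, network tests on: the fourth column applies.
print(active_score(RULE, True, True))    # ('BAYES_99', 1.886)
```

If the debug output shows BAYES_99 at 1.886 rather than 4.07, the second call is the case that matches.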
Re: scores too low - neural network problem?
On Saturday 05 March 2005 1:21 pm, Andrew Schulman wrote:
> I understand that the individual test scores are fed through a neural network to derive the final score. So it seems that this network has started to behave badly.

You misunderstand. The neural network (or whatever they're using these days - it at least used to be a genetic algorithm) is used to assign the default scores, not to adjust the scores after the fact.

More likely one of two things is happening: that header was added by another system running SpamAssassin, or you aren't running with the configuration you think you are.

Double-check your config and make sure network tests really are disabled. I added up the scores for the tests you mentioned using the 4th column (Bayes + network both enabled) and it comes out to 2.65 - which would round to the 2.7 you're seeing.

--
Kelson Vibber
SpeedGate Communications
www.speed.net
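The arithmetic behind that 2.65 can be checked with a short Python sketch. It uses only numbers quoted in this thread: the two BAYES_99 defaults (4.070 and 1.886) and the 2.65 total. The per-test scores of the HTML rules are not given in the thread, so they appear only as a remainder. Half-up rounding to one decimal place is an assumption about how the header value is displayed, not something confirmed here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Values quoted in this thread for SA 3.0.2.
BAYES_99_BAYES_ONLY = Decimal("4.070")  # score set 2: Bayes on, network off
BAYES_99_BAYES_NET = Decimal("1.886")   # score set 3: Bayes on, network on
TOTAL_BAYES_NET = Decimal("2.65")       # sum reported with the 4th column

# How much BAYES_99 alone loses when network tests are switched on.
bayes_drop = BAYES_99_BAYES_ONLY - BAYES_99_BAYES_NET

# What the other four tests contribute together in score set 3.
other_tests = TOTAL_BAYES_NET - BAYES_99_BAYES_NET

# Assumed display rounding: one decimal place, halves rounded up.
shown = TOTAL_BAYES_NET.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(bayes_drop, other_tests, shown)
```

Decimal is used rather than float because the binary float nearest to 2.65 lies slightly below it, so Python's built-in `round(2.65, 1)` gives 2.6.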