Hello Peter,

Wednesday, July 6, 2005, 7:48:36 AM, you wrote:

PF> Secondly, for a larger corpus, wondering if there is much difference
PF> between perceptron scores for last 6 months, and last 1 month. If you
PF> already have the ham/spam.log for the last 6 months, complete with the
PF> "time" field, how much do the perceptron scores differ for the last 6
PF> months, and last 1 month? The thinking behind this is in moving towards
PF> more regular rule score updates (at least locally), based on the current
PF> flavour of spam. It may be a self defeating exercise though, if spam
PF> and scores are both moving targets.

I can't speak for the perceptron, but I run mass-checks on the various
SARE rule set files I maintain approximately monthly (once they're
stable; new files are mass-checked more frequently). The hit rates, for
both ham and spam, vary significantly from month to month.

That may be because SARE rules generally test for lower hit rates than
the official rules do, so our hit rates may be less statistically
stable. Extrapolating to the general case, though, I believe that "most
recent month" spam is different from "six months of spam."

Bob Menschel
