When was the last time an optimal value for this threshold was checked? It currently defaults to 5. It's the number of reports to pyzor required to trigger a hit on PYZOR_CHECK. I *think* in this context a report just means "I got this email", not "This is spam."
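To make the threshold concrete, here is a minimal sketch of the comparison, not the actual plugin code. It assumes the `pyzor check` output format quoted later in this message (server, response tuple, report count, whitelist count) and assumes the test is "count >= threshold". The names `parse_check_line`, `is_hit` and `min_reports` are made up for illustration.

```python
import re

# Assumed line format, copied from the `pyzor check` output in this message:
#   public.pyzor.org:24441 (200, 'OK') 460 0
LINE_RE = re.compile(r"^(\S+)\s+\((\d+),\s*'([^']*)'\)\s+(\d+)\s+(\d+)$")


def parse_check_line(line):
    """Return (report_count, whitelist_count) from one pyzor check line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised pyzor output: %r" % line)
    _server, _code, _status, reports, whitelists = m.groups()
    return int(reports), int(whitelists)


def is_hit(line, min_reports=5):
    """Hypothetical check: a hit when the report count reaches the threshold.

    min_reports=5 mirrors the current default mentioned above; the >=
    comparison is an assumption about how the threshold is applied.
    """
    reports, _whitelists = parse_check_line(line)
    return reports >= min_reports
```

With this, a line reporting 29 reports is a hit at the default of 5 but not at a threshold of 40.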
Increasing it should reduce false positives (while probably also reducing true positives). Thinking about this, and about our inability to keep automatically generated scores in order, it seems like it would be useful to have tests like:

  PYZOR_CHECK_005 - At least 5 reports
  PYZOR_CHECK_020 - At least 20 reports
  PYZOR_CHECK_040 - etc.
  PYZOR_CHECK_160

These are cumulative, so if pyzor says 40 reports, you hit all of the first 3 rules. It seems like the re-scorer should be able to do useful things with that?

I came across this after finding the following in pyzor's mailing list archives from two months ago:

"I think the SA plug-in will believe Pyzor if there is any number of reports, but more judicious decisions can be made if the report count is taken into consideration (however, since Pyzor's just another rule inside of SA, that perhaps isn't necessary in that context)."

http://sourceforge.net/mailarchive/forum.php?thread_name=5A6FB571-CCAB-4766-939C-E3CCA75FA370%40spamexperts.com&forum_name=pyzor-users

This person's belief was incorrect (the plugin already requires a minimum number of reports, as described above), but I'm still curious about improving the accuracy we get from it.

Report counts on my most recent non-spam hits:

  public.pyzor.org:24441 (200, 'OK') 460 0
  public.pyzor.org:24441 (200, 'OK') 749 0
  public.pyzor.org:24441 (200, 'OK') 749 0
  public.pyzor.org:24441 (200, 'OK') 460 0
  public.pyzor.org:24441 (200, 'OK') 460 0

Report counts on my most recent spam hits with scores over 10:

  public.pyzor.org:24441 (200, 'OK') 20817 0
  public.pyzor.org:24441 (200, 'OK') 1705 0
  public.pyzor.org:24441 (200, 'OK') 363 0
  public.pyzor.org:24441 (200, 'OK') 29 0
  public.pyzor.org:24441 (200, 'OK') 21812 0

The varying number (460, 749, etc.) is the number of reports. The 0 at the end is the whitelist count. I don't know if it's ever actually used.

I'd be curious to see the statistics I could gather from putting this stuff in a header.

-- 
"The price of freedom is the willingness to do sudden battle, anywhere, at any time, and with utter recklessness." - Robert A. Heinlein
http://www.ChaosReigns.com
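
P.S. A rough sketch of the cumulative tiers and the header idea, just to pin down what I mean. Nothing here exists in SpamAssassin today: the tier values are the ones proposed above, and the function names and the `X-Pyzor-Counts` header name are invented for illustration.

```python
# Thresholds proposed above; each yields one rule name.
TIERS = (5, 20, 40, 160)


def tiered_rules(reports, tiers=TIERS):
    """Return every hypothetical PYZOR_CHECK_NNN rule the count satisfies.

    The rules are cumulative: a count of 40 satisfies the 5, 20 and 40
    tiers, so all three names are returned.
    """
    return ["PYZOR_CHECK_%03d" % t for t in tiers if reports >= t]


def counts_header(reports, whitelists):
    """Format a hypothetical header carrying the raw counts for later stats."""
    return "X-Pyzor-Counts: reports=%d whitelists=%d" % (reports, whitelists)
```

For example, `tiered_rules(40)` gives the first three rule names, and `tiered_rules(4)` gives an empty list. Logging the raw counts in a header alongside the final verdict would let the tier boundaries be picked from real mail rather than guessed.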
