On 09/02/11 23:31, Warren Togami Jr. wrote:
On 2/4/2011 7:40 AM, Steve Freegard wrote:
My apologies for the rookie mistake. As a recent addition to the team I
missed the original discussion on requiring 'tflags nopublish' for all
sandbox rules.

Can someone point me at the Bug that discussed this and I'll update the
RuleSandboxes Wiki entry to reflect this.

Currently this is only among our "tribal knowledge". The Wiki needs a bit of reorg to put this stuff into a logical place.


Ok - I'll update the RuleSandboxes entry. Re-organizing the wiki is a totally separate issue.


While I'm on the subject of Sandboxes; what are the rules for adding
DNSBLs to it for testing? Specifically I would like to add the fresh,
fresh10 and fresh15 lists from spameatingmonkey.com for testing as my
own local testing found them to be more effective and faster than the
day-old-bread list that is currently in the core ruleset.

A critical flaw here is that the only way to truly measure a rule's performance in masscheck is with "reuse", which means the ham/spam corpora must be tagged with the custom rule at delivery time; masscheck then only records a yes/no from the existing SpamAssassin headers. Testing FRESH-type rules against older mail wastes resources and produces misleading results.


Unfortunately I can see many issues with requiring that all corpus messages carry SpamAssassin mark-up from the time they were received; this simply isn't practical for some people (myself included, as my trap data comes from lots of different sources).

Wouldn't this also be dangerous if the SA headers in a message were generated by an older or modified version of SA, or by a scan run with local-tests-only, or had been poisoned (intentionally or not) to skew the results?

Blacklist effectiveness for things like day-old-bread and fresh5/10/15 can be checked by looking at message age versus hit percentage, which IIRC the ruleqa app can show when the rule detail is displayed.
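To make the age-versus-hit-rate idea concrete, here is a rough sketch (in Python rather than the Perl the SA tools are written in, and not how ruleqa actually does it). The input format is an assumption: one record per message giving its age in hours and whether a hypothetical fresh-list rule hit.

```python
# Sketch: bucket messages by age and compute the hit percentage per bucket.
# A fresh-type DNSBL should hit strongly in the first day or two, then decay.
from collections import defaultdict

def hit_rate_by_age(records, bucket_hours=24):
    """records: iterable of (age_hours, hit_bool); returns {bucket: hit %}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for age_hours, hit in records:
        bucket = int(age_hours // bucket_hours)
        totals[bucket] += 1
        if hit:
            hits[bucket] += 1
    return {b: 100.0 * hits[b] / totals[b] for b in totals}

# Hypothetical data: (age in hours, rule hit?)
sample = [(2, True), (10, True), (20, False), (30, True), (50, False), (60, False)]
rates = hit_rate_by_age(sample)
```

A sharp drop-off in the per-bucket percentages would confirm the list only earns its keep on recently delivered mail, which is the point of the "reuse" concern above.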

This brings me onto another subject:

One thing that strikes me as missing from the ruleqa app is a way to show, for each submitter, the overall score distribution for ham/spam and the number of false positives and false negatives in each corpus (plus a total). This would check two things:

1) The overall corpus quality (e.g. lots of low- or negative-scoring mail in a spam corpus would cause concern, as would very high scores in ham).

2) The overall effectiveness of the current rules against the corpora.
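The per-corpus summary described above could be sketched like this (Python rather than the Perl of the SA tools; the input format is an assumption, and the 5.0 threshold matches SpamAssassin's default required_score):

```python
# Sketch: count false positives (ham scoring at/over threshold) and
# false negatives (spam scoring under threshold) for one submitter's corpus.
def corpus_summary(messages, threshold=5.0):
    """messages: iterable of (is_spam, score) pairs from a mass-check run."""
    fp = sum(1 for is_spam, score in messages if not is_spam and score >= threshold)
    fn = sum(1 for is_spam, score in messages if is_spam and score < threshold)
    return {"total": len(messages), "false_positives": fp, "false_negatives": fn}

# Hypothetical corpus: two spam, two ham.
summary = corpus_summary([(True, 12.1), (True, 3.0), (False, 0.2), (False, 6.5)])
```

Aggregating this per submitter, plus a simple score histogram, would surface both the corpus-quality and rule-effectiveness questions at a glance.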

The focus should then be on writing rules and plug-ins to deal with the extremes, i.e. the false positives and false negatives present, and to reduce the overall percentages of both classes.

I'm doing this myself by parsing the local mass-check logs and copying any false-positives or false-negatives into specific directories so that I can sort through them and update rules as necessary.


And finally; I can't find any information in the Wiki about adding
plugins to a sandbox, specifically if this is allowed and what the
loadplugin line should be to get it to load correctly during the
mass-checks.

What plugin do you have in mind? If it is your URI shortener plugin, I would have to strongly protest. Its current design is good in theory, but it will never survive in mass production: it has the capability of making SpamAssassin far too slow, and it is too easy to bypass.

No - I wouldn't dream of putting the full short-URI decoder into the mass-checks in its current form; but I would put in a version that simply reports when short URIs are detected (BTW - the rules that do this in your sandbox need to backslash-escape the dots). It's more efficient to do this in a plug-in with an easily expandable list than with a URI rule and massive regexps, IMO. We should have these sorts of 'instrumentation' rules in the corpus that tell us when a particular abuse vector is being more heavily exploited, and that can be used in meta rules.
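As an illustration of the kind of 'instrumentation' rule meant here (the rule name, domain list, and near-zero score are hypothetical; note the backslash-escaped dots):

```
# Hypothetical instrumentation rule: fires when a known shortener domain
# appears in a URI; the near-zero score keeps it informational only.
uri      URI_SHORTENER   /^https?:\/\/(?:bit\.ly|tinyurl\.com|is\.gd)\//i
describe URI_SHORTENER   Contains a shortened URI
score    URI_SHORTENER   0.001
tflags   URI_SHORTENER   nopublish
```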

Anyway - I was talking in more general terms; I have some ideas for a few other plugins.

Regards,
Steve.
