On 09/02/11 23:31, Warren Togami Jr. wrote:
On 2/4/2011 7:40 AM, Steve Freegard wrote:
My apologies for the rookie mistake. As a recent addition to the team I
missed the original discussion on requiring 'tflags nopublish' for all
sandbox rules.

Can someone point me at the Bug that discussed this and I'll update the
RuleSandboxes Wiki entry to reflect this.

Currently this is only among our "tribal knowledge". The Wiki needs a bit of reorg to put this stuff into a logical place.


Ok - I'll update the RuleSandboxes entry. Re-organizing the wiki is a totally separate issue.


While I'm on the subject of Sandboxes; what are the rules for adding
DNSBLs to it for testing? Specifically I would like to add the fresh,
fresh10 and fresh15 lists from spameatingmonkey.com for testing as my
own local testing found them to be more effective and faster than the
day-old-bread list that is currently in the core ruleset.

A critical flaw here is that the only way to truly measure a rule's performance in masscheck is with "reuse", which means the ham/spam corpora must be tagged with the custom rule at delivery time; masscheck then only records a yes/no from the existing SpamAssassin headers. Testing FRESH-type rules against older mail wastes resources and produces misleading results.


Unfortunately I can see many issues with requiring that all corpus messages carry SpamAssassin mark-up from the time they were received; this simply isn't practical for some people (myself included, as my trap data comes from lots of different sources).

Wouldn't this also be dangerous if the SA headers in a message were generated by an older or modified version of SA, or by a scan run with local-tests-only, or had been poisoned (intentionally or not) to skew the results?

Blacklist effectiveness for things like day-old-bread and fresh5/10/15 can be checked by looking at message age versus hit percentage, which IIRC the ruleqa app can show when the rule detail is displayed.
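To make the age-versus-hit-rate idea concrete, here is a rough sketch (in Python rather than the Perl the SA tools are written in, and not how ruleqa actually does it). The input format is an assumption: one record per message giving its age in hours and whether a hypothetical fresh-list rule hit.

```python
# Sketch: bucket messages by age and compute the hit percentage per bucket.
# A fresh-type DNSBL should hit strongly in the first day or two, then decay.
from collections import defaultdict

def hit_rate_by_age(records, bucket_hours=24):
    """records: iterable of (age_hours, hit_bool); returns {bucket: hit %}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for age_hours, hit in records:
        bucket = int(age_hours // bucket_hours)
        totals[bucket] += 1
        if hit:
            hits[bucket] += 1
    return {b: 100.0 * hits[b] / totals[b] for b in totals}

# Hypothetical data: (age in hours, rule hit?)
sample = [(2, True), (10, True), (20, False), (30, True), (50, False), (60, False)]
rates = hit_rate_by_age(sample)
```

A sharp drop-off in the per-bucket percentages would confirm the list only earns its keep on recently delivered mail, which is the point of the "reuse" concern above.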

This brings me onto another subject:

One thing that strikes me as missing from the ruleqa app is a way to show, for each submitter, the overall score distribution for ham/spam and the number of false positives and false negatives in each corpus (plus a total). This would check two things:

1) The overall corpus quality (e.g. lots of low- or negative-scoring mail in a spam corpus would cause concern, as would very high scores in ham).

2) The overall effectiveness of the current rules against the corpora.
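The per-corpus summary described above could be sketched like this (Python rather than the Perl of the SA tools; the input format is an assumption, and the 5.0 threshold matches SpamAssassin's default required_score):

```python
# Sketch: count false positives (ham scoring at/over threshold) and
# false negatives (spam scoring under threshold) for one submitter's corpus.
def corpus_summary(messages, threshold=5.0):
    """messages: iterable of (is_spam, score) pairs from a mass-check run."""
    fp = sum(1 for is_spam, score in messages if not is_spam and score >= threshold)
    fn = sum(1 for is_spam, score in messages if is_spam and score < threshold)
    return {"total": len(messages), "false_positives": fp, "false_negatives": fn}

# Hypothetical corpus: two spam, two ham.
summary = corpus_summary([(True, 12.1), (True, 3.0), (False, 0.2), (False, 6.5)])
```

Aggregating this per submitter, plus a simple score histogram, would surface both the corpus-quality and rule-effectiveness questions at a glance.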

The focus should then be on writing rules and plug-ins to deal with the extremes, i.e. the false positives and false negatives present, and to reduce the overall percentages of both classes.

I'm doing this myself by parsing the local mass-check logs and copying any false-positives or false-negatives into specific directories so that I can sort through them and update rules as necessary.


And finally; I can't find any information in the Wiki about adding
plugins to a sandbox, specifically if this is allowed and what the
loadplugin line should be to get it to load correctly during the
mass-checks.

What plugin do you have in mind? If it is your URI shortener plugin, I would have to strongly protest. Its current design is good in theory, but it will never survive in mass production: it has the capability of making SpamAssassin far too slow, and it is too easy to bypass.

No - I wouldn't dream of putting the full short-URI decoder into the mass-checks in its current form; but I would put in a version that simply reports when short URIs are detected (BTW - the rules that do this in your sandbox need to backslash-escape the dots). It's more efficient to do this in a plug-in with an easily expandable list than with a URI rule and massive regexps, IMO. We should have these sorts of 'instrumentation' rules in the corpus that tell us when a particular abuse vector is being more heavily exploited, and that can be used in meta rules.
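As an illustration of the kind of 'instrumentation' rule meant here (the rule name, domain list, and near-zero score are hypothetical; note the backslash-escaped dots):

```
# Hypothetical instrumentation rule: fires when a known shortener domain
# appears in a URI; the near-zero score keeps it informational only.
uri      URI_SHORTENER   /^https?:\/\/(?:bit\.ly|tinyurl\.com|is\.gd)\//i
describe URI_SHORTENER   Contains a shortened URI
score    URI_SHORTENER   0.001
tflags   URI_SHORTENER   nopublish
```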

Anyway - I was talking in more general terms; I have some ideas for a few other plugins.

Regards,
Steve.
