On 11 Aug 2014, at 10:22, Justin Edmands wrote:
> Bill,
> Thank you very much for the response. The detail is much appreciated.
> As Ged mentioned, not vague, helpful to say the least. The part about
> highly trusted rules caught my attention:
>
> "Another way to increase autolearning without going all the way to the
> "learn on error" behavior is to flag rules that you trust highly as
> "autolearn_force" so that messages matching them won't ever be
> excluded from autolearning based on the existing Bayes DB disagreeing
> with the deterministic rules."
>
> I think these will get me started:
>
> tflags URIBL_DBL_SPAM autolearn_force
> tflags URIBL_JP_SURBL autolearn_force
> tflags URIBL_BLACK autolearn_force
> tflags INVALID_DATE autolearn_force
>
> Any others that are definites?
That's a hard question for anyone to answer without knowing your
mailstream's quirks. I can't tell you who your users are and what sort
of mail they want that matches which rules. The default SA rules have
mostly low scores because they are all individually highly error-prone.
I'm especially wary about putting too much trust in individual rules
because I get lots of mail that talks about spam, often with things like
lists of evil domains that trigger URIBL rules. And INVALID_DATE shows
up in a surprising number of ethically upstanding but technically sordid
messages (e.g. Terminix customer notices). This is why I reserve
autolearn_force for meta-rules, since it carries a risk of turning a few
false positives into a bad Bayes DB. The one example of this that I
can share is this locally defined rule:
describe URIBL_MULTI1 Multiple URIBL hits
meta URIBL_MULTI1 (URIBL_DBL_SPAM + URIBL_RED + URIBL_BLACK + URIBL_SBL + URIBL_WS_SURBL + URIBL_OB_SURBL + URIBL_JP_SURBL + URIBL_SC_SURBL) > 2
score URIBL_MULTI1 10
tflags URIBL_MULTI1 autolearn_force
That means that if 3 or more of 8 different URIBL tests hit on a
message, I tack on an extra 10 points and override the learner
protections. I should add a note of warning by example: last week a
thread in the Postfix users list was started with a message including a
long list of spammer domains, causing the original message and any that
fully quoted it to match *6* of those URIBLs. If your mailstream
includes mail discussing spam, you have to take precautions to keep
such things from ruining your Bayes DB.
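
As a rough illustration of the arithmetic in the URIBL_MULTI1 meta-rule, here is a minimal Python sketch. It is not how SpamAssassin evaluates rules internally; it assumes each sub-rule contributes 1 when it hits and 0 otherwise, and the function name and the `hits` set are hypothetical.

```python
# Rule names taken from the URIBL_MULTI1 meta-rule quoted above.
URIBL_RULES = (
    "URIBL_DBL_SPAM", "URIBL_RED", "URIBL_BLACK", "URIBL_SBL",
    "URIBL_WS_SURBL", "URIBL_OB_SURBL", "URIBL_JP_SURBL", "URIBL_SC_SURBL",
)


def uribl_multi1(hits):
    """Return True when more than 2 of the 8 URIBL rules hit.

    `hits` is the set of rule names that matched one message (a
    hypothetical input format). Assumption: every sub-rule counts as 1
    when it hits, so the sum is simply the number of matching rules.
    """
    return sum(1 for rule in URIBL_RULES if rule in hits) > 2
```

Under this model, a message that merely quotes a list of spammer domains and matches six of the URIBLs satisfies the condition just as real spam would, which is the risk described above.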
My other autolearn_force rules are also meta-rules that bundle multiple
rules, but I unfortunately cannot freely share their details as the
constituent rules come from private (i.e. encumbered) sources. The
general process I use is to look for clusters of rules (positive OR
negative) that often hit together on mail that gets a Bayes score in the
opposite direction. Before SA 3.4 I just set high scores on those
meta-rules to ensure rejection, but autolearn_force improves on that.
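
One way to approximate that search for rule clusters is sketched below in Python. It is only a sketch under stated assumptions: the input format (a set of rule names plus a Bayes probability per message), the 0.5 cutoff, and the function name are all hypothetical, and it covers only one direction (spam-indicating rules hitting on mail that Bayes leans toward calling ham). It also counts pairs only, not larger clusters.

```python
from collections import Counter
from itertools import combinations


def disagreeing_pairs(messages, min_count=2):
    """Count rule pairs that co-occur on mail where Bayes leans toward ham.

    `messages` is an iterable of (rules, bayes_prob) tuples (a hypothetical
    format): `rules` is the set of spam-indicating rule names that hit the
    message, and `bayes_prob` is the Bayes spam probability from 0.0 to 1.0.
    Returns (pair, count) tuples for pairs seen at least `min_count` times,
    most frequent first.
    """
    counts = Counter()
    for rules, bayes_prob in messages:
        if bayes_prob < 0.5:  # Bayes disagrees with the spam-indicating rules
            # Sort so each pair has one canonical ordering.
            counts.update(combinations(sorted(rules), 2))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]
```

Pairs that show up often in the result are candidates for a meta-rule, subject to the same caution about mail that legitimately discusses spam.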
___
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID. You may ignore it.
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list MIMEDefang@lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang