Re: [Mimedefang] learner indicated ham

2014-08-12 Thread Bill Cole

On 11 Aug 2014, at 10:22, Justin Edmands wrote:


> Bill,
> Thank you very much for the response. The detail is much appreciated.
> As Ged mentioned, not vague, helpful to say the least. The part about
> highly trusted rules caught my attention:
>
> "Another way to increase autolearning without going all the way to the
> "learn on error" behavior is to flag rules that you trust highly as
> "autolearn_force" so that messages matching them won't ever be
> excluded from autolearning based on the existing Bayes DB disagreeing
> with the deterministic rules."
>
> I think these will get me started:
>
> tflags URIBL_DBL_SPAM autolearn_force
> tflags URIBL_JP_SURBL autolearn_force
> tflags URIBL_BLACK autolearn_force
> tflags INVALID_DATE autolearn_force
>
> Any others that are definites?


That's a hard question for anyone to answer without knowing your 
mailstream's quirks. I can't tell you who your users are and what sort 
of mail they want that matches which rules. The default SA rules have 
mostly low scores because they are all individually highly error-prone.


I'm especially wary about putting too much trust in individual rules 
because I get lots of mail that talks about spam, often with things like 
lists of evil domains that trigger URIBL rules. And INVALID_DATE shows 
up in a surprising number of ethically upstanding but technically sordid 
messages (e.g. Terminix customer notices). Because autolearn_force 
carries a risk of turning a few false positives into a bad Bayes DB, I 
reserve it for meta-rules. The one example of this that I can share is 
the following locally-defined rule:


describe URIBL_MULTI1 Multiple URIBL hits
meta URIBL_MULTI1 URIBL_DBL_SPAM + URIBL_RED + URIBL_BLACK + URIBL_SBL + URIBL_WS_SURBL + URIBL_OB_SURBL + URIBL_JP_SURBL + URIBL_SC_SURBL > 2

score URIBL_MULTI1 10
tflags URIBL_MULTI1 autolearn_force

That means that if 3 or more of 8 different URIBL tests hit on a 
message, I tack on an extra 10 points and override the learner 
protections. I should add a note of warning by example: last week a 
thread on the Postfix users list started with a message that included a 
long list of spammer domains, causing the original message and any reply 
that fully quoted it to match *6* of those URIBLs. If your mailstream 
includes mail discussing spam, you have to take precautions to keep 
such messages from ruining your Bayes DB.
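
For readers who want to see the counting logic of that meta rule spelled 
out, here is a minimal Python sketch. It only imitates the arithmetic 
(sum the hits, compare against 2); it is not how SpamAssassin evaluates 
rules internally. The function name uribl_multi1 and the idea of passing 
hits as a list of rule names are assumptions made for illustration.

```python
# The eight URIBL rule names listed in the URIBL_MULTI1 meta rule above.
URIBL_RULES = (
    "URIBL_DBL_SPAM", "URIBL_RED", "URIBL_BLACK", "URIBL_SBL",
    "URIBL_WS_SURBL", "URIBL_OB_SURBL", "URIBL_JP_SURBL", "URIBL_SC_SURBL",
)


def uribl_multi1(hits, threshold=2):
    """Return True when more than `threshold` of the listed rules hit.

    `hits` is any iterable of rule names that matched one message
    (hypothetical input format). Rules outside URIBL_RULES are ignored,
    and a rule listed twice is only counted once.
    """
    hit_set = set(hits)
    count = sum(1 for rule in URIBL_RULES if rule in hit_set)
    return count > threshold


if __name__ == "__main__":
    # Two URIBL hits: the meta rule stays quiet.
    print(uribl_multi1(["URIBL_DBL_SPAM", "URIBL_BLACK"]))
    # Three URIBL hits: the meta rule fires.
    print(uribl_multi1(["URIBL_DBL_SPAM", "URIBL_BLACK", "URIBL_SBL"]))
```

Note that a message quoting a long list of spammer domains, as in the 
Postfix list example, would pass this check just as easily as real spam, 
which is exactly the risk described above.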


My other autolearn_force rules are also meta-rules that bundle multiple 
rules, but I unfortunately cannot freely share their details as the 
constituent rules come from private (i.e. encumbered) sources. The 
general process I use is to look for clusters of rules (positive OR 
negative) that often hit together on mail that gets a Bayes score in the 
opposite direction. Before SA 3.4 I just set high scores on those 
meta-rules to ensure rejection, but autolearn_force improves on that.
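
One possible way to hunt for such clusters is sketched below in Python. 
This is an illustration under stated assumptions, not the process 
actually used here: it assumes you have already extracted, for each 
message, the set of rule names that hit and a Bayes probability between 
0 and 1, plus a table of rule scores. The names disagreeing_pairs, 
messages, and rule_scores are hypothetical, and the sketch only counts 
pairs of rules rather than larger clusters.

```python
from collections import Counter
from itertools import combinations


def disagreeing_pairs(messages, rule_scores, min_count=2):
    """Count rule pairs that co-occur on mail where Bayes points the other way.

    messages    : iterable of (rule_names, bayes_probability) tuples
                  (hypothetical format; bayes_probability is 0.0 to 1.0)
    rule_scores : dict mapping rule name to its configured score
    Returns a list of ((rule_a, rule_b), count) with count >= min_count.
    """
    pair_counts = Counter()
    for rules, bayes_prob in messages:
        if bayes_prob < 0.5:
            # Bayes leans ham, so look at rules that add to the spam score.
            opposed = [r for r in rules if rule_scores.get(r, 0) > 0]
        elif bayes_prob > 0.5:
            # Bayes leans spam, so look at rules that subtract from the score.
            opposed = [r for r in rules if rule_scores.get(r, 0) < 0]
        else:
            continue  # Bayes is undecided; nothing to disagree with.
        for pair in combinations(sorted(set(opposed)), 2):
            pair_counts[pair] += 1
    return [(p, n) for p, n in pair_counts.most_common() if n >= min_count]
```

Pairs that show up often in the output are candidates for a meta-rule; 
whether any of them deserves autolearn_force still depends on checking 
the matching mail by hand.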

___
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID.  You may ignore it.

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list MIMEDefang@lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang


Re: [Mimedefang] learner indicated ham

2014-08-12 Thread Bill Cole

On 9 Aug 2014, at 13:41, G.W. Haywood wrote:


> Hi there,
>
> On Sat, 9 Aug 2014, Bill Cole wrote:
>
>> ... you probably could get a better answer from the broader SA
>> community, but I'll offer a vague rambling one :)
>
> It wasn't all that vague. :)
>
> You guys do REJECT your spam, don't you?


Generally, yes. I actually manage spam control for multiple systems that 
operate under a diversity of policy regimes, some of which require 
tag-and-release and/or quarantine for some mail that is in fact nearly 
pure spam. On my personal domain (more than 20 years old, including 
still-live addresses that were used unmunged on Usenet for about a 
decade) I reject >95% of all attempted SMTP transactions before DATA 
(a majority are doomed before MAIL), so my "filter_end" function in MD 
(where SA gets a look) sees a mostly de-spammed stream of messages.
