James G. Sack (jim) wrote:
> Just curious: What do you actually do to look for false positives?
That's a good question. There are a few ways to look for false
positives. The first thing I do is sort my spam folder by subject. For
some reason some of my spam gets flagged as spam in the header but
doesn't get the spam tag added to the subject. Those usually appear
first in the list because everything else begins with {Spam? <score>},
and they tend to have the higher false positive rate. This is where a
lot of the sketchy but legit mailing list false positives end up. I scan
through those looking for things like [listname] in the subject or names
of people I know. I flip through them a page at a time, and familiar
people or list names just jump out at me.
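The scan described above could be roughed out in Python as follows. This is only a sketch of the idea, not what my mail client actually does: the function names are made up for illustration, and it assumes the header value starts with "spam" for flagged mail and that tagged subjects start with "{Spam?".

```python
import re
from email import message_from_string

SPAM_HEADER = "X-copilot-MailScanner-SpamCheck"
SUBJECT_TAG = "{Spam?"

def needs_manual_review(msg):
    """True if the header says spam but the subject carries no {Spam? ...} tag."""
    verdict = msg.get(SPAM_HEADER, "")
    subject = msg.get("Subject", "")
    flagged = verdict.lstrip().startswith("spam")
    tagged = subject.lstrip().startswith(SUBJECT_TAG)
    return flagged and not tagged

def looks_like_list_mail(subject):
    """True if the subject contains a bracketed token such as [listname]."""
    return re.search(r"\[[^\]]+\]", subject) is not None

def review_queue(messages):
    """Untagged-but-flagged mail, list-looking subjects first, then by subject."""
    candidates = [m for m in messages if needs_manual_review(m)]
    return sorted(
        candidates,
        key=lambda m: (not looks_like_list_mail(m.get("Subject", "")),
                       m.get("Subject", "")),
    )
```

Feeding it parsed messages (for example from message_from_string) gives a short list to eyeball instead of the whole folder. The bracket test is deliberately loose, so it also fires on things like Re[2]: and still needs a human to confirm.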
I have 837 emails since I cleaned out the folder last night. That jibes
with about a thousand a day. And that thousand are the ones that make it
past my greylisting. My logs suggest easily another thousand a day get
turned away by greylisting. So I really get around 2000 spams a day.
So I'm cleaning out my spam folder right now just to give you guys some
real examples. Of the emails that weren't tagged with {Spam? <score>} in
the subject I have 3 false positives from mailing lists. Here are the
subject lines and what SpamAssassin had to say about them:
Subject: Re: [asterisk-biz] Connecticut DID needed
X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL, SpamAssassin (score=7.623,
required 5, BAYES_00 -2.60, DATE_IN_FUTURE_06_12 1.67,
HTML_MESSAGE 0.00, HTML_TITLE_EMPTY 0.21, RCVD_IN_DSBL 2.60,
RCVD_IN_PBL 0.00, RCVD_IN_XBL 3.90, URI_SCHEME_MIXED_CASE 1.84)
Subject: Re[2]: [Haskell-cafe] Parallel weirdness [new insights]
X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL, SpamAssassin (score=5.973,
required 5, BAYES_00 -2.60, RCVD_IN_BL_SPAMCOP_NET 1.56,
RCVD_IN_PBL 0.00, RCVD_IN_SORBS_WEB 1.46, RCVD_IN_XBL 3.90,
SARE_STILLSINGLE 1.66)
Subject: [PATCH] x86: remove NexGen support
X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL, SpamAssassin (score=5.192,
required 5, BAYES_00 -2.60, FORGED_RCVD_HELO 0.14,
RCVD_IN_BL_SPAMCOP_NET 1.56, RCVD_IN_PBL 0.00,
RCVD_IN_SORBS_DUL 2.05, RCVD_IN_XBL 3.90, TW_JN 0.08, TW_OV 0.08)
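To see which rules pushed a message over the threshold, a header like the ones above can be pulled apart with a few lines of Python. This is a sketch that assumes the layout shown here (score=..., required ..., then "RULE value" pairs with upper-case rule names); parse_spamcheck and top_rules are names invented for the example.

```python
import re

def parse_spamcheck(header_value):
    """Split a SpamCheck header value into verdict, score, threshold and rules."""
    # Unfold continuation lines into one line of single-spaced text.
    text = " ".join(header_value.split())
    m = re.search(r"score=(-?[\d.]+),\s*required\s+(-?[\d.]+)", text)
    if m is None:
        raise ValueError("no SpamAssassin score found")
    # Everything after "required N" is a list of "RULE value" pairs.
    rest = text[m.end():].rstrip(")")
    pairs = re.findall(r"([A-Z][A-Z0-9_]*)\s+(-?\d+(?:\.\d+)?)", rest)
    return {
        "is_spam": text.startswith("spam"),
        "score": float(m.group(1)),
        "required": float(m.group(2)),
        "rules": {name: float(value) for name, value in pairs},
    }

def top_rules(parsed, n=3):
    """The n rules that contributed the most points, highest first."""
    ranked = sorted(parsed["rules"].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

The per-rule values in the header are rounded, so their sum only approximately matches the reported score. In all three examples RCVD_IN_XBL at 3.90 is the largest single contributor, and the two lower-scoring messages would have stayed under 5 without it.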
The 7.623 score was pretty spammy but the 5.973 and 5.192 scores just
barely slipped by as spam. Given that I get a couple thousand legit
mails per day with all of my mailing list traffic I think that is pretty
darn good. It is always the mailing list mail which has the highest
potential for false positives. It has been a couple of years since I
last found an email sent directly to me that was flagged as spam, and it was
easy to find. A friend was using a pretty sketchy/broken email provider.
My server was well within rights to flag it as spam. Of all of the
mailing lists that I am on SDCS-Interest seems to be sending me the most
spam. List mail which really is spam I just leave in the spam folder and
delete.
So I move the above 3 emails into their correct mailing list folders and
move on to the rest of the mail. It is all tagged with {Spam? <score>}
where <score> is between 10 and 100 or more. Usually they are in the
10-30 range. That stuff is all very spammy and I don't believe I have
ever seen a false positive in that range. It looks like all of the spams
with scores between 5 and 10 aren't getting {Spam? <score>} in the
subject. I don't recall if SpamAssassin does that by default or if I did
that for easy sorting for false positives. But either way it seems to
work out well for me.
I have about 50 on screen at once and page through them at maybe a
second per page. Then I highlight the whole range of emails and delete
them.
In my .procmailrc I sort for spam first:
:0
* ^X-copilot-MailScanner-SpamCheck: spam
.Junk_`/bin/date +%m%y`/
Then I sort for mailing list:
:0
* ^List-Id: Main Discussion List for KPLUG <kplug-list.kernel-panic.org>
.lists.kplug_`/bin/date +%m%y`/
So spam gets filtered from my list folders at the cost of a few false
positives on mailing list mail. I don't really care if I miss the
occasional mailing list mail and hate to end up archiving spam. I could
sort it the other way around and not have any false positives. But for
mailing list mail I prefer false positives over false negatives.
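The effect of the ordering can be shown with a small first-match-wins model in Python. This is only an illustration of the trade-off, not procmail itself: the deliver function and its spam_first switch are invented for the example, and the list match is simplified to a substring test on List-Id.

```python
from datetime import date

def month_suffix(today):
    """Same shape as `date +%m%y`: two-digit month, then two-digit year."""
    return today.strftime("%m%y")

def deliver(headers, today, spam_first=True):
    """Return the folder chosen by the first matching rule, or None."""
    suffix = month_suffix(today)
    spam_rule = (
        lambda h: h.get("X-copilot-MailScanner-SpamCheck", "").startswith("spam"),
        ".Junk_" + suffix + "/",
    )
    list_rule = (
        lambda h: "kplug-list.kernel-panic.org" in h.get("List-Id", ""),
        ".lists.kplug_" + suffix + "/",
    )
    rules = [spam_rule, list_rule] if spam_first else [list_rule, spam_rule]
    for matches, folder in rules:
        if matches(headers):
            # Like a delivering recipe, the first match ends processing.
            return folder
    return None  # nothing matched; falls through to default delivery
```

A list message that also carries the spam header lands in the Junk folder with spam_first=True and in the list folder with spam_first=False, which is exactly the false positive versus archived spam choice described above.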
So that's how I scan for false positives in my junk folder.
--
Tracy R Reed Read my blog at http://ultraviolet.org
Key fingerprint = D4A8 4860 535C ABF8 BA97 25A6 F4F2 1829 9615 02AD
Non-GPG signed mail gets read only if I can find it among the spam.
--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list