James G. Sack (jim) wrote:
> Just curious: What do you actually do to look for false positives?

That's a good question. There are a few ways to look for false positives. The first thing I do is sort my spam folder by subject. For some reason, some of my spam gets flagged as spam in the header but doesn't get the spam tag added to the subject. These messages usually appear first in the list, because everything else begins with {Spam? <score>}, and they tend to have the higher false positive rate. This is where a lot of the sketchy but legit mailing list false positives end up. I scan through them looking for things like [listname] in the subject or names of people I know. I flip through a page at a time, and familiar names or list names just jump out at me.
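The sort-then-scan routine above can be sketched in a few lines. This is a minimal illustration, not the author's setup: it assumes the subject tag looks exactly like `{Spam? 12.3}` and treats any bracketed word such as `[asterisk-biz]` as a hint of list mail. The names `review_order` and `looks_like_list_mail` are hypothetical.

```python
import re

# Assumed tag format, based on the "{Spam? <score>}" prefix described in the post.
SPAM_TAG = re.compile(r"^\{Spam\?\s*[\d.]+\}")
# Heuristic: bracketed prefixes such as "[asterisk-biz]" or "[Haskell-cafe]".
LIST_TAG = re.compile(r"\[[\w.-]+\]")


def review_order(subjects):
    """Untagged subjects first (likeliest false positives), then alphabetical."""
    return sorted(subjects, key=lambda s: (SPAM_TAG.match(s) is not None, s.lower()))


def looks_like_list_mail(subject):
    """Return True if the subject carries a bracketed, list-style tag."""
    return LIST_TAG.search(subject) is not None


if __name__ == "__main__":
    subjects = [
        "{Spam? 23.4} Cheap meds",
        "Re: [asterisk-biz] Connecticut DID needed",
        "[PATCH] x86: remove NexGen support",
    ]
    for s in review_order(subjects):
        marker = "LIST?" if looks_like_list_mail(s) else ""
        print(f"{marker:6}{s}")
```

In practice the subjects would be read from the spam folder itself, for example with Python's `mailbox.Maildir`. A plain ASCII sort already puts `{` after letters and `[`, which is why the untagged messages float to the top in a mail client too.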

There are 837 messages in the folder since I cleaned it out last night, which jibes with about a thousand a day. And those thousand are only the ones that make it past my greylisting; my logs suggest greylisting easily turns away another thousand a day. So I really get around 2000 spams a day.

So I'm cleaning out my spam folder right now just to give you guys some real examples. Of the emails that weren't tagged with {Spam? <score>} in the subject, I have 3 false positives from mailing lists. Here are the subject lines and what SpamAssassin had to say about them:

Subject: Re: [asterisk-biz] Connecticut DID needed
X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL, SpamAssassin (score=7.623,
        required 5, BAYES_00 -2.60, DATE_IN_FUTURE_06_12 1.67,
        HTML_MESSAGE 0.00, HTML_TITLE_EMPTY 0.21, RCVD_IN_DSBL 2.60,
        RCVD_IN_PBL 0.00, RCVD_IN_XBL 3.90, URI_SCHEME_MIXED_CASE 1.84)

Subject: Re[2]: [Haskell-cafe] Parallel weirdness [new insights]
X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL, SpamAssassin (score=5.973,
        required 5, BAYES_00 -2.60, RCVD_IN_BL_SPAMCOP_NET 1.56,
        RCVD_IN_PBL 0.00, RCVD_IN_SORBS_WEB 1.46, RCVD_IN_XBL 3.90,
        SARE_STILLSINGLE 1.66)

Subject: [PATCH] x86: remove NexGen support
X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL, SpamAssassin (score=5.192,
        required 5, BAYES_00 -2.60, FORGED_RCVD_HELO 0.14,
        RCVD_IN_BL_SPAMCOP_NET 1.56, RCVD_IN_PBL 0.00,
        RCVD_IN_SORBS_DUL 2.05, RCVD_IN_XBL 3.90, TW_JN 0.08, TW_OV 0.08)
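SpamAssassin's total is the sum of the individual rule scores, so these headers can be checked by hand. The sketch below is an illustration that assumes the exact header layout shown above (`score=T, required R, RULE S, ...`); the helpers `parse_spamcheck` and `score_report` are hypothetical and not part of MailScanner or SpamAssassin. Because each rule score is printed rounded to two decimals, the rule sum can differ slightly from the reported total.

```python
import re


def parse_spamcheck(value):
    """Split a SpamCheck header value into (total, required, {rule: score}).

    Assumes the layout seen in the headers above:
    "spam, ..., SpamAssassin (score=T, required R, RULE S, RULE S, ...)".
    """
    text = " ".join(value.split())  # unfold the folded continuation lines
    total = float(re.search(r"score=(-?[\d.]+)", text).group(1))
    required = float(re.search(r"required (-?[\d.]+)", text).group(1))
    # Rule names are upper-case identifiers followed by a signed decimal score.
    rules = {name: float(score)
             for name, score in re.findall(r"\b([A-Z][A-Z0-9_]+) (-?\d+\.\d+)", text)}
    return total, required, rules


def score_report(value):
    total, required, rules = parse_spamcheck(value)
    rule_sum = sum(rules.values())
    return {
        "margin": round(total - required, 3),           # distance above the threshold
        "rule_sum": round(rule_sum, 2),                 # sum of the printed rule scores
        "rounding_gap": round(abs(rule_sum - total), 3),
        "top_rule": max(rules, key=rules.get),          # largest single contributor
    }


if __name__ == "__main__":
    header = """spam, ORDB-RBL, SpamAssassin (score=5.192,
        required 5, BAYES_00 -2.60, FORGED_RCVD_HELO 0.14,
        RCVD_IN_BL_SPAMCOP_NET 1.56, RCVD_IN_PBL 0.00,
        RCVD_IN_SORBS_DUL 2.05, RCVD_IN_XBL 3.90, TW_JN 0.08, TW_OV 0.08)"""
    print(score_report(header))
```

For the NexGen patch this shows a margin of under 0.2 points, with RCVD_IN_XBL (3.90) as the largest contributor, well ahead of the -2.60 from BAYES_00.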

The 7.623 score was pretty spammy, but the 5.973 and 5.192 scores only barely crossed the required 5-point threshold. Given that I get a couple thousand legit mails per day with all of my mailing list traffic, I think that is pretty darn good. It is always mailing list mail that has the highest potential for false positives. It has been a couple of years since I last found an email sent directly to me that was flagged as spam, and it was easy to find: a friend was using a pretty sketchy/broken email provider, and my server was well within its rights to flag it as spam. Of all the mailing lists I am on, SDCS-Interest seems to send me the most spam. List mail that really is spam I just leave in the spam folder and delete.

So I move the above 3 emails into their correct mailing list folders and move on to the rest of the mail. It is all tagged with {Spam? <score>}, where <score> ranges from 10 to 100 or more, though it is usually in the 10-30 range. That stuff is all very spammy, and I don't believe I have ever seen a false positive in that range. It looks like none of the spams scoring between 5 and 10 get {Spam? <score>} in the subject. I don't recall whether SpamAssassin does that by default or whether I set it up that way to make false positives easier to sort out. Either way, it seems to work out well for me.
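The two-tier behaviour described above (spam scoring 5 to 10 left untagged, 10 and up tagged) amounts to a simple banding rule. This is a hypothetical model of what the post observes, not MailScanner's or SpamAssassin's actual code; the 10-point cut-off and the `{Spam? 23.4}` number formatting are assumptions.

```python
# Hypothetical model of the observed subject tagging. The 10-point cut-off is
# inferred from the post, not read from any MailScanner/SpamAssassin config.
REQUIRED = 5.0     # "required 5" in the SpamCheck headers
HIGH_SCORE = 10.0  # assumption: subjects are tagged only at or above this score


def triage(score):
    """Map a SpamAssassin score to what the reader does with the message."""
    if score < REQUIRED:
        return "ham"      # not spam: normal delivery
    if score < HIGH_SCORE:
        return "review"   # spam, subject left untagged: scan for false positives
    return "delete"       # subject tagged "{Spam? <score>}": bulk delete


def tag_subject(subject, score):
    """Prefix the subject with the spam tag only for high-scoring spam."""
    if triage(score) == "delete":
        return f"{{Spam? {score:g}}} {subject}"
    return subject


if __name__ == "__main__":
    for s in (4.2, 5.192, 7.623, 23.4):
        print(s, triage(s), tag_subject("example", s))
```

Under this model all three list messages above (5.192, 5.973, 7.623) fall into the untagged "review" band, which matches where they turned up in the sorted folder.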

I have about 50 on screen at once and page through them at maybe a second per page. Then I highlight the whole range of emails and delete them.

In my .procmailrc I sort for spam first:

:0
* ^X-copilot-MailScanner-SpamCheck: spam
.Junk_`/bin/date +%m%y`/

Then I sort for mailing lists:

:0
* ^List-Id: Main Discussion List for KPLUG <kplug-list.kernel-panic.org>
.lists.kplug_`/bin/date +%m%y`/

So spam gets filtered out of my list folders at the cost of a few false positives on mailing list mail. I don't really care if I miss the occasional mailing list message, and I hate to end up archiving spam. I could sort it the other way around and not have any false positives on list mail. But for mailing list mail I prefer false positives over false negatives.
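The trade-off comes entirely from recipe order: procmail delivers a message with the first matching delivering recipe (no `c` flag) and stops. The sketch below is a hypothetical Python model of the two recipes above, written only to show that effect; it mimics procmail's case-insensitive matching and the `/bin/date +%m%y` folder suffix, and `route` and `"DEFAULT"` are illustrative names rather than anything procmail exposes.

```python
import re
from datetime import date

# Hypothetical model of the two procmail recipes. Procmail matches
# case-insensitively by default, hence re.IGNORECASE.
SPAM_RE = re.compile(r"^X-copilot-MailScanner-SpamCheck: spam",
                     re.MULTILINE | re.IGNORECASE)
KPLUG_RE = re.compile(r"^List-Id: Main Discussion List for KPLUG <kplug-list\.kernel-panic\.org>",
                      re.MULTILINE | re.IGNORECASE)


def route(headers, today, spam_first=True):
    """Return the folder the first matching recipe would deliver to."""
    suffix = today.strftime("%m%y")  # same as `/bin/date +%m%y`
    recipes = [
        (SPAM_RE, f".Junk_{suffix}/"),
        (KPLUG_RE, f".lists.kplug_{suffix}/"),
    ]
    if not spam_first:
        recipes.reverse()
    for pattern, folder in recipes:
        if pattern.search(headers):
            return folder  # first match delivers; later recipes never run
    return "DEFAULT"       # falls through to procmail's default mailbox


if __name__ == "__main__":
    list_spam = ("X-copilot-MailScanner-SpamCheck: spam, ORDB-RBL\n"
                 "List-Id: Main Discussion List for KPLUG <kplug-list.kernel-panic.org>\n")
    d = date(2007, 9, 1)
    print(route(list_spam, d))                    # spam recipe wins
    print(route(list_spam, d, spam_first=False))  # list recipe wins
```

Flagged list mail lands in the junk folder when the spam recipe comes first and in the list archive when the order is reversed, which is exactly the false positive versus archived spam choice described above.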

So that's how I scan for false positives in my junk folder.

--
Tracy R Reed                  Read my blog at http://ultraviolet.org
Key fingerprint = D4A8 4860 535C ABF8 BA97  25A6 F4F2 1829 9615 02AD
Non-GPG signed mail gets read only if I can find it among the spam.


--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
