So the marketing guy says: There is a 0.1% "false positive" rate, meaning that out of a total of 1000 messages we handled, we erroneously rejected one legit message as spam.

Their marketing guy can say anything.


However, someone with knowledge of statistics wants to know two things: The false positive ratio, and the false negative ratio. Say you receive 200 E-mails, 100 are really spam, and 100 are really legitimate E-mail. If the anti-spam program catches 95 of the 100 spams, and 2 of the legitimate E-mails, it has a 2% false positive ratio and a 5% false negative ratio.
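To make the two definitions concrete, here is a minimal Python sketch of the 200-message example. The function and argument names are made up for illustration; the point is only that each ratio is divided by the real total of its own class.

```python
def error_ratios(spam_caught, spam_missed, legit_caught, legit_passed):
    """Return (false positive ratio, false negative ratio).

    Hypothetical helper: the four counts are the cells of the usual
    2x2 table of "what the message really was" vs. "what the filter did".
    """
    total_legit = legit_caught + legit_passed  # all really-legitimate mail
    total_spam = spam_caught + spam_missed     # all real spam
    fp_ratio = legit_caught / total_legit      # legit mail wrongly caught
    fn_ratio = spam_missed / total_spam        # spam wrongly let through
    return fp_ratio, fn_ratio


# 100 spams (95 caught, 5 missed) and 100 legit (2 caught, 98 passed).
fp, fn = error_ratios(spam_caught=95, spam_missed=5,
                      legit_caught=2, legit_passed=98)
print(f"FP ratio: {fp:.1%}, FN ratio: {fn:.1%}")
```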

No other numbers are needed (typically they are just derived from the above numbers). Most marketing guys just say "It catches 95% of spam" (where 95% is just 1 minus the false negative ratio). This is half of the information needed to make a comparison (i.e., 95% is great if 0% of legitimate mail gets caught, but horrible if 50% of legitimate mail gets caught).

That "false positive rate" may be the conventional, professional number but it's a useless number because it does not say how accurate our reject/accept decisions were. It's marketing fluff.

It's not fluff, but you're right that it isn't a useful number by itself. It's half of the useful numbers. Both the FP ratio and the FN ratio are necessary, and they are providing only one of the two.


1. I reject 1000 messages as spam (true positives), but 5 of them were falsely rejected (were legit). So that's 0.5% false rejects, aka false positives, i.e., I screwed up on 0.5% of my rejects.

Wrong. Very close, but wrong.


2. I accept 1000 messages as legit (non-spam), but 8 of them are really spam. So that's 0.8% "false negatives", i.e., false non-spam.

Wrong. Very close, but wrong.


Here, you have 997 (5 + 1000 - 8) legitimate E-mails and 1003 spams (1000 - 5 + 8).

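In this case, the False Positive ratio is 5/997 (number of legit caught as spam divided by all legit) = .502%. The False Negative ratio is 8/1003 (number of spams let through divided by all spam) = .798%. The divisor is the real number of messages of each kind, not the number you rejected or accepted.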
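The same bookkeeping, as a rough Python sketch: start from the reject/accept counts, work out how many messages were really legit and really spam, then divide. The function and argument names are hypothetical.

```python
def ratios_from_decisions(rejected, rejected_legit, accepted, accepted_spam):
    """Rebuild the real class totals from reject/accept counts.

    Hypothetical helper. Returns (legit_total, spam_total, fp_ratio, fn_ratio).
    """
    # Really legit = legit that was rejected + accepted mail that was not spam.
    legit_total = rejected_legit + (accepted - accepted_spam)
    # Really spam = rejected mail that was not legit + spam that was accepted.
    spam_total = (rejected - rejected_legit) + accepted_spam
    fp_ratio = rejected_legit / legit_total
    fn_ratio = accepted_spam / spam_total
    return legit_total, spam_total, fp_ratio, fn_ratio


legit_total, spam_total, fp, fn = ratios_from_decisions(
    rejected=1000, rejected_legit=5, accepted=1000, accepted_spam=8)
print(legit_total, spam_total)        # 997 1003
print(f"FP={fp:.3%} FN={fn:.3%}")     # FP=0.502% FN=0.798%
```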

It seems like nit-picking in this case. But if you study statistics, the difference is very important. For example, if you rejected 100 messages as spam (5 really legit), and accepted 900 as legit (but 8 were really spam), you really have 897 legitimate E-mails (5 + 900 - 8) and 103 spams (100 - 5 + 8). Your numbers would be FP=5% FN=.9%; the correct ones would be FP=.6% (5/897) FN=7.8% (8/103). That's very, very different. You would say "I catch 99.1% of spam" (1-FN); the correct answer is "I catch 92.2% of spam".
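For anyone who wants to check the 100-rejected / 900-accepted case themselves, here is a small Python sketch that puts the two ways of dividing side by side. The function names are made up for illustration; the only difference between them is the divisor.

```python
def naive_ratios(rejected, rejected_legit, accepted, accepted_spam):
    # The mistaken approach: divide by the number of decisions of each kind.
    return rejected_legit / rejected, accepted_spam / accepted


def per_class_ratios(rejected, rejected_legit, accepted, accepted_spam):
    # FP/FN ratios proper: divide by the real number of legit and spam messages.
    legit_total = rejected_legit + (accepted - accepted_spam)
    spam_total = (rejected - rejected_legit) + accepted_spam
    return rejected_legit / legit_total, accepted_spam / spam_total


counts = dict(rejected=100, rejected_legit=5, accepted=900, accepted_spam=8)
naive_fp, naive_fn = naive_ratios(**counts)
fp, fn = per_class_ratios(**counts)

print(f"naive:     FP={naive_fp:.1%} FN={naive_fn:.1%}")
print(f"per class: FP={fp:.1%} FN={fn:.1%} spam caught={1 - fn:.1%}")
# naive:     FP=5.0% FN=0.9%
# per class: FP=0.6% FN=7.8% spam caught=92.2%
```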

[If someone who is familiar with statistics sees any errors here, please speak up; I may have overlooked something.]

-Scott
---
Declude JunkMail: The advanced anti-spam solution for IMail mailservers.
Declude Virus: Catches known viruses and is the leader in mailserver vulnerability detection.
Find out what you have been missing: Ask for a free 30-day evaluation.

