I don't want to write about the respective merits of Declude JunkMail and
IMGate...

Good -- I don't want to either, and won't unless pushed into it. :) For the record, though, I do believe both are useful products, with significant differences, and most people would benefit from either one or both of them (depending on their needs).


... I contend that I still want to know what either solution claims in terms of FP ratios.

As you point out, this isn't an easy task (it's harder than determining the spam capture rate, which is itself complex). The main problem is getting a diverse set of legitimate E-mail. It also depends on your definition of false positives. For example, every legitimate E-mail gets read here, whether or not it is marked as spam: because of the business we are in, we must receive all E-mail and block none, so we use Declude JunkMail to sort out the spam instead. Some people would therefore say (incorrectly, in my opinion) that we have 0% false positives.


Here, a sample of over 5,000 legitimate E-mails taken during the entire month of July 2003 (which includes business E-mail, mailing lists, and personal E-mail) shows a 0.7% FP rate (based on our recommended WEIGHT20 test). Of the 37 legitimate E-mails incorrectly identified as spam, 16 came from the same mailing list, so if we used whitelisting (which, in our opinion, cannot be used to come up with an accurate publicized FP ratio), that would quickly have reduced the FP ratio to 0.4%. Likewise, if we had cheated and unsubscribed from that mailing list months ago to get better numbers, we would be at 0.4%.

Another one of those legitimate E-mails was actually UCE, but still desired (a local realtor that signed someone up to their list without permission). That leaves us with 20 non-UCE.
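To make the arithmetic behind those percentages easy to check, here is a minimal Python sketch. The sample size of exactly 5,000 is an assumption (the figure given is only "over 5,000"), and the function and variable names are made up for illustration.

```python
def fp_rate(false_positives: int, legitimate_total: int) -> float:
    """Return the percentage of legitimate messages flagged as spam."""
    if legitimate_total <= 0:
        raise ValueError("legitimate_total must be positive")
    return 100.0 * false_positives / legitimate_total


SAMPLE_SIZE = 5000   # assumption: the real sample was "over 5,000"
flagged = 37         # legitimate E-mails identified as spam
from_one_list = 16   # of those, from a single mailing list
desired_uce = 1      # of those, UCE that was still wanted

overall = fp_rate(flagged, SAMPLE_SIZE)
without_list = fp_rate(flagged - from_one_list, SAMPLE_SIZE)
remaining = flagged - from_one_list - desired_uce

print(f"overall: {overall:.1f}%")
print(f"without that mailing list: {without_list:.1f}%")
print(f"left after setting aside the list and the UCE: {remaining}")
```

With a larger real sample the percentages would come out slightly lower, so treat the output as a rough check on the rounding, not as exact figures.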

Of the 6 legitimate non-mailing-list E-mails sent to @declude.com addresses that were caught as spam, 5 were both unsolicited and undesirable (typically mailserver admins that see the list of spam databases at http://www.declude.com/junkmail/support/ip4r.htm and ask us to remove their IP, when in fact we run no spam databases!). The 6th was a request for an evaluation version of Declude Virus (which was likely not a potential customer, given the supplied information). Some people would call the first 5 spam, although we do not.

Our FP ratios are abnormally high, due to several factors. For example, we typically have dozens of spams forwarded to us in a given week, and those all qualify as legitimate E-mail (so if they get caught, our FP ratio goes up). And, because we keep one of the two master lists of spam databases, we get quite a few invalid "removal requests" from people listed in spam databases.

The real question becomes, "How can I determine what is the FP ratio during
tweaking?" I obviously cannot go with the raw number of rejected e-mails.
Perhaps the logs of the single-criterion/reject-at-the-envelope-level system
can provide me with information as to how many were rejected at the envelope
level, owing (for example) to the sending mail server being listed in some
RBL for *some* reason, but I don't see how to get an FP ratio out of this.

If you are rejecting mail, the only way to know your FP ratio is if you can tell from the return address (and a few other pieces of information, such as HELO/EHLO, IP, and time sent) whether an E-mail is legitimate or not. In our sample above, the return address would let us tell that those mailing list E-mails had been rejected. But in many cases it is impossible to tell from the return address whether an E-mail is legitimate.
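As a rough illustration of that limit, the Python sketch below splits envelope-level rejections into those whose return address is recognizably legitimate and those that cannot be judged. The `Rejection` record, the addresses, and the list of known senders are all hypothetical; no real mailserver log format is implied.

```python
from typing import NamedTuple


class Rejection(NamedTuple):
    """Hypothetical record of what is known at envelope level."""
    return_address: str
    helo: str
    ip: str


def split_rejections(rejections, known_legitimate_senders):
    """Separate rejections we can recognize as legitimate from the rest."""
    known = {addr.lower() for addr in known_legitimate_senders}
    recognized = [r for r in rejections if r.return_address.lower() in known]
    undetermined = [r for r in rejections if r.return_address.lower() not in known]
    return recognized, undetermined


log = [
    Rejection("list-owner@lists.example.org", "lists.example.org", "192.0.2.10"),
    Rejection("someone@example.net", "mail.example.net", "198.51.100.7"),
    Rejection("LIST-OWNER@lists.example.org", "lists.example.org", "192.0.2.10"),
]
recognized, undetermined = split_rejections(log, ["list-owner@lists.example.org"])
print(len(recognized), "recognizably legitimate,", len(undetermined), "undetermined")
```

The undetermined group is the problem: it may hold spam or legitimate mail, so the recognized count is only a floor on the number of false positives and cannot be turned into a ratio.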


What we do here is keep a database of all mail that arrived at @declude.com addresses, as well as at several other domains, to get a better sample of legitimate E-mail. Every E-mail is classified (by a human) as truly legitimate or spam. Then, to determine the FP ratio, we take the sample we want (such as all E-mail from July) and check how many of the legitimate E-mails have a header added by Declude JunkMail indicating that it was spam.
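The procedure just described can be sketched in a few lines of Python. This is not our actual tooling: the header name `X-Flagged-As-Spam` is a placeholder (use whatever header your configuration really adds), and the in-memory list of (message, human label) pairs stands in for the database.

```python
from email import message_from_string

SPAM_HEADER = "X-Flagged-As-Spam"  # placeholder name, not a real Declude header


def false_positive_ratio(records, header=SPAM_HEADER):
    """records: iterable of (raw_message, label), label set by a human
    as 'legit' or 'spam'. Returns flagged-legit / all-legit."""
    legit = 0
    flagged = 0
    for raw, label in records:
        if label != "legit":
            continue  # spam does not count toward the FP ratio
        legit += 1
        if message_from_string(raw)[header] is not None:
            flagged += 1
    if legit == 0:
        raise ValueError("sample contains no legitimate E-mail")
    return flagged / legit


sample = [
    ("From: a@example.org\nX-Flagged-As-Spam: yes\nSubject: hi\n\nbody\n", "legit"),
    ("From: b@example.org\nSubject: invoice\n\nbody\n", "legit"),
    ("From: c@example.org\nSubject: lunch\n\nbody\n", "legit"),
    ("From: d@example.com\nX-Flagged-As-Spam: yes\nSubject: buy\n\nbody\n", "spam"),
]
ratio = false_positive_ratio(sample)
print(f"FP ratio: {ratio:.1%}")
```

Note that the denominator is the number of legitimate E-mails, not all mail received, which is why the human classification step has to come first.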

Am I thus correct in saying that in order to know the FP ratio, I have to
accept and process the DATA segment (with the incurred overhead)?

Correct (in virtually all cases). That assumes that [1] A human can't identify spam based on the return address (and IP and EHLO/HELO) alone, and [2] You use the standard definition of false positive (not something like "E-mails that were rejected and a complaint was received").


More to the point, am I also correct that the aforementioned
"single-criterion/reject-at-the-envelope-level" solution *cannot* ever give
me *any* measurable FP ratio?

Correct.


-Scott
---
Declude JunkMail: The advanced anti-spam solution for IMail mailservers.
Declude Virus: Catches known viruses and is the leader in mailserver vulnerability detection.
Find out what you have been missing: Ask for a free 30-day evaluation.

