Re: DNSWL and JMF White false positives, what to do exactly?
On Sat, 03 Oct 2009 00:12:37 +0200 mouss wrote:

> RW wrote:
> > On Fri, 02 Oct 2009 00:14:52 +0200 mouss wrote:
> >
> > The source of your confusion is that you are mixing up the terminology of the overall classification and individual test results. Think of it this way: in a fingerprint comparison the meanings of TP, TN, FP and FN are obvious and intrinsic to the test; it would be absurd to switch them around depending on whether it's evidence for the defence or prosecution.
>
> let's take it more easily: Please explain to me what was an FP in this thread.

A test intended to identify ham was hitting spam.

A hit on a rule is a positive result. When a rule hits something it's intended to identify, it's a "true positive". When a rule hits something it's not intended to identify, it's a "false positive", and so on.

The same terminology can be used for SpamAssassin's overall spam classification, but that's a different matter. If you talk about a rule hit being an FN because it might contribute to a classification FN, then you are using the terminology like a cargo-cultist.
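The rule-level definition described above can be sketched in a few lines of Python. This is only an illustration of the terminology, not anything SpamAssassin ships: the `Kind` enum and the `rule_outcome` function are hypothetical names invented here.

```python
from enum import Enum


class Kind(Enum):
    HAM = "ham"
    SPAM = "spam"


def rule_outcome(rule_target: Kind, rule_hit: bool, actual: Kind) -> str:
    """Label a single rule result relative to what the rule is meant to identify."""
    is_target = actual == rule_target
    if rule_hit:
        # A hit is a positive result: true if the message is what the rule targets.
        return "TP" if is_target else "FP"
    # No hit is a negative result: false if the rule should have hit.
    return "FN" if is_target else "TN"


# A whitelist-style rule targets ham; hitting on spam is a rule-level FP.
print(rule_outcome(Kind.HAM, True, Kind.SPAM))
# A spam rule hitting ham is also a rule-level FP: the definition does not flip.
print(rule_outcome(Kind.SPAM, True, Kind.HAM))
```

The labels depend only on the rule's target and the message's true class, never on the sign of the rule's score.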
Re: DNSWL and JMF White false positives, what to do exactly?
On Sat, 2009-10-03 at 00:25 +0200, mouss wrote:

> Karsten Bräckelmann wrote:
> > > > False positive. Something that matches (positive) the criterion for a certain test, but should not (false).
> >
> > I stand by what I said.
>
> I'm not surprised:)

;)

> > IFF you are talking about the black box that spam detection is, that is true.
> >
> > If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to be that simple. However, it is not. You are looking at a single test, which -- if positive -- either is correct or wrong.
>
> I understand the rationale, but I find this too abstract for "common" discussions.

*shrug* You're not obliged to participate in a thread if it is confusing to you. That's the wonder of open discussion and diverse input. You might stumble upon something you didn't know before... ;)

> > Same for RCVD_IN_DNSWL. If it positively matches, it either is correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.
>
> except that it depends what the test really means. dnswl doesn't mean the listed hosts never send spam. I am happy that it lists debian list servers, Orange, ... etc.

Exactly, in the context of a single rule (as opposed to "detecting spam"), it depends on what the rule really means. Or in short, its score's sign...

> > This concept is NOT specific to spam detection, or even computer science. As a matter of fact, when I first really grasped the concept, a medical scientist explained it to me.
>
> now that you say it, this is true. I too believe that medical science has precedence in this area.
>
> > Yes, a FP for a rule that identifies *ham* actually evaluated positive on a spam. It only appears to be spam centric on this list, because it is mainly dedicated to identifying spam, not ham.
> >
> > You might want to ask wikipedia as well. And don't focus on the spam filtering *example*, which again exclusively talks about a rule identifying spam. Not ham.
>
> my point was that in a spam oriented forum, the meaning of some words is what "most of us" (yes, this is hard to define) think they mean. the principle of least astonishment.

Of course, these terms mostly come up WRT the overall score of a message, which applies to "detecting spam". However, on this very list, they are also commonly used for single rules FP'ing, *without* pushing the ham above the required_score threshold.

The only aspect new and obviously confusing to some regulars on this list is the negative sign of the rule's score. Inverting the "is spam" test logic also inverts the meaning of F[PN]. Whether one likes this or not. It's all about context.

And FWIW, it is wrong to base your definitions on what the majority thinks is correct. The majority, and what's believed to be "common knowledge", too often is wrong. You can observe this in real life, too... I prefer to educate the masses instead.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: DNSWL and JMF White false positives, what to do exactly?
Karsten Bräckelmann wrote:

> On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:
>> Karsten Bräckelmann wrote:
>>> False positive. Something that matches (positive) the criterion for a certain test, but should not (false).
>
> I stand by what I said.

I'm not surprised:)

>> you can certainly devise a system to detect alpha(foo) where alpha is a function mapping a Banach space to a Hilbert Space, and define what FP, FN, FX mean in the context you consider. you can also say "let PI=69, ... ". but conventions are here for a reason. they allow us to understand each other more easily. the fact that children of today can solve computation problems that "great scientists" of the old times couldn't handle is thanks to conventions (think of a/b * c/d = (a*c)/(b*d), which looks trivial today, but wasn't before).
>>
>> when talking about spam or intrusion detection, FN means "missing" and FP means "false alarm". if we allow defining FN and FP differently, then we'll need to rewrite a lot of books, reports, articles, ...
>
> IFF you are talking about the black box that spam detection is, that is true.
>
> If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to be that simple. However, it is not. You are looking at a single test, which -- if positive -- either is correct or wrong.

I understand the rationale, but I find this too abstract for "common" discussions.

> Same for RCVD_IN_DNSWL. If it positively matches, it either is correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.

except that it depends what the test really means. dnswl doesn't mean the listed hosts never send spam. I am happy that it lists debian list servers, Orange, ... etc.

> This concept is NOT specific to spam detection, or even computer science. As a matter of fact, when I first really grasped the concept, a medical scientist explained it to me.

now that you say it, this is true. I too believe that medical science has precedence in this area.

> Yes, a FP for a rule that identifies *ham* actually evaluated positive on a spam. It only appears to be spam centric on this list, because it is mainly dedicated to identifying spam, not ham.
>
> You might want to ask wikipedia as well. And don't focus on the spam filtering *example*, which again exclusively talks about a rule identifying spam. Not ham.

my point was that in a spam oriented forum, the meaning of some words is what "most of us" (yes, this is hard to define) think they mean. the principle of least astonishment.

anyway, I'm sorry for bringing the discussion to this sand. so I will stop here (of course, offlist is ok for any discussion, including garbage without collection:)
Re: DNSWL and JMF White false positives, what to do exactly?
RW wrote:

> On Fri, 02 Oct 2009 00:14:52 +0200 mouss wrote:
>> RW wrote:
>>> The term false-positive can apply to any test. A test for ham that matches a spam is a false-positive, it's a matter of context.
>>
>> spam too can be (re)defined. and actually any term. but it is assumed here that we talk about spam detection. so false negative means "miss" and false positive means "false alarm". this is the common terminology inherited from intrusion detection.
>
> The term comes from statistics, not intrusion detection. I don't know much about the latter; perhaps people in that field are a little sloppy in their usage, but more likely all the tests are expressed as tests for intrusion, so the same kind of issue doesn't arise.
>
> The source of your confusion is that you are mixing up the terminology of the overall classification and individual test results. Think of it this way: in a fingerprint comparison the meanings of TP, TN, FP and FN are obvious and intrinsic to the test; it would be absurd to switch them around depending on whether it's evidence for the defence or prosecution.

let's take it more easily: Please explain to me what was an FP in this thread.
Re: DNSWL and JMF White false positives, what to do exactly?
Charles Gregory wrote:

> On Fri, 2 Oct 2009, RW wrote:
> > > However, if you want to be understood you need to speak the Lingua Franca. If you choose to use a term differently than everyone else you WILL be misunderstood and corrected.
>
> If everyone calls an apple an orange, then yeah, it's an orange.
>
> > A false match on a test is a false-positive. It doesn't reverse for a ham test, simply because you're more used to thinking about spam tests.
>
> The distinction is whether the 'false positive' refers to the overall scoring of the message (FP=ham flagged as spam) or an individual test (FP=test triggered incorrectly). I consider *both* usages correct in this group. And as I vaguely recall, the OP did use sufficient context for even a lame-brain like myself to realize he meant the latter. The FP on the named rule had the potential to cause an FN.
>
> > Do you apply the same usage to anything else? For example, do you reverse the meaning of "off" and "on" for air-conditioning to make it consistent with heating, so "on" always means "make hotter"?
>
> Do you TURN UP or TURN DOWN your air-conditioning? Depends on whether someone has a simple numerical control or is adjusting a thermostat. Plus colloquial usage, of course. :)
>
> But yeah, you hit pretty close with your analogy. Just chose the wrong words. :)
>
> - Charles

Q. Do I make a left at the next intersection?
A. Right!
Re: DNSWL and JMF White false positives, what to do exactly?
On Fri, 2 Oct 2009, RW wrote:

> > However, if you want to be understood you need to speak the Lingua Franca. If you choose to use a term differently than everyone else you WILL be misunderstood and corrected.

If everyone calls an apple an orange, then yeah, it's an orange.

> A false match on a test is a false-positive. It doesn't reverse for a ham test, simply because you're more used to thinking about spam tests.

The distinction is whether the 'false positive' refers to the overall scoring of the message (FP=ham flagged as spam) or an individual test (FP=test triggered incorrectly). I consider *both* usages correct in this group. And as I vaguely recall, the OP did use sufficient context for even a lame-brain like myself to realize he meant the latter. The FP on the named rule had the potential to cause an FN.

> Do you apply the same usage to anything else? For example, do you reverse the meaning of "off" and "on" for air-conditioning to make it consistent with heating, so "on" always means "make hotter"?

Do you TURN UP or TURN DOWN your air-conditioning? Depends on whether someone has a simple numerical control or is adjusting a thermostat. Plus colloquial usage, of course. :)

But yeah, you hit pretty close with your analogy. Just chose the wrong words. :)

- Charles
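To make the two usages concrete, here is a rough Python sketch of how an FP on a negative-scoring ham rule can turn into an FN for the overall classification. The rule names and scores are made up for illustration and are not real SpamAssassin rules or defaults; only the 5.0 threshold mirrors SpamAssassin's default required_score.

```python
REQUIRED_SCORE = 5.0  # mirrors SpamAssassin's default required_score

# Hypothetical rule names and scores, for illustration only.
HYPOTHETICAL_SCORES = {
    "SPAMMY_BODY": 3.5,
    "BAD_HEADERS": 2.5,
    "ON_WHITELIST": -2.0,  # ham rule: a hit lowers the total
}


def classify(hits):
    """Return (total score, flagged as spam?) for the list of rules that hit."""
    total = sum(HYPOTHETICAL_SCORES[name] for name in hits)
    return total, total >= REQUIRED_SCORE


# The message is actually spam. Without the whitelist hit it is caught.
print(classify(["SPAMMY_BODY", "BAD_HEADERS"]))
# The whitelist rule wrongly hits (rule-level FP), and the total drops
# below the threshold, so the spam is missed (classification-level FN).
print(classify(["SPAMMY_BODY", "BAD_HEADERS", "ON_WHITELIST"]))
```

Both labels describe the same event at different levels: the rule result is a false positive, the message verdict is a false negative.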
Re: DNSWL and JMF White false positives, what to do exactly?
On Thu, 1 Oct 2009 18:54:40 -0600 LuKreme wrote:

> On Oct 1, 2009, at 18:36, Karsten Bräckelmann wrote:
> > Same for RCVD_IN_DNSWL. If it positively matches, it either is correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.
>
> Like others have said, you can make the words mean whatever you want. However, if you want to be understood you need to speak the Lingua Franca. If you choose to use a term differently than everyone else you WILL be misunderstood and corrected.

Except that so far the lunatics haven't taken over the asylum and you are in a 3 to 2 minority, so please don't claim to be speaking for everyone.

A false match on a test is a false-positive. It doesn't reverse for a ham test, simply because you're more used to thinking about spam tests.

Do you apply the same usage to anything else? For example, do you reverse the meaning of "off" and "on" for air-conditioning to make it consistent with heating, so "on" always means "make hotter"?
Re: DNSWL and JMF White false positives, what to do exactly?
On Fri, 02 Oct 2009 00:14:52 +0200 mouss wrote:

> RW wrote:
> > The term false-positive can apply to any test. A test for ham that matches a spam is a false-positive, it's a matter of context.
>
> spam too can be (re)defined. and actually any term. but it is assumed here that we talk about spam detection. so false negative means "miss" and false positive means "false alarm". this is the common terminology inherited from intrusion detection.

The term comes from statistics, not intrusion detection. I don't know much about the latter; perhaps people in that field are a little sloppy in their usage, but more likely all the tests are expressed as tests for intrusion, so the same kind of issue doesn't arise.

The source of your confusion is that you are mixing up the terminology of the overall classification and individual test results. Think of it this way: in a fingerprint comparison the meanings of TP, TN, FP and FN are obvious and intrinsic to the test; it would be absurd to switch them around depending on whether it's evidence for the defence or prosecution.
Re: DNSWL and JMF White false positives, what to do exactly?
On Oct 1, 2009, at 18:36, Karsten Bräckelmann wrote:

> Same for RCVD_IN_DNSWL. If it positively matches, it either is correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.

Like others have said, you can make the words mean whatever you want. However, if you want to be understood you need to speak the Lingua Franca. If you choose to use a term differently than everyone else you WILL be misunderstood and corrected. Saying everyone else is wrong isn't going to help.
Re: DNSWL and JMF White false positives, what to do exactly?
On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:

> Karsten Bräckelmann wrote:
> > False positive. Something that matches (positive) the criterion for a certain test, but should not (false).

I stand by what I said.

> you can certainly devise a system to detect alpha(foo) where alpha is a function mapping a Banach space to a Hilbert Space, and define what FP, FN, FX mean in the context you consider. you can also say "let PI=69, ... ". but conventions are here for a reason. they allow us to understand each other more easily. the fact that children of today can solve computation problems that "great scientists" of the old times couldn't handle is thanks to conventions (think of a/b * c/d = (a*c)/(b*d), which looks trivial today, but wasn't before).
>
> when talking about spam or intrusion detection, FN means "missing" and FP means "false alarm". if we allow defining FN and FP differently, then we'll need to rewrite a lot of books, reports, articles, ...

IFF you are talking about the black box that spam detection is, that is true.

If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to be that simple. However, it is not. You are looking at a single test, which -- if positive -- either is correct or wrong.

Same for RCVD_IN_DNSWL. If it positively matches, it either is correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.

This concept is NOT specific to spam detection, or even computer science. As a matter of fact, when I first really grasped the concept, a medical scientist explained it to me.

Yes, a FP for a rule that identifies *ham* actually evaluated positive on a spam. It only appears to be spam centric on this list, because it is mainly dedicated to identifying spam, not ham.

You might want to ask wikipedia as well. And don't focus on the spam filtering *example*, which again exclusively talks about a rule identifying spam. Not ham.
Re: DNSWL and JMF White false positives, what to do exactly?
RW wrote:

> On Wed, 30 Sep 2009 23:35:31 +0200 mouss wrote:
>> Warren Togami wrote:
>>> I scanned my spam folders and found a few false positives that hit on either DNSWL
>>
>> FP with DNSWL?
>>
>> FP = False Positive = legitimate mail tagged as spam
>> DNSWL = Whitelist
>
> The term false-positive can apply to any test. A test for ham that matches a spam is a false-positive, it's a matter of context.

spam too can be (re)defined. and actually any term. but it is assumed here that we talk about spam detection. so false negative means "miss" and false positive means "false alarm". this is the common terminology inherited from intrusion detection.

I used to have a clock that was anti-clockwise. but it was for fun. I always understood what "clockwise" meant.
Re: DNSWL and JMF White false positives, what to do exactly?
Karsten Bräckelmann wrote:

> On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:
>> Warren Togami wrote:
>>> I scanned my spam folders and found a few false positives that hit on either DNSWL
>>
>> FP with DNSWL?
>>
>> FP = False Positive = legitimate mail tagged as spam
>> DNSWL = Whitelist
>
> False positive. Something that matches (positive) the criterion for a certain test, but should not (false).
>
>> if your system adds points because of dnswl, you have a serious problem. ..
>>
>> or do you mean FN (false negative)?
>
> Granted, the wording ("FPs that hit ham rules") could need some polish, but I believe Warren was talking about spam that falsely hits ham rules.

you can certainly devise a system to detect alpha(foo) where alpha is a function mapping a Banach space to a Hilbert Space, and define what FP, FN, FX mean in the context you consider. you can also say "let PI=69, ... ". but conventions are here for a reason. they allow us to understand each other more easily. the fact that children of today can solve computation problems that "great scientists" of the old times couldn't handle is thanks to conventions (think of a/b * c/d = (a*c)/(b*d), which looks trivial today, but wasn't before).

when talking about spam or intrusion detection, FN means "missing" and FP means "false alarm". if we allow defining FN and FP differently, then we'll need to rewrite a lot of books, reports, articles, ...
Re: DNSWL and JMF White false positives, what to do exactly?
On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:

> Warren Togami wrote:
> > I scanned my spam folders and found a few false positives that hit on either DNSWL
>
> FP with DNSWL?
>
> FP = False Positive = legitimate mail tagged as spam
> DNSWL = Whitelist

False positive. Something that matches (positive) the criterion for a certain test, but should not (false).

> if your system adds points because of dnswl, you have a serious problem. ..
>
> or do you mean FN (false negative)?

Granted, the wording ("FPs that hit ham rules") could need some polish, but I believe Warren was talking about spam that falsely hits ham rules.
Re: DNSWL and JMF White false positives, what to do exactly?
On Wed, 30 Sep 2009 23:35:31 +0200 mouss wrote:

> Warren Togami wrote:
> > I scanned my spam folders and found a few false positives that hit on either DNSWL
>
> FP with DNSWL?
>
> FP = False Positive = legitimate mail tagged as spam
> DNSWL = Whitelist

The term false-positive can apply to any test. A test for ham that matches a spam is a false-positive, it's a matter of context.
Re: DNSWL and JMF White false positives, what to do exactly?
On Wed, Sep 30, 2009 at 11:35:31PM +0200, mouss wrote:

> yes, you can report offending IPs, if that makes sense. for example, if the offending IP is that of an ISP relay, then don't report it: ISPs do relay spam.

Ehm.. surely you should report spam sending ISP relays if they are miscategorized as low or higher.
Re: DNSWL and JMF White false positives, what to do exactly?
Warren Togami wrote:

> I scanned my spam folders and found a few false positives that hit on either DNSWL

FP with DNSWL?

FP = False Positive = legitimate mail tagged as spam
DNSWL = Whitelist

if your system adds points because of dnswl, you have a serious problem. ..

or do you mean FN (false negative)?

> or JMF (HOSTKARMA? See how confusing it is not knowing what to call it?)
>
> Is there an easy automated way we can forward FP's to DNSWL and JMF so their maintainers can decide what to do about the offending senders?

offending? then you probably mean FN.

yes, you can report offending IPs, if that makes sense. for example, if the offending IP is that of an ISP relay, then don't report it: ISPs do relay spam. if on the other hand you see FNs from paypal or bank of blahblah, then do submit.

> I'd attach it to mail but it might get caught in the spam filter...

post the s(p)ample on a web site instead. you can use pastebin for example.