[mailop] New method of blocking spam

Marc Perkel Thu, 21 Jan 2016 12:57:06 -0800

Just to follow up on this: I'm in the process of improving the filter, but I have filed my provisional patent, so I'm going to give you an overview of how it works.

Most spam filters work by matching things: matching ham and spam, matching rules. The important point here is that this new system, which I'm calling the Evolution filter, is about NOT matching.

Suppose I sent you an email with the subject line "Let's get dinner". You can tell instantly this is good email. How? Because spammers never say "Let's get dinner".

There are millions of phrases used in good email every day that are never used in spam. And there are millions of phrases used every day in spam that are never used in good email. So if I get an email that matches phrases used in good email and never used in spam, it's a good message. And if the message contains words and phrases used in spam and never used in ham, it's spam.
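The post doesn't define exactly what counts as a "phrase". As a minimal sketch, assuming a phrase is any run of one to three consecutive lowercased words (the helper name `phrases` and the `max_n=3` limit are hypothetical choices, not the author's), the subject "Let's get dinner" would break down like this:

```python
import re


def phrases(text: str, max_n: int = 3) -> set[str]:
    """Return every word and contiguous word phrase of 1..max_n words.

    Assumption (not from the post): text is lowercased and split on runs of
    letters, digits and apostrophes; a "phrase" is any n-gram of those words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            found.add(" ".join(words[i:i + n]))
    return found


print(sorted(phrases("Let's get dinner")))
# ['dinner', 'get', 'get dinner', "let's", "let's get", "let's get dinner"]
```

Each message, and each corpus of ham or spam, then becomes a set of such strings, which is what the set comparisons in this approach operate on.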

So how do I get a list of all phrases never used in ham, or never used in spam? I make a list of all the words and phrases used in ham and a list of all those used in spam, then test each phrase to see whether it is NOT in the other list. To illustrate my point:
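In set terms, the two lists are set differences between the phrase sets of the two corpora. A minimal Python sketch with toy data (the function name `exclusive_phrases` and the sample phrases are invented for illustration; the real lists linked below were built from far larger corpora):

```python
def exclusive_phrases(ham_corpus: set[str], spam_corpus: set[str]) -> tuple[set[str], set[str]]:
    """Return (ham_only, spam_only): phrases seen in one corpus and never in the other."""
    ham_only = ham_corpus - spam_corpus   # used in ham, NOT in the spam list
    spam_only = spam_corpus - ham_corpus  # used in spam, NOT in the ham list
    return ham_only, spam_only


# Hypothetical toy corpora: phrases already extracted from subject lines.
ham_corpus = {"dinner tonight", "meeting notes", "free", "your invoice"}
spam_corpus = {"free", "cheap meds", "your invoice", "act now"}

ham_only, spam_only = exclusive_phrases(ham_corpus, spam_corpus)
print(sorted(ham_only))   # ['dinner tonight', 'meeting notes']
print(sorted(spam_only))  # ['act now', 'cheap meds']
```

Phrases seen in both corpora ("free", "your invoice") land in neither list, so they carry no signal either way.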

Here is a list of 5505874 words and phrases used in the subject line of HAM and never seen in the subject line of SPAM:


http://www.junkemailfilter.com/data/subject-ham.txt

Here is a list of 3494938 words and phrases used in the subject line of SPAM and never seen in the subject line of HAM:


http://www.junkemailfilter.com/data/subject-spam.txt

The thing about not matching is that matching involves finite sets. Not matching involves infinite sets. And infinite sets are always bigger than finite sets.


Here is a link to my patent:

http://www.junkemailfilter.com/patent/

What I intend to do is give it away to the little guys and charge the big guys a small license fee. The process of implementing this is fairly easy. I'm hoping to encourage the open source world to take this idea and do it right. My code is cobbled together and uses 4 different languages, but the concept is enough to get you going.

One thing you will need to implement this is Redis. Redis is extremely fast at set comparisons, and set comparisons are how this works. It can be expressed as one formula:

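score = card(SpamCorpus intersect TestMessage diff HamCorpus) - card(HamCorpus intersect TestMessage diff SpamCorpus)

where card() is the number of elements in a set, so a positive score means the message contains more spam-only phrases than ham-only phrases.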
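A minimal Python sketch of that formula, using in-memory sets as a stand-in for Redis (the function name `evolution_score` and the toy corpora are hypothetical; the post does not give the actual Redis commands or any score threshold):

```python
# Hypothetical toy corpora of subject-line phrases (not real data).
HAM_CORPUS = {"dinner tonight", "meeting notes", "free", "your invoice"}
SPAM_CORPUS = {"free", "cheap meds", "your invoice", "act now"}


def evolution_score(test_message: set[str], spam_corpus: set[str], ham_corpus: set[str]) -> int:
    """Sketch of: card(SpamCorpus intersect TestMessage diff HamCorpus)
    minus card(HamCorpus intersect TestMessage diff SpamCorpus).

    In-memory Python sets stand in for Redis sets. A positive result
    leans spam, a negative result leans ham.
    """
    # Phrases in the message that have been seen in spam but never in ham.
    spam_only_hits = len((spam_corpus & test_message) - ham_corpus)
    # Phrases in the message that have been seen in ham but never in spam.
    ham_only_hits = len((ham_corpus & test_message) - spam_corpus)
    return spam_only_hits - ham_only_hits


good = {"dinner tonight", "free", "tomorrow"}
bad = {"act now", "cheap meds", "your invoice"}
print(evolution_score(good, SPAM_CORPUS, HAM_CORPUS))  # -1
print(evolution_score(bad, SPAM_CORPUS, HAM_CORPUS))   # 2
```

Phrases that appear in neither corpus ("tomorrow") or in both ("free", "your invoice") contribute nothing; only phrases exclusive to one corpus move the score. In Redis the same quantities could presumably be computed with its native set commands (SINTERSTORE, SDIFF, SCARD and similar), but that mapping is an assumption, not something spelled out here.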

I'm seeing an accuracy level so close to 100% that it's scary. It is especially good at actively identifying good email, which prevents false positives.


I will post more soon as it all comes together.




_______________________________________________
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
