On Tuesday, December 21, 2004, 4:49:33 AM, Markus wrote:

MG> First of all spam is anything
    
MG> coming from nonexistent, or forged senders
    
MG> having "hidden" content

MG> But what you're asking for is the difference between our
MG> human brain and stupid computers (Pete, your comment please ;-)

Well... I'm having fun lurking and I don't want to spoil that. I'm
anxious to learn what folks are thinking about all of this (without my
nudging).

The current implementation of Sniffer is a kind of broad spectrum
hybrid learning system. We use statistical models to try to keep the
core rulebase targeting what our users _seem_ to want filtered, then we
customize individual rulebases to match specific preferences. The
learning model isn't perfect, but it has shown that by and large there
is a strong agreement for most folks about what should be filtered -
even if that definition cannot be clearly and consistently stated.
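
To make the "core rulebase plus individual customization" idea a
little more concrete, here is a rough sketch. It is NOT how Sniffer is
actually implemented; the rule patterns, the weights, and the names
build_rulebase and score are all made up for illustration. It only
shows a shared set of rules being adjusted per customer.

```python
# Hypothetical core rules: substring pattern -> weight (illustrative values).
CORE_RULES = {"cheap pills": 1.0, "limited time offer": 0.5}


def build_rulebase(core, overrides):
    """Copy the core rules, then apply one customer's preferences.

    An override weight of None removes the rule for that customer;
    any other value adds the rule or replaces its weight.
    """
    rules = dict(core)
    for pattern, weight in overrides.items():
        if weight is None:
            rules.pop(pattern, None)
        else:
            rules[pattern] = weight
    return rules


def score(message, rules):
    """Sum the weights of every rule whose pattern appears in the message."""
    text = message.lower()
    return sum(weight for pattern, weight in rules.items() if pattern in text)


# A customer who wants direct advertising delivered drops that one rule.
custom_rules = build_rulebase(CORE_RULES, {"limited time offer": None})

msg = "Limited time offer on cheap pills"
core_score = score(msg, CORE_RULES)
custom_score = score(msg, custom_rules)
```

With these made-up rules the same message scores lower for the
customized rulebase than for the core one, because one rule has been
switched off for that customer.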

(Note I did not say "what is spam" because that is getting to be more
precise and more contentious these days.)

What I find (and it really stands out when working with Matt) is that
the definition indicated by the standing rules in our core rulebase
is a mixed bag of features and that the definition is highly fluid
around the edges.

For example, in large part Matt's rules would indicate traffic from
chtah is "not spam" but even he admits it's not acceptable to make
that definition hard (not ok to white-list chtah).

One more liberal definition of ham holds that if the recipient has a
first-party relationship with the sender, then any content from that
sender should not be filtered... Clearly from the volume of direct
advertising that is submitted to us as spam (even as recurring spam
problems) this definition does not hold for most of our users.

This "edge definition problem" was predicted and so far our model is
doing a reasonably good job of dealing with it - though improvements
are clearly needed and are on their way (albeit slowly).

In the meantime, end-user specific Bayesian classification can often
solve the edge problem -- thus reinforcing that the fluidity at the
edge is largely due to differences in the filtering preferences of the
end users and the variability thereof.
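
As a rough illustration of per-user Bayesian classification (a toy
naive Bayes with add-one smoothing, not Sniffer's code or any
particular product's; the class name UserBayes and the sample users
and messages are invented), two users can train on the same
newsletter with opposite labels and get opposite verdicts on a
similar message:

```python
import math
from collections import Counter, defaultdict


class UserBayes:
    """Tiny per-user naive Bayes filter (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_score(self, text):
        """Return the log-odds of spam; positive leans spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        total = {k: sum(c.values()) for k, c in self.counts.items()}
        # Smoothed prior from how many messages of each kind were seen.
        result = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (total["spam"] + vocab)
            p_ham = (self.counts["ham"][word] + 1) / (total["ham"] + vocab)
            result += math.log(p_spam / p_ham)
        return result


# One independent filter per end user.
filters = defaultdict(UserBayes)
filters["alice"].train("weekly sale coupon newsletter", "ham")
filters["alice"].train("cheap pills online", "spam")
filters["bob"].train("weekly sale coupon newsletter", "spam")
filters["bob"].train("meeting agenda attached", "ham")

msg = "sale coupon inside"
alice_score = filters["alice"].spam_score(msg)
bob_score = filters["bob"].spam_score(msg)
```

Here alice_score comes out negative (leans ham) and bob_score
positive (leans spam) for the same message, which is the edge
fluidity in miniature.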

Add to that the problem of data collection and the problem becomes not
only difficult to solve, but difficult to measure. Imagine piloting
a supersonic fighter jet through a narrow winding canyon with your
eyes shut and you've just about got the picture.

As for the stupidity of machines... I personally believe that strong
intelligence can be built artificially (and in fact I do that for fun
and profit)... The big challenge with using AI for spam is the same as
for many AI systems where people's expectations are concerned: The AI
cannot and does not have a human frame of reference and so even if it
did match or exceed the innate intelligence of a human counterpart, it
would not be in a position to predict or model human behaviors
precisely.

Said another way (partly tongue in cheek) - since computers don't have
sex, they don't grok porn and (ahem) organ enhancement spam.

Without a social frame of reference they are reduced to guessing at
otherwise meaningless patterns. You or I could do no better in that
world.

So, what we do with the design of Sniffer is to build a highly
integrated hybrid with both human and machine components. Each gives
the other strong leverage where it's needed. The machines remember
better than we do, find and learn patterns well, and manage large
datasets without too much effort. The humans understand the social
contexts, predict and decode the strategies that are used by spammers,
and interpret the needs and desires of our customers.

I think I might be rambling...

Were these the kinds of comments you were looking for?

_M



---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.