On 11/30/06, Mark Waser <[EMAIL PROTECTED]> wrote:
    With many SVD systems, however, the representation is more vector-like
and *not* conducive to easy translation to human terms.  I have two answers
to these cases.  Answer 1 is that it is still easy for a human to look at
the closest matches to a particular word pair and figure out what they have
in common.

I developed an intrusion-detection system for detecting brand-new
attacks on computer systems.  It takes TCP connections and computes
100-500 statistics on each one.  It takes thousands of connections
and runs these statistics through PCA to reduce them to 5 dimensions.
Then it clusters the connections and, for each port, finds 1-3
clusters that contain a lot of connections; these are declared to be
"normal" traffic.  Connections that lie far from all of those
clusters are flagged as possible intrusions.
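
For anyone who wants to play with the idea, the pipeline described above
can be sketched roughly as follows.  This is not the original system, only
a minimal illustration under several assumptions: it handles the
connections for a single port, it uses plain k-means (the clustering
method is not specified above), it standardizes the statistics before PCA,
and the names `pca_project`, `kmeans`, `suspicion_scores`, and the
`min_frac` threshold are all hypothetical.

```python
import numpy as np

def pca_project(X, n_components=5):
    """Standardize each statistic, then project onto the top principal components."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns carry no information
    Z = (X - mu) / sigma
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def kmeans(P, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    # Recompute labels so they match the final centers.
    dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)

def suspicion_scores(X, n_components=5, k=3, min_frac=0.1, seed=0):
    """Distance from each connection to the nearest well-populated ("normal") cluster."""
    P = pca_project(X, n_components)
    centers, labels = kmeans(P, k, seed=seed)
    sizes = np.bincount(labels, minlength=k)
    normal = sizes >= min_frac * len(P)  # hypothetical "a lot of connections" rule
    normal[sizes.argmax()] = True        # always keep the largest cluster
    dist = np.linalg.norm(P[:, None, :] - centers[normal][None, :, :], axis=2)
    return dist.min(axis=1)
```

Here `X` would be a (connections x statistics) array for one port, and the
connections with the largest scores are the candidates for a closer look.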

The system worked much better than I expected it to, or than it had a
right to.  I went back and, by hand, tried to figure out how it was
classifying attacks.  In most cases, my conclusion was that there was
*no information available* to tell whether a connection was an attack,
because the only evidence of the attack was in the TCP packet
contents, while my system looked only at packet headers.  And yet the
system succeeded in placing about 50% of all
attacks in the top 1% of suspicious connections.  To this day, I don't
know how it did it.
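
To be concrete about the figure quoted above, "X% of attacks in the top 1%
of suspicious connections" can be computed along these lines.  This is only
a sketch: the function name `recall_in_top_fraction` is hypothetical, and
the scores and labels in the example are toy values, not data from the
system.

```python
def recall_in_top_fraction(scores, is_attack, fraction=0.01):
    """Fraction of all labeled attacks that rank within the top `fraction` by score."""
    total_attacks = sum(1 for flag in is_attack if flag)
    if total_attacks == 0:
        raise ValueError("no attacks in the labeled set")
    n_top = max(1, round(fraction * len(scores)))
    # Indices ordered from most to least suspicious.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_attack[i])
    return hits / total_attacks

# Toy example: 200 connections, 4 of them attacks, 2 of which get the top scores.
toy_scores = [float(i) for i in range(200)]
toy_labels = [i in (10, 50, 198, 199) for i in range(200)]
toy_recall = recall_in_top_fraction(toy_scores, toy_labels)
```

In the toy case the top 1% is two connections, both attacks, so half of the
four attacks are caught.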
