On 11/30/06, Mark Waser <[EMAIL PROTECTED]> wrote:
With many SVD systems, however, the representation is more vector-like and *not* conducive to easy translation to human terms. I have two answers to these cases. Answer 1 is that it is still easy for a human to look at the closest matches to a particular word pair and figure out what they have in common.
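The "closest matches to a word pair" check that the quoted paragraph describes can be sketched as a truncated SVD followed by a cosine-similarity lookup. This is a minimal illustration, not the method of any particular SVD system. The pair-by-pattern count matrix, the pattern columns, and the helper name `nearest_pairs` are all hypothetical toy assumptions.

```python
import numpy as np

def nearest_pairs(matrix, labels, query, k=2, rank=2):
    """Hypothetical helper: return the k word pairs whose rank-reduced SVD
    vectors are most cosine-similar to the `query` pair."""
    # Truncated SVD: keep only the top `rank` singular directions.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    vecs = u[:, :rank] * s[:rank]  # one row per word pair in the reduced space
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.where(norms == 0, 1, norms)  # guard against zero rows
    q = labels.index(query)
    sims = unit @ unit[q]  # cosine similarity of every pair to the query
    order = [i for i in np.argsort(-sims) if i != q]
    return [(labels[i], float(sims[i])) for i in order[:k]]

# Toy data (assumed): rows are word pairs, columns are counts of hypothetical
# patterns "X cuts Y", "X works with Y", "X works in Y", "X shapes Y".
labels = ["mason:stone", "carpenter:wood", "potter:clay",
          "doctor:hospital", "teacher:school"]
matrix = np.array([[3, 5, 0, 4],
                   [4, 5, 0, 3],
                   [0, 4, 0, 5],
                   [0, 1, 6, 0],
                   [0, 1, 5, 0]], dtype=float)

print(nearest_pairs(matrix, labels, "mason:stone"))
```

A person inspecting the printed neighbours can then judge what the pairs have in common, even though the reduced vectors themselves have no human-readable axes.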
I developed an intrusion-detection system for detecting brand-new attacks on computer systems. It takes TCP connections and computes 100-500 statistics on each connection. It takes thousands of connections and runs these statistics through PCA to reduce them to 5 dimensions. Then it clusters the connections and, for each port, comes up with 1-3 clusters that contain a lot of connections; these are declared to be "normal" traffic. Connections that lie far from all of those clusters are identified as possible intrusions. The system worked much better than I expected it to, or than it had a right to.

I went back and, by hand, tried to figure out how it was classifying attacks. In most cases, my conclusion was that there was *no information available* to tell whether a connection was an attack, because the only evidence that a connection was an attack was in the TCP packet contents, while my system looked only at packet headers. And yet the system succeeded in placing about 50% of all attacks in the top 1% of connections ranked by suspiciousness. To this day, I don't know how it did it.
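The pipeline described in the reply (per-connection statistics, PCA down to 5 dimensions, per-port clustering, distance from the "normal" clusters as a suspicion score) can be sketched roughly as below. This is a minimal reconstruction under stated assumptions, not the actual system: feature standardization, plain k-means with a fixed `k=3`, the `min_frac` rule for deciding which clusters count as "normal", and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def pca_reduce(X, n_components=5):
    """Standardize each statistic, then project onto the top principal components."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / np.where(sd == 0, 1, sd)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T

def kmeans(P, k, iters=100, seed=0):
    """Plain Lloyd's k-means (assumed clustering method); returns centroids and labels."""
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([P[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    dist = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)

def anomaly_scores(X, ports, n_components=5, k=3, min_frac=0.05):
    """Distance from each connection to the nearest 'normal' centroid of its port.
    A cluster is treated as normal only if it holds >= min_frac of that port's
    connections (hypothetical rule standing in for "a lot of connections")."""
    P = pca_reduce(X, n_components)
    scores = np.empty(len(X))
    for port in np.unique(ports):
        idx = np.flatnonzero(ports == port)
        centroids, assign = kmeans(P[idx], min(k, len(idx)))
        counts = np.bincount(assign, minlength=len(centroids))
        normal = centroids[counts >= min_frac * len(idx)]
        if len(normal) == 0:  # fall back to the single largest cluster
            normal = centroids[[counts.argmax()]]
        dist = np.linalg.norm(P[idx][:, None, :] - normal[None, :, :], axis=2)
        scores[idx] = dist.min(axis=1)
    return scores

def flag_top(scores, fraction=0.01):
    """Indices of the most suspicious `fraction` of connections."""
    n = max(1, int(round(len(scores) * fraction)))
    return np.argsort(-scores)[:n]
```

Ignoring small clusters matters here: without the `min_frac` filter, a handful of unusual connections could form their own cluster and end up scoring as perfectly "normal".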