Philip Goetz gave an example of an intrusion detection system that learned
information that was not comprehensible to humans. You argued that he could
have understood it if he tried harder.

No, I gave five separate alternatives, most of which put the blame on the system for not being able to compress its data pattern into knowledge and explain it to Philip. As I keep saying (and am trying to rephrase better here), the problem with statistical and similar systems is that they generally don't pick out and isolate salient features (unless you are lucky enough to have constrained them to exactly the correct number of variables). Since they don't pick out and isolate features, they can't build upon what they have learned.

I disagreed and argued that an
explanation would be useless even if it could be understood.

In your explanation, however, you basically *did* explain exactly what the system did. Clearly, the intrusion detection system looks at a number of variables and, if their weighted sum exceeds a threshold, decides that the connection is likely an intrusion. The only real question is the degree of entanglement of the variables in the real world. It is *possible*, though I would argue extremely unlikely, that the variables really are entangled enough in the real world that a human being couldn't be trained to do intrusion detection. It is much, much, *MUCH* more probable that the system has improperly entangled the variables because it has too many degrees of freedom.
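
To be concrete about the kind of decision rule I am describing, here is a
minimal Python sketch.  The feature names, weights, and threshold are
invented purely for illustration -- they are not your system or anyone
else's:

# Minimal sketch of a weighted-sum threshold detector.  Everything here is
# hypothetical; no real intrusion detection system is being described.
FEATURE_WEIGHTS = {
    "fragmented_after_get": 2.5,   # e.g., TCP stream fragmented right after "GET"
    "lowercase_commands": 1.0,
    "unusual_ip_id_value": 0.8,
    "rare_option_combo": 1.7,
}
THRESHOLD = 3.0

def looks_like_intrusion(features: dict) -> bool:
    """Flag a connection when the weighted sum of its features crosses a threshold."""
    score = sum(FEATURE_WEIGHTS.get(name, 0.0) * value
                for name, value in features.items())
    return score > THRESHOLD

# Because the rule is just a weighted sum, a human can read off exactly which
# features fired and how much each one contributed to the score.
print(looks_like_intrusion({"fragmented_after_get": 1, "rare_option_combo": 1}))

A human can audit a rule like that directly; the only open question is
whether the real-world variables can actually be disentangled into named
terms like these.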

If you use a computer to add up a billion numbers, do you check the math, or
do you trust it to give you the right answer?

I trust it to give me the right answer because I know and understand exactly what it is doing.

My point is that when AGI is built, you will have to trust its answers based
on the correctness of the learning algorithms, and not by examining the
internal data or tracing the reasoning.

The problems are that 1) correct learning algorithms will still give bad results if given bad data, *and* 2) you haven't said how you are going to ensure that your learning algorithms are correct under all of the circumstances in which you're using them.
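
To make point 1 concrete, here is a toy Python sketch (the data are synthetic
and purely illustrative): even a provably correct algorithm such as ordinary
least squares happily learns garbage when it is fed garbage.

# A correct learning algorithm given bad data: ordinary least squares fit to
# labels where half the values were recorded with the wrong sign.
# All data here are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y_true = 3.0 * x + 1.0           # the real relationship

y_bad = y_true.copy()
y_bad[:100] *= -1                # half the labels corrupted

slope_clean = np.polyfit(x, y_true, 1)[0]
slope_bad = np.polyfit(x, y_bad, 1)[0]

print(f"slope learned from clean data: {slope_clean:.2f}")   # ~3.0
print(f"slope learned from bad data:   {slope_bad:.2f}")     # nowhere near 3.0

The algorithm did exactly what it was proven to do; the answer is still
wrong, and nothing in the correctness of the algorithm warns you about it.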

I believe this is the fundamental
flaw of all AI systems based on structured knowledge representations, such as
first order logic, frames, connectionist systems, term logic, rule based
systems, and so on.  The evidence supporting my assertion is:
1. The relative success of statistical models vs. structured knowledge.

Statistical models are successful at pattern-matching and recognition. I am not aware of *anything* else that they are successful at. I am fully aware of Jeff Hawkins' contention that pattern-matching is the only thing that the brain does, but I would argue that the brain's pattern-matching includes feature extraction and knowledge compression, that current statistical AI models do neither, and that that is why current statistical models are anything but AI.

Straight statistical models like the ones you are touting are never going to get you to AI until you can successfully build them on top of each other -- and to do that, you need feature extraction and thus explainability. An AGI is certainly going to use statistics for feature extraction and the like, but knowledge is *NOT* going to be kept in raw, badly entangled statistical form (i.e. basically compressed data rather than knowledge). If you were to add functionality to a statistical system such that it could extract features and use them to explain its results, then I would say that it is on the way to AGI. The point is that your statistical systems can't correctly explain their results even to an unlimited being (because most of the time they are incorrectly entangled anyway).
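
To be equally concrete about what "feature extraction plus explanation" would
look like, here is a toy Python sketch.  It assumes numpy and scikit-learn
are available, and the statistic names and data are invented:

# Run PCA over per-connection statistics, then report which *named* inputs
# dominate each component, so the result can be stated in human terms
# instead of being handed back as an opaque vector.  All names and data are
# invented for illustration.
import numpy as np
from sklearn.decomposition import PCA

STAT_NAMES = ["packet_count", "bytes_sent", "frag_after_get",
              "lowercase_cmds", "ip_id_entropy"]
X = np.random.rand(1000, len(STAT_NAMES))   # stand-in for real per-connection stats

pca = PCA(n_components=2).fit(X)

for i, component in enumerate(pca.components_):
    # Isolate the salient named inputs for each component.
    top = sorted(zip(STAT_NAMES, component), key=lambda p: abs(p[1]), reverse=True)[:2]
    summary = ", ".join(f"{name} ({weight:+.2f})" for name, weight in top)
    print(f"component {i}: mostly {summary}")

That last loop is the part current statistical systems don't do: naming the
salient features so that both a human and the system itself can build on
them.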


----- Original Message ----- From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <agi@v2.listbox.com>
Sent: Sunday, December 03, 2006 11:11 PM
Subject: Re: [agi] A question on the symbol-system hypothesis


Mark,

Philip Goetz gave an example of an intrusion detection system that learned
information that was not comprehensible to humans. You argued that he could
have understood it if he tried harder.  I disagreed and argued that an
explanation would be useless even if it could be understood.

If you use a computer to add up a billion numbers, do you check the math, or
do you trust it to give you the right answer?

My point is that when AGI is built, you will have to trust its answers based
on the correctness of the learning algorithms, and not by examining the
internal data or tracing the reasoning.  I believe this is the fundamental
flaw of all AI systems based on structured knowledge representations, such as
first order logic, frames, connectionist systems, term logic, rule based
systems, and so on.  The evidence supporting my assertion is:

1. The relative success of statistical models vs. structured knowledge.
2. Arguments based on algorithmic complexity. (The brain cannot model a more
complex machine).
3. The two examples above.

I'm afraid that's all the arguments I have.  Until we build AGI, we really
won't know. I realize I am repeating (summarizing) what I have said before.
If you want to tear down my argument line by line, please do it privately
because I don't think the rest of the list will be interested.

--- Mark Waser <[EMAIL PROTECTED]> wrote:

Matt,

    Why don't you try addressing my points instead of simply repeating
things that I acknowledged and answered and then trotting out tired old red
herrings?

As I said, your network intrusion anomaly detector is a pattern matcher.

It is a stupid pattern matcher that can't explain its reasoning and can't
build upon what it has learned.

    You, on the other hand, gave a very good explanation of how it works.
Thus, you have successfully proved that you are an explaining intelligence
and it is not.

If anything, you've further proved my point that an AGI is going to have to
be able to explain/be explained.


----- Original Message ----- From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <agi@v2.listbox.com>
Sent: Saturday, December 02, 2006 5:17 PM
Subject: Re: [agi] A question on the symbol-system hypothesis


>
> --- Mark Waser <[EMAIL PROTECTED]> wrote:
>
>> A nice story but it proves absolutely nothing . . . . .
>
> I know a little about network intrusion anomaly detection (it was my
> dissertation topic), and yes it is an important lesson.
>
> Network traffic containing attacks has a higher algorithmic complexity than
> traffic without attacks.  It is less compressible.  The reason has nothing
> to do with the attacks, but with arbitrary variations in protocol usage made
> by the attacker.  For example, the Code Red worm fragments the TCP stream
> after the HTTP "GET" command, making it detectable even before the buffer
> overflow code is sent in the next packet.  A statistical model will learn
> that this is unusual (even though legal) in normal HTTP traffic, but offer
> no explanation why such an event should be hostile.  The reason such
> anomalies occur is because when attackers craft exploits, they follow enough
> of the protocol to make it work but often don't care about the undocumented
> conventions followed by normal servers and clients.  For example, they may
> use lower case commands where most software uses upper case, or they may put
> unusual but legal values in the TCP or IP-ID fields or a hundred other
> things that make the attack stand out.  Even if they are careful, many
> exploits require unusual commands or combinations of options that rarely
> appear in normal traffic and are therefore less carefully tested.
>
> So my point is that it is pointless to try to make an anomaly detection
> system explain its reasoning, because the only explanation is that the
> traffic is unusual.  The best you can do is have it estimate the probability
> of a false alarm based on the information content.
>
> So the lesson is that AGI is not the only intelligent system where you
> should not waste your time trying to understand what it has learned.  Even
> if you understood it, it would not tell you anything.  Would you understand
> why a person made some decision if you knew the complete state of every
> neuron and synapse in his brain?
>
>
>> You developed a pattern-matcher.  The pattern matcher worked (and I would
>> dispute that it worked better "than it had a right to").  Clearly, you do
>> not understand how it worked.  So what does that prove?
>>
>> Your contention (or, at least, the only one that continues the previous
>> thread) seems to be that you are too stupid to ever understand the pattern
>> that it found.
>>
>> Let me offer you several alternatives:
>> 1)  You missed something obvious
>> 2)  You would have understood it if the system could have explained it to
>> you
>> 3)  You would have understood it if the system had managed to losslessly
>> convert it into a more compact (and comprehensible) format
>> 4)  You would have understood it if the system had managed to losslessly
>> convert it into a more compact (and comprehensible) format and explained it
>> to you
>> 5)  You would have understood it if the system had managed to lossily
>> convert it into a more compact (and comprehensible -- and probably even,
>> more correct) format
>> 6)  You would have understood it if the system had managed to lossily
>> convert it into a more compact (and comprehensible -- and probably even,
>> more correct) format and explained it to you
>>
>> My contention is that the pattern that it found was simply not translated
>> into terms you could understand and/or explained.
>>
>> Further, and more importantly, the pattern matcher *doesn't* understand its
>> results either and certainly couldn't build upon them -- thus, it *fails*
>> the test as far as being the central component of an RSIAI or being able to
>> provide evidence as to the required behavior of such.
>>
>> ----- Original Message -----
>> From: "Philip Goetz" <[EMAIL PROTECTED]>
>> To: <agi@v2.listbox.com>
>> Sent: Friday, December 01, 2006 7:02 PM
>> Subject: Re: [agi] A question on the symbol-system hypothesis
>>
>>
>> > On 11/30/06, Mark Waser <[EMAIL PROTECTED]> wrote:
>> >>     With many SVD systems, however, the representation is more
>> >> vector-like and *not* conducive to easy translation to human terms.  I
>> >> have two answers to these cases.  Answer 1 is that it is still easy for
>> >> a human to look at the closest matches to a particular word pair and
>> >> figure out what they have in common.
>> >
>> > I developed an intrusion-detection system for detecting brand new
>> > attacks on computer systems.  It takes TCP connections, and produces
>> > 100-500 statistics on each connection.  It takes thousands of
>> > connections, and runs these statistics thru PCA to come up with 5
>> > dimensions.  Then it clusters each connection, and comes up with 1-3
>> > clusters per port that have a lot of connections and are declared to
>> > be "normal" traffic.  Those connections that lie far from any of those
>> > clusters are identified as possible intrusions.
>> >
>> > The system worked much better than I expected it to, or than it had a
>> > right to.  I went back and, by hand, tried to figure out how it was
>> > classifying attacks.  In most cases, my conclusion was that there was
>> > *no information available* to tell whether a connection was an attack,
>> > because the only information to tell that a connection was an attack was
>> > in the TCP packet contents, while my system looked only at packet
>> > headers.  And yet, the system succeeded in placing about 50% of all
>> > attacks in the top 1% of suspicious connections.  To this day, I don't
>> > know how it did it.
>> >
>
>
> -- Matt Mahoney, [EMAIL PROTECTED]
>




-- Matt Mahoney, [EMAIL PROTECTED]



