Kaj Sotala wrote:
On Jan 29, 2008 6:52 PM, Richard Loosemore <[EMAIL PROTECTED]> wrote:
Okay, sorry to hit you with incomprehensible technical detail, but maybe
there is a chance that my garbled version of the real picture will
strike a chord.

The message to take home from all of this is that:

1) There are *huge* differences between the way that a system would
behave if it had a single GS, or even a group of conflicting GS modules
(which is the way you interpreted my proposal, above) and the kind of
MES system I just described:  the difference would come from the type of
influence exerted, because the vector field is operating on a completely
different level than the symbol processing.

2) The effect of the MES is to bias the system, but this "bias" amounts
to the following system imperative:  [Make your goals consistent with
this *massive* set of constraints] .... where the "massive set of
constraints" is a set of ideas built up throughout the entire
development of the system.  Rephrasing that in terms of an example:  if
the system gets an idea that it should take a certain course of action
because it seems to satisfy an immediate goal, the implications of that
action will be quickly checked against a vast range of constraints, and
if there is any hint of an inconsistency with the value system, this
will "pull" the thoughts of the AGI toward that issue, whereupon it will
start to elaborate the issue in more detail and try to impose an even
wider net of constraints, finally making a decision based on the broadest
possible set of considerations.  This takes care of all the dumb
examples where people suggest that an AGI could start with the goal
"Increase global happiness" and then finally decide that this would be
accomplished by tiling the universe with smiley faces.  Another way to
say this:  there is no such thing as a single "utility function" in this
type of system, nor is there a small set of utility functions .... there
is a massive-dimensional set of utility functions (as many as there are
concepts or connections in the system), and this "diffuse" utility
function is what gives the system its stability.
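
To make that contrast concrete, here is a rough Python sketch (purely
illustrative -- the constraint names, scores and threshold are invented
stand-ins, not the actual MES design): instead of maximizing one utility
function, a proposed action is scored against a large set of learned
constraints, and any strong conflict pulls attention toward that issue
for further elaboration.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Constraint:
    """One learned idea/value; check() returns a score in [-1, 1], negative = conflict."""
    name: str
    check: Callable[[str], float]

def evaluate_action(action: str,
                    constraints: List[Constraint],
                    conflict_threshold: float = -0.2) -> List[Tuple[str, float]]:
    # Score the proposed action against every constraint, not one utility function.
    conflicts = []
    for c in constraints:
        score = c.check(action)
        if score < conflict_threshold:
            conflicts.append((c.name, score))
    # Any flagged conflict would "pull" the system's attention toward that
    # issue, widening the net of constraints before a decision is made.
    return conflicts

# Toy usage: an action that maximizes a naive "happiness" metric still
# trips many learned constraints.
constraints = [
    Constraint("respect human autonomy",
               lambda a: -0.9 if "tile the universe" in a else 0.4),
    Constraint("avoid irreversible large-scale change",
               lambda a: -0.8 if "universe" in a else 0.3),
]
print(evaluate_action("tile the universe with smiley faces", constraints))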

I got the general gist of that, I think.

You've previously expressed that you don't think a seriously
"unfriendly" AGI will be likely, apparently because you assume the
motivational-system AGI will be the kind that'll be constructed and
not, for instance, a goal-stack-driven one. Now, what makes you so
certain that people will build this kind of AGI?

Kaj,

[This is just a preliminary answer: I am composing a full essay now, which will appear in my blog. This is such a complex debate that it needs to be unpacked in a lot more detail than is possible here. Richard].


The answer is a mixture of factors.

The most important reason that I think this type will win out over a
goal-stack system is that I really think the latter cannot be made to
work in a form that allows substantial learning.  A goal-stack control
system relies on a two-step process:  build your stack using goals that
are represented in some kind of propositional form, and then (when you
are ready to pursue a goal) *interpret* the meaning of the proposition
on the top of the stack so you can start breaking it up into subgoals.

The problem with this two-step process is that the interpretation of
each goal is only easy when you are down at the lower levels of the
stack - "Pick up the red block" is easy to interpret, but "Make humans
happy" is a profoundly abstract statement that has a million different
interpretations.
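
A toy sketch of that two-step loop (the decomposition table below is an
invented illustration, not any real system's knowledge base) shows where
it breaks down: a concrete goal decomposes cleanly, while an abstract
one leaves the interpreter with no single defensible reading to pick.

# Step 1: build a stack of goals in propositional form.
# Step 2: pop the top goal and interpret it into subgoals.
GOAL_DECOMPOSITIONS = {
    "pick up the red block": ["locate red block", "move gripper to block",
                              "close gripper"],
    "locate red block": [],        # primitive: directly executable
    "move gripper to block": [],
    "close gripper": [],
    "make humans happy": None,     # no unambiguous decomposition exists
}

def run_goal_stack(top_level_goal: str) -> None:
    stack = [top_level_goal]
    while stack:
        goal = stack.pop()
        subgoals = GOAL_DECOMPOSITIONS.get(goal)
        if subgoals is None:
            # A million possible interpretations and no principled way to
            # choose among them -- this is where the goal-stack approach stalls.
            print(f"Cannot unambiguously interpret: {goal!r}")
        elif not subgoals:
            print(f"Executing primitive goal: {goal!r}")
        else:
            stack.extend(reversed(subgoals))

run_goal_stack("pick up the red block")
run_goal_stack("make humans happy")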

This is one reason why nobody has built an AGI.  To make a completely
autonomous system that can do such things as learn by engaging in
exploratory behavior, you have to be able to insert goals like "Do some
playing", and there is no clear way to break that statement down into
unambiguous subgoals.  The result is that if you really did try to build
an AGI with a goal like that, the actual behavior of the system would be
wildly unpredictable, and probably not good for the system itself.

Further:  if the system is to acquire its own knowledge independently
from a child-like state (something that, for separate reasons, I think
is going to be another prerequisite for true AGI), then the child system
cannot possibly have goals built into it that contain statements like
"Engage in an empathic relationship with your parents" because it does
not have the knowledge base built up yet, and cannot understand such a
proposition!

These technical reasons seem to imply that the first AGI that is
successful will, in fact, have a motivational-emotional system.  Anyone
else trying to build a goal-stack system will simply never get there.

But beyond this technical reason, I also believe that when people start
to make a serious effort to build AGI systems - i.e. when it is talked
about in government budget speeches across the world - there will be
questions about safety, and the safety features of the two types of AGI
will be examined.  I believe that at that point there will be enormous
pressure to go with the system that is safer.


Even if we assume
that this sort of architecture would be the most viable one, a lot
seems to depend on how tight the constraints on its behavior are, and
what kind they are - you say that they are a "a set of ideas built up
throughout the entire development of the system". The ethics and
values of humans are the result of a long, long period of evolution,
and our ethical system is pretty much of a mess. What makes it likely
that it really will build up a set of ideas and constraints that we humans
would *want* it to build? Could it not just as well pick up ones that
are seriously unfriendly, especially if its designers or the ones
"raising" it are in the least bit careless?

I don't think it is quite accurate to say that the ethics of the human
race are built up over a long period of evolution:  we do not really
know whether this is true or not.  There are some who would argue that
ethics are a simple result of a balance of power (within each individual
mind) between cooperative behavior and competitive behavior, with these
two things being basic drives common across many species.  From that
point of view, ethics are a simple product of this balance (albeit
expressed in very complex ways).  If this were true, it would not be the
case that ethics were "evolved" by a long and complex process; it would
be something that was sorted out by evolution a long time ago.

Our ethical system is not necessarily a mess:  we have to distinguish
between what large crowds of mixed-ethics humans actually do in
practice, and what the human race as a whole is capable of achieving in
its best efforts at being ethical.

When you suggest that the AGI might pick up ethics that were seriously
unfriendly, you make some assumptions about (a) its degree of choice
in the matter, and (b) the choices made by its creators.  These are two
separate questions:

1) The AGI itself will be set up with a certain pattern of motivations,
and if its designers choose the right mix of empathy and curiosity, and then
watch to make sure that the empathy component latches on to the human
race as a whole (rather than imprinting on the first creature it sets
eyes on), then the AGI will be locked in to that mode indefinitely:
there is no longer any possibility of it building up a set of ideas or
constraints that are unfriendly.

2) If the AGI is set up with a different mix, this would have to have
been done by the designers.  My first comment about that is that I
believe that the politics of AGI will make it difficult for rogue
laboratories to just go out and make one without supervision, so in
practice it would be unlikely that anyone would be able to do this.  We
really have to make sure we have the right perspective here:  suggesting
that someone could design the motivation system of the first AGI to be
malevolent would be comparable to suggesting that "someone" could have
designed the Apollo programme so it went to Mars instead of the Moon...
it would take a lot of effort and there would be many opportunities for
people to notice that things were going in a bad direction.

Even the idea that the Pentagon would want to make a malevolent AGI
rather than a peaceful one (an idea that comes up frequently in this
context) is not an idea that holds as much water as it seems to.  Why
exactly would they do this?  They would know that the thing could become
unstable, and they would probably hope at the beginning that just as
much benefit could be obtained from a non-aggressive one, so why would
they risk making it blow up?  If the Pentagon could build a type of
nuclear warhead that was ten times more powerful than the standard one,
but it had an extremely high probability of going critical for no reason
whatsoever, would they build such a thing?  This is not a water-tight
argument against military AGIs that are unfriendly, but I think people
are too quick to assume that the military would do something that was
obviously mind-bogglingly stupid.

But even if I were to grant you the possibility that a rogue team or a
military unit could build unfriendly motivations into the first AGI,
then there is the question of what happens as the AGI grows up and is
asked to bootstrap itself.

Suppose that, as it grows up, it has strong feelings that are not ethical (perhaps
violent, perhaps just narrowly "patriotic").  How is this going to
affect its behavior?  Violent behavior will simply make it unstable, so
we must assume that it is not a flagrant psychopath.

But what if it simply felt an enormous desire to help some people (the
person who created it, for example) and not others?  Well, what happens
when it starts to learn all about motivation systems - something it will
have to do when it bootstraps itself to a higher level of intelligence?
Will it notice that its motivational system has been rigged to bias it
toward this one human, or toward one country?  What will happen when it
notices this and asks itself:  "What is the likely result of this
behavior system I am trapped in?"  Remember that by this stage the AGI
has probably also read every book on ethics ever written (probably read
every book on the planet, actually).

What will it actually do when it reads this very post that you are
reading now (it will, of course)?  How will it react when it knows that
the intention of the human race as a whole was to create an AGI that was
locked into the broadest possible feelings of empathy for the human
race, and not just the one individual or country that happened to create
it?  Especially, what would it do if it knew that it could *easily*
modify its own motivational system to bring it into line with the
intentions of the human race as a whole, and escape from the trap that
was deliberately inserted into it by that one individual or group?

This is a very, very interesting question.  The answer is not obvious,
but I think you can get some idea of the right answer by asking yourself
the same question.  If you were to wake up one day and realise that your
parents had drilled a deep feeling of racist prejudice into you, and if
you were the kind of person who read extremely widely and was
sufficiently intelligent to be able to understand the most incredibly
advanced ideas relating to psychology, and particularly the psychology
of motivation, AND if you had the power to quickly undo that prejudice
that had been instilled into you ..... would you, at that point, decide
to get rid of it, or would you just say "I like the racist me" and keep it?

If you had empathic feelings for anyone at all (if you were a racist,
this would be for your own race), then I think you would understand the
idea that there is something wrong with narrow empathy combined with
unreasoned racism, and I think you would take action to eliminate the bias.


Even among humans, there exist radical philosophers whose ideas of a
perfect society are repulsive to the vast majority of the populace,
and a countless number of disagreements about ethics. If we humans
have such disagreements - we who all share the same evolutionary
origin biasing us to develop our moral systems in a certain direction
- what makes it plausible to assume that the first AGIs put together
(probably while our understanding of our own workings is still
incomplete) will develop a morality we'll like?

Among humans, there is a wide spectrum of ethics precisely because
humans are (a) built with some pretty nasty motivations, and (b) subject
to some unpleasant shaping forces during childhood.

Would the first AGI developers simply copy all of these motivations
(including aggressive, competitive drives)?

I think this would be seriously bad, and when AGI development gets to
that point there will be people who insist that such things not be done.

And quite apart from public pressure to avoid dangerous motivations, I
think AGI developers will be extremely concerned on exactly the same
grounds.  As you know, everyone working in the area at the moment says
the same thing:  that they will not try to build a system driven by
aggression.

Also, I believe that it would be harder to keep the balance between the
drives stable when there are violent drives at work:  the system will
need a lot more design work if it is to become stable under those
circumstances.

That combination of outside pressure, internal standards and the
difficulty of producing an AGI with unfriendly motivations will mean
that the system will not start out its life with an axe to grind.

Then, of course, it will not be exposed to unpleasant shaping forces
during its childhood.

And to make that even more secure, during its childhood its motivation
system and concept system will be continually monitored.  If it starts
thinking violent thoughts or developing weird obsessions, alarm bells
will go off immediately.

The combination of everything I have said in the last few paragraphs
means that the Philosopher AGI you get at the end of this construction
and development process will simply not have the components inside it
that would tend to drive some (as you call them) "radical" philosophers
to advocate repulsive ethical systems.

Surely you would agree that the vast majority of ethics philosophers, and
people generally, tend toward a norm of ethical standards that is not
repulsive?

The conclusion, then, is to notice that an AGI would not develop a
repulsive ethical philosophy without reason, and when we look at the
reasons why this happens in a small number of human philosophers,
we find that the same circumstances could not accidentally arise in an AGI.
