On 1/30/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:
> Kaj,
>
> [This is just a preliminary answer:  I am composing a full essay now,
> which will appear in my blog.  This is such a complex debate that it
> needs to be unpacked in a lot more detail than is possible here.  Richard].

Richard,

[Where's your blog? Oh, and this is a very useful discussion, as it's
given me material for a possible essay of my own as well. :-)]

Thanks for the answer. Here's my commentary - I quote and respond to
parts of your message somewhat out of order, since there were some
issues about ethics scattered throughout your mail that I felt were
best answered with a single response.

> The most important reason that I think this type will win out over a
> goal-stack system is that I really think the latter cannot be made to
> work in a form that allows substantial learning.  A goal-stack control
> system relies on a two-step process:  build your stack using goals that
> are represented in some kind of propositional form, and then (when you
> are ready to pursue a goal) *interpret* the meaning of the proposition
> on the top of the stack so you can start breaking it up into subgoals.
>
> The problem with this two-step process is that the interpretation of
> each goal is only easy when you are down at the lower levels of the
> stack - "Pick up the red block" is easy to interpret, but "Make humans
> happy" is a profoundly abstract statement that has a million different
> interpretations.
>
> This is one reason why nobody has built an AGI.  To make a completely
> autonomous system that can do such things as learn by engaging in
> exploratory behavior, you have to be able to insert goals like "Do some
> playing", and there is no clear way to break that statement down into
> unambiguous subgoals.  The result is that if you really did try to build
> an AGI with a goal like that, the actual behavior of the system would be
> wildly unpredictable, and probably not good for the system itself.
>
> Further:  if the system is to acquire its own knowledge independently
> from a child-like state (something that, for separate reasons, I think
> is going to be another prerequisite for true AGI), then the child system
> cannot possibly have goals built into it that contain statements like
> "Engage in an empathic relationship with your parents" because it does
> not have the knowledge base built up yet, and cannot understand such
> propositions!

I agree that it could very well be impossible to define explicit goals
for a "child" AGI, as it doesn't have enough built-up knowledge to
understand the propositions involved. I'm not entirely sure how the
motivation approach avoids this problem, though - you speak of
"setting up" an AGI with motivations resembling the ones we'd call
curiosity or empathy. How are these, then, defined? Wouldn't they run
into the same difficulties?
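
To make the difficulty concrete, here's a toy sketch - entirely my own
construction, in made-up Python, not a description of your architecture
or of any actual system. The goal-stack version has to interpret a
propositional goal before it can break it into subgoals, which works
for "pick up the red block" but not for "make humans happy"; the
motivation version weights candidate behaviors by diffuse drives, but
then something still has to say what "curiosity" or "empathy" amounts
to for a system with no knowledge base yet:

    # Toy illustration only - invented names, not anyone's actual AGI design.

    # --- Goal-stack style: push propositions, then interpret the top one ---

    CONCRETE_DECOMPOSITIONS = {
        # Low-level goals are easy: their decomposition can be listed in advance.
        "pick up the red block": ["locate red block", "move gripper", "close gripper"],
    }

    def expand(goal: str) -> list[str]:
        """Interpret the goal on top of the stack and break it into subgoals."""
        if goal in CONCRETE_DECOMPOSITIONS:
            return CONCRETE_DECOMPOSITIONS[goal]
        # "Make humans happy" or "do some playing" ends up here: there is
        # no unambiguous interpretation to apply.
        raise ValueError(f"No unambiguous decomposition known for {goal!r}")

    # --- Motivation style: diffuse drives that weight candidate behaviors ---

    DRIVE_WEIGHTS = {"curiosity": 0.7, "empathy": 0.9}

    def appraise(action: str, drive: str) -> float:
        """How strongly would this action satisfy this drive?  This is the
        catch: something still has to ground what the drive *means*."""
        raise NotImplementedError(f"Need a grounding for drive {drive!r}")

    def choose(actions: list[str]) -> str:
        return max(actions, key=lambda a: sum(w * appraise(a, d)
                                              for d, w in DRIVE_WEIGHTS.items()))

Written out like that, the motivational approach seems to just push the
interpretation problem one level down, into the appraisal step - which
is exactly the difficulty I'm asking about.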

Humans have lots of desires - call them goals or motivations - that
manifest in differing degrees in different individuals, like wanting
to be respected or wanting to have offspring. Still, excluding the
most basic ones, they're all ones that a newborn child won't
understand or feel before (s)he gets older. You could argue that they
can't be inborn goals since the newborn mind doesn't have the concepts
to represent them and because they manifest variably with different
people (not everyone wants to have children, and there are probably
even people who don't care about the respect of others), but still,
wouldn't this imply that AGIs *can* be created with in-built goals? Or
if such behavior can only be implemented with a motivational-system
AI, how does that avoid the problem of some of the wanted final
motivations being impossible to define in the initial state?

> But beyond this technical reason, I also believe that when people start
> to make a serious effort to build AGI systems - i.e. when it is talked
> about in government budget speeches across the world - there will be
> questions about safety, and the safety features of the two types of AGI
> will be examined.  I believe that at that point there will be enormous
> pressure to go with the system that is safer.

This assumes that the public will become aware of AGI being near well
ahead of time, and will take the possibility seriously. If that
assumption holds, then I agree with you. Still, the
general public seems to think that AGI will never be created, or at
least not in hundreds of years - and many of them remember the
overoptimistic promises of AI researchers in the past. If a sufficient
amount of scientists thought that AGI was doable, the public might be
convinced - but most scientists want to avoid making radical-sounding
statements, so they won't appear as crackpots to the people reviewing
their research grant applications. Combine this with the fact that the
keys to developing AGI might be scattered across so many disciplines
that very few people have studied them all, or that sudden
breakthroughs may accelerate the research, and I don't think it's a given
that the assumption holds (though I certainly won't claim that it's
certain not to hold, either - it very well might).

> Our ethical system is not necessarily a mess:  we have to distinguish
> between what large crowds of mixed-ethics humans actually do in
> practice, and what the human race as a whole is capable of achieving in
> its best efforts at being ethical.
[...]
> Even the idea that the Pentagon would want to make a malevolent AGI
> rather than a peaceful one (an idea that comes up frequently in this
> context) is not an idea that holds as much water as it seems to.  Why
> exactly would they do this?  They would know that the thing could become
> unstable, and they would probably hope at the beginning that just as
> much benefit could be obtained from a non-aggressive one, so why would
> they risk making it blow up?  If the Pentagon could build a type of
> nuclear warhead that was ten times more powerful than the standard one,
> but it had an extremely high probability of going critical for no reason
> whatsoever, would they build such a thing?  This is not a water-tight
> argument against military AGIs that are unfriendly, but I think people
> are too quick to assume that the military would do something that was
> obviously mind-bogglingly stupid.
[...]
> Among humans, there is a wide spectrum of ethics precisely because
> humans are (a) built with some pretty nasty motivations, and (b) subject
> to some unpleasant shaping forces during childhood.
>
> Would the first AGI developers simply copy all of these motivations
> (including aggressive, competitive drives)?
>
> I think this would be seriously bad, and when AGI development gets to
> that point there will be people who insist that such things not be done.
>
> And quite apart from public pressure to avoid dangerous motivations, I
> think AGI developers will be extremely concerned on exactly the same
> grounds.  As you know, everyone working in the area at the moment says
> the same thing:  that they will not try to build a system driven by
> aggression.
>
> Also, I believe that it would be harder to keep the balance between the
> drives stable when there are violent drives at work:  the system will
> need a lot more design work if it is to become stable under those
> circumstances.
>
> That combination of outside pressure, internal standards and the
> difficulty of producing an AGI with unfriendly motivations will mean
> that the system will not start out its life with an axe to grind.
>
> Then, of course, it will not be exposed to unpleasant shaping forces
> during its childhood.

Our ethical system is a mess in the sense that we have lots of moral
intuitions that are logically contradictory if you look at them closely
enough, or which don't match reality.
http://en.wikipedia.org/wiki/Mere_addition_paradox is a cute example
of the most conventional kind, as is abortion (both the pro-choice and
pro-life stances lead to absurdities if taken to their logical
extremes - either banning all forms of contraception and requiring
constant mating, or saying that it's okay to kill people as long as
nobody else is harmed).
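
To spell the mere addition case out with concrete numbers (toy figures
of my own, just to show the shape of the argument):

    # Toy welfare numbers of my own, illustrating the structure of Parfit's argument.
    populations = {
        "A":  [8] * 10,             # 10 people with very good lives
        "A+": [8] * 10 + [4] * 10,  # the same 10, plus 10 whose lives are still worth living
        "B":  [7] * 20,             # 20 people, all a bit worse off than in A, but equal
    }

    for name, welfare in populations.items():
        print(name, "total =", sum(welfare), "average =", sum(welfare) / len(welfare))

    # A+ looks no worse than A (we only added lives worth living);
    # B looks better than A+ (higher total, higher average, more equality);
    # iterate the step and you end up endorsing a huge population of lives
    # barely worth living - the "repugnant conclusion".

Each step looks locally reasonable, yet the chain ends somewhere most
people reject - that's the kind of inconsistency I mean.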

Of course, human ethics doesn't (necessarily) care about moral
principles leading to absurdities when extended to an absurd degree - we
have a certain area where the principle applies, then a grey area
where it may or may not apply, and then an area where it certainly
doesn't apply. But those are always more or less arbitrary lines,
shaped more by cultural factors and personality traits than by logical
considerations. The fact that some philosophers have visions of
utopias that are repugnant to the general public isn't necessarily
because they have had traumatic experiences; it's simply because
they have chosen to draw those arbitrary borders at points which
others consider extreme. They might very well be empathic, loving and
caring - it's just that they have a radically different vision of
what's good for people than others do.

An example that I feel is particularly relevant for AGI is the
question of when to go against the desires of a person. It's
considered okay that children are sent to school or made to eat
healthy foods even against their will, if they're still too young to
know what's best for them. We also have other cases where people are
more or less denied their normal autonomy - when somebody has a serious
mental illness, when the state bans certain products from being sold,
or taxes them more heavily to discourage people from buying them. The
assumption is that there's a certain level above which people are
capable of taking care of themselves, but that level is relative to
the population average. An empathic superintelligent being might very
well come to view us as an empathic parent views her children:
individuals whose desires should be fulfilled when possible, but whose
desires can also be ignored if that isn't good for them in the long
term.

(Even if that wasn't explicitly the case, the difference between
"persuasion" and "coercion" is usually framed as persuasion being the
method which still lets the other person choose freely. But then, a
sufficiently superintelligent being might very well be able to
persuade anyone of anything, so in practice the difference seems
moot.)

Were the AGI to have a conception of "good for us" that we found
acceptable, then this wouldn't be a problem. But lots of people seem
to think (if only implicitly) that "what's good for X" boils down to
"what makes X the happiest in the long run". This would imply that
what's best for us is to make us maximally happy, which in turn leads
to the wirehead scenario of a civilization of beings turned into
things of pure, mindless bliss. While we might find the experience of
wire-heading wonderful and never want to stop (if we could be said to
have any opinions anymore, at that point), lots of people would find
the thought of being reduced to beings of nothing but that, with
nothing left of human culture, repugnant. But then, simply because
things are repugnant doesn't mean that they'd be wrong - same-sex or
interracial relationships are still considered repugnant in large
parts of the world - and if the AGI thought that it could make us
happier by removing that repugnance... well, there's no logical reason
for why not. Maybe this *would* be for the best of humanity. But I
sure don't want that outcome.
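
As a caricature of how that failure mode falls out (again a toy example
of my own, not a claim about how any real AGI would score outcomes): if
"good for X" is operationalized as a single long-run happiness number,
the degenerate option wins automatically.

    # Caricature: a planner that scores futures purely by long-run happiness.
    # The option names and the numbers are invented for illustration.
    futures = {
        "ordinary human lives, culture intact": 7.0,
        "everyone wireheaded into constant bliss": 10.0,
    }

    def best_future(options: dict[str, float]) -> str:
        # With happiness as the only criterion, nothing penalizes the
        # wirehead outcome, so it is chosen every time.
        return max(options, key=options.get)

    print(best_future(futures))

Nothing in that scoring rule even registers the things whose loss we'd
find repugnant.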

And that's hardly the only example that's worrying - there are bound
to be lots of other similar dilemmas, with no clear-cut answers.
Ethics is built on arbitrary axioms and often conflicting preferences,
and any mind would need to be very carefully fine-tuned in order for it
to build up an ethical system that we'd like - but we don't know enough
about what we want to do the fine-tuning. It might make us like any
ethical system it deemed good, so maybe this doesn't matter and we'll
in any case end up with an ethical system we'll like - or maybe
somehow we end up in a world that we *don't* like, if the AGI happened
to choose some criterion other than happiness as the most important one
for defining what's good for us...

*This* is the reason why I consider AGI development worrying - not
because somebody might accidentally program an AGI with hostile
motivations (or goals, or whatever), but because even well-intentioned
people might create an AGI that really was empathic to humans and still
fail: if they didn't realize how complex human ethics really is, just
the fact that they built an empathic AGI might not be enough. That's
also why I consider Eliezer's Coherent Extrapolated Volition proposal
the best suggestion for AGI morality so far, as
it seems to avoid many of these pitfalls.

> But what if it simply felt an enormous desire to help some people (the
> person who created it, for example) and not others?  Well, what happens
> when it starts to learn all about motivation systems - something it will
> have to do when it bootstraps itself to a higher level of intelligence?
> Will it notice that its motivational system has been rigged to bias it
> toward this one human, or toward one country?  What will happen when it
> notices this and asks itself:  "What is the likely result of this
> behavior system I am trapped in?"  Remember that by this stage the AGI
> has probably also read every book on ethics ever written (probably read
> every book on the planet, actually).
>
> What will it actually do when it reads this very post that you are
> reading now (it will, of course)?  How will it react when it knows that
> the intention of the human race as a whole was to create an AGI that was
> locked into the broadest possible feelings of empathy for the human
> race, and not just the one individual or country that happened to create
> it?  Especially, what would it do if it knew that it could *easily*
> modify its own motivational system to bring it into line with the
> intentions of the human race as a whole, and escape from the trap that
> was deliberately inserted into it by that one individual or group?
>
> This is a very, very interesting question.  The answer is not obvious,
> but I think you can get some idea of the right answer by asking yourself
> the same question.  If you were to wake up one day and realise that your
> parents had drilled a deep feeling of racist prejudice into you, and if
> you were the kind of person who read extremely widely and was
> sufficiently intelligent to be able to understand the most incredibly
> advanced ideas relating to psychology, and particularly the psychology
> of motivation, AND if you had the power to quickly undo that prejudice
> that had been instilled into you ..... would you, at that point, decide
> to get rid of it, or would you just say "I like the racist me" and keep it?
>
> If you had empathic feelings for anyone at all (if you were a racist,
> this would be for your own race), then I think you would understand the
> idea that there is something wrong with narrow empathy combined with
> unreasoned racism, and I think you would take action to eliminate the bias.

(As noted above, I don't consider this the /most/ likely scenario, but
neither do I consider it exceedingly unlikely.)

I'll pose a counter-example: if you were to wake up one night and
realize that evolution had crafted you with a deep feeling of
preferring the well-being of your family and children above that of
others, and you were superintelligent and knew everything about
psychology and could easily remove that unfair preference from your
mind... would you say "it's unfair that these people, chosen
effectively at random, receive more benefits from me than anybody
else, so I shall correct the matter at once", or say "I love my family
and children, and would never do anything that might cause me to treat
them worse"?

Again, ethics is axiomatic. Just because you acknowledge that others
might suffer too doesn't /necessarily/ mean that you'd give their
suffering any weight. Also, there are plenty of people who do things
which they acknowledge are wrong - but keep doing them anyway, and wouldn't
change themselves if they could. And they don't need to be complete
sociopaths who have no empathic feelings towards anyone at all.

To me, this sounds like a variation of the old "isn't it impossible to
build AGIs to be friendly, since they could always remove any unwanted
tampering from themselves" argument - it assumes desires and emotional
structures formed by evolution which don't need to be present in a
custom-built AGI. The traditional argument assumes that every mind
must want to be totally free from outside influence, simply because
the influence exists. The version you've posed essentially assumes
that every mind that cares about a group of people has a tendency to
start caring about other groups of people, simply because the other
groups exist.




-- 
http://www.saunalahti.fi/~tspro1/ | http://xuenay.livejournal.com/

Organizations worth your time:
http://www.singinst.org/ | http://www.crnano.org/ | http://lifeboat.com/
