Richard, again, I must sincerely apologize for responding to this so horrendously late. It's a dreadfully bad habit of mine: I get an e-mail (or blog comment, or forum message, or whatever) that requires some thought before I respond, so I don't answer it right away... and then something related to my studies or hobbies shows up and doesn't leave me with enough energy to compose responses to anybody at all, after which enough time has passed that the message has vanished from my active memory, and when I remember it, so much time has passed already that a day or two more before I answer won't make any difference... and then *so* much time has passed that replying to the message so late feels more embarrassing than just quietly forgetting about it.
I'll try to better my ways in the future. By the same token, I must say I can only admire your ability to compose long, well-written replies to messages in what seem to be blinks of an eye to me. :-)

On 3/11/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:
> Kaj Sotala wrote:
> > On 3/3/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:
> > > Kaj Sotala wrote:
> > > > Alright. But previously, you said that Omohundro's paper, which to me
> > > > seemed to be a general analysis of the behavior of *any* minds with
> > > > (more or less) explicit goals, looked like it was based on a
> > > > 'goal-stack' motivation system. (I believe this has also been the
> > > > basis of your critique of e.g. some SIAI articles about
> > > > friendliness.) If built-in goals *can* be constructed into
> > > > motivational system AGIs, then why do you seem to assume that AGIs
> > > > with built-in goals are goal-stack ones?
> > >
> > > I seem to have caused lots of confusion earlier on in the discussion, so
> > > let me backtrack and try to summarize the structure of my argument.
> > >
> > > 1) Conventional AI does not have a concept of a "Motivational-Emotional
> > > System" (MES), the way that I use that term, so when I criticised
> > > Omohundro's paper for referring only to a "Goal Stack" control system, I
> > > was really saying no more than that he was assuming that the AI was
> > > driven by the system that all conventional AIs are supposed to have.
> > > These two ways of controlling an AI are two radically different designs.
> >
> > [...]
> >
> > > So now: does that clarify the specific question you asked above?
> >
> > Yes and no. :-) My main question is with part 1 of your argument - you
> > are saying that Omohundro's paper assumed the AI to have a certain
> > sort of control system. This is the part which confuses me, since I
> > didn't see the paper make *any* mention of how the AI should be
> > built. It only assumes that the AI has some sort of goals, and nothing
> > more. [...]
> >
> > Drive 1: AIs will want to self-improve
> >
> > This one seems fairly straightforward: indeed, for humans
> > self-improvement seems to be an essential part of achieving pretty
> > much *any* goal you are not immediately capable of achieving. If you
> > don't know how to do something needed to achieve your goal, you
> > practice, and when you practice, you're improving yourself. Likewise,
> > improving yourself will quickly become a subgoal of *any* major goal.
>
> But now I ask: what exactly does this mean?
>
> In the context of a Goal Stack system, this would be represented by a top
> level goal that was stated in the knowledge representation language of the
> AGI, so it would say "Improve Thyself". [...]
>
> The reason that I say Omohundro is assuming a Goal Stack system is that I
> believe he would argue that that is what he meant, and that he assumed that
> a GS architecture would allow the AI to exhibit behavior that corresponds to
> what we, as humans, recognize as wanting to self-improve. I think it is a
> hidden assumption in what he wrote.

At least I didn't read the paper that way - after all, the abstract says that it's supposed to apply equally to all AGI systems, regardless of the exact design: "We identify a number of 'drives' that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted."

(You could, of course, suppose that the author was assuming that an AGI could *only* be built around a Goal Stack system, and that therefore "any design" would mean "any GS design"... but that seems a bit far-fetched.)
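(As an aside, for list members who haven't run into the "Goal Stack" idiom you keep contrasting with MES systems: below is a minimal toy sketch of the kind of control loop I understand you to mean - an explicit top-level goal such as "improve thyself", stated in the system's own representation language, with subgoals pushed on top of it. All of the names and the decomposition table are invented for illustration; none of this is taken from your work or from Omohundro's paper.)

# Toy sketch of a Goal Stack control loop. All names and the decomposition
# table are hypothetical; the only point is that the top-level goal is an
# explicit item in the system's representation that everything else serves.

goal_stack = ["improve thyself"]          # explicit top-level goal

def decompose(goal):
    """Stand-in planner: map a goal to the subgoals believed to achieve it."""
    known_decompositions = {
        "improve thyself": ["acquire more knowledge", "refine own algorithms"],
    }
    return known_decompositions.get(goal, [])

while goal_stack:
    current = goal_stack.pop()            # work on the most recently pushed goal
    subgoals = decompose(current)
    if subgoals:
        goal_stack.extend(reversed(subgoals))  # push subgoals; first one is handled next
    else:
        print("executing primitive action for goal:", current)

(The contrast with an MES system, as I understand your description of it, is that there is no single stack like this that all behavior must bottom out in - only tendencies that bias what the system does.)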
> > Drive 2: AIs will want to be rational
> >
> > This is basically just a special case of drive #1: rational agents
> > accomplish their goals better than irrational ones, and attempts at
> > self-improvement can be outright harmful if you're irrational in the
> > way that you try to improve yourself. If you're trying to modify
> > yourself to better achieve your goals, then you need to make clear to
> > yourself what your goals are. The most effective method for this is to
> > model your goals as a utility function and then modify yourself to
> > better carry out the goals thus specified.
>
> Well, again, what exactly do you mean by "rational"? There are many
> meanings of this term, ranging from "generally sensible" to "strictly
> following a mathematical logic".
>
> Rational agents accomplish their goals better than irrational ones? Can
> this be proved? And with what assumptions? Which goals are better
> accomplished... is the goal of "being rational" better accomplished by
> "being rational"? Is the goal of "generating a work of art that has true
> genuineness" something that needs rationality?
>
> And if a system is trying to modify itself to better achieve its goals,
> what if it decides that just enjoying the subjective experience of life is
> good enough as a goal, and then realizes that it will not get more of that
> by becoming more rational?
>
> Most of these questions are rhetorical (whoops, too late to say that!), but
> my general point is that the actual behavior that results from a goal like
> "Be rational" depends (again) on the exact interpretation, and in the right
> kind of MES system there is no *absolute* law at work that says that
> everything the creature does must be perfectly or maximally rational. The
> only time you get that kind of absolute obedience to a principle of
> rationality is in a GS type of AGI.

Despite the fact that they were rhetorical questions, I do feel like pointing out that Omohundro actually defined "rational" in his paper. :-)

"So we'll assume that these systems will try to self-improve. What kinds of changes will they make to themselves? Because they are goal directed, they will try to change themselves to better meet their goals in the future. [...] From its current perspective, it would be a disaster if a future version of itself made self modifications that worked against its current goals. So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. [...] One way to evaluate an uncertain outcome is to give it a weight equal to its expected utility (the average of the utility of each possible outcome weighted by its probability). The remarkable 'expected utility' theorem of microeconomics says that it is always possible for a system to represent its preferences by the expectation of a utility function unless the system has 'vulnerabilities' which cause it to lose resources without benefit [1]. Economists describe systems that act to maximize their expected utilities as 'rational economic agents'."
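(The expected-utility rule in that passage is simple enough to spell out with a toy calculation - made-up actions, outcomes and numbers, nothing from the paper itself: weight each possible outcome's utility by its probability, sum, and pick the action with the highest total.)

# Toy illustration of the expected-utility rule quoted above. The candidate
# actions, their outcomes and the numbers are all invented.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one candidate action."""
    return sum(p * u for p, u in outcomes)

candidate_actions = {
    "self-modify carefully":  [(0.9, 10.0), (0.1, -5.0)],   # E[U] = 8.5
    "self-modify recklessly": [(0.5, 15.0), (0.5, -20.0)],  # E[U] = -2.5
}

for name, outcomes in candidate_actions.items():
    print(name, "-> expected utility:", expected_utility(outcomes))

best = max(candidate_actions, key=lambda a: expected_utility(candidate_actions[a]))
print("a 'rational economic agent' in this sense picks:", best)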
> So, if Omohundro meant to include MES-driven AGIs in his assumptions, then
> I see no deductions that can be made from the idea that the AGI will want to
> be more rational, because in an MES-driven AGI the tendency toward
> rationality is just a tendency, and the behavior of the system would
> certainly not be forced toward maximum rationality.

Yes, you're definitely right in that some of the drives that Omohundro speaks about will be less likely to manifest themselves in MES-driven AGIs with certain architectures. But I don't think that's an objection to the paper per se: just because some of the tendencies are weaker in some systems doesn't mean they won't appear at all. (Silly analogue: birds and hot-air balloons are less affected by gravity than your average T-Rex, but that doesn't mean they're immune.) I find the paper valuable and insightful simply because it presents tendencies that manifest themselves in all useful AI systems *at all* - certainly they're weaker in some systems, but analyzing which drives are the strongest in which sorts of architectures would be a separate paper (or more likely, several). It's also useful to be aware of them when designing AGI architectures - one might want to design one's system in such a way as to minimize or maximize the impact of specific drives. (People have been talking about Friendliness theory for a long time, but I'd say this is one of the first papers actually contributing something practically useful to that field...)

> > Drive 3: AIs will want to preserve their utility functions
> >
> > Since the utility function constructed was a model of the AI's goals,
> > this drive is equivalent to saying "AIs will want to preserve their
> > goals" (or at least the goals that are judged as the most important
> > ones). The reasoning for this should be obvious - if a goal is removed
> > from the AI's motivational system, the AI won't work to achieve the
> > goal anymore, which is bad from the point of view of an AI that
> > currently does want the goal to be achieved.
>
> This is, I believe, only true of a rigidly deterministic GS system, but I
> can demonstrate easily enough that it is not true of at least one type of
> MES system.
>
> Here is the demonstration (I originally made this argument when I first
> arrived on the SL4 list a couple of years ago, and I do wonder if it was one
> of the reasons why some of the people there took an instant dislike to me).
> I, as a human being, am driven by goals which include my sexuality, and
> part of that, for me, is the drive to be heterosexual only. In real life I
> have no desire to cross party lines: no judgement implied, it just happens
> to be the way I am wired.
>
> However, as an AGI researcher, I *know* that I would be able to rewire
> myself at some point in the future so that I would actually break this
> taboo. Knowing this, would I do it, perhaps as an experiment? Well, as the
> me of today, I don't want to do that, but I am aware that the me of tomorrow
> (after the rewiring) would be perfectly happy about it. Knowing that my
> drives today contain a zero desire to cross gender lines is one thing, but
> in spite of that I might be happy to switch my wiring so that I *did* enjoy
> it.
>
> This means that by intellectual force I have been able to at least consider
> the possibility of changing my drive system to like something that, today, I
> absolutely do not want. I know it would do no harm, so it is open as a
> possibility.
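(Going back to the Drive 3 reasoning for a moment, before I answer your demonstration: the step I was summarizing - that candidate self-modifications get scored by the *current* utility function, so a modification that swaps that function out tends to lose - can be spelled out in a toy way. Everything below is invented for illustration and is not taken from Omohundro's paper.)

# Toy illustration of the Drive 3 argument. A modification that would replace
# the agent's goals is evaluated with the goals the agent holds *now*, and so
# tends to be rejected. All names and numbers are made up.

def current_utility(predicted_world):
    # The agent currently cares only about how much of goal A gets achieved.
    return predicted_world["goal_A_achieved"]

candidate_modifications = {
    # modification -> the world the agent predicts if its future self runs it
    "keep current goals":    {"goal_A_achieved": 10, "goal_B_achieved": 0},
    "replace goal A with B": {"goal_A_achieved": 1,  "goal_B_achieved": 12},
}

for name, predicted in candidate_modifications.items():
    print(name, "-> utility by current lights:", current_utility(predicted))

chosen = max(candidate_modifications,
             key=lambda m: current_utility(candidate_modifications[m]))
print("modification adopted:", chosen)   # -> "keep current goals"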
As for your demonstration: good example - part of why it took me so long to answer this e-mail was that I was trying to come up with a counter-example, or an alternative explanation. The closest I got was to suggest that, in GS terms, you only have "be only heterosexual" as a subgoal of some higher principle, not as a goal in itself. Now, obviously the super/subgoal terminology is based on GS architectures, and as such you might be right that this drive only applies to GS systems... but on the other hand, an MES-based AI would also have things that it considers more important than others, so I'm not sure whether analogous reasoning might not apply to it as well. But I still don't find that answer of mine fully satisfying.

--
http://www.saunalahti.fi/~tspro1/ | http://xuenay.livejournal.com/
Organizations worth your time: http://www.singinst.org/ | http://www.crnano.org/ | http://lifeboat.com/