>> Drive 1: AIs will want to self-improve
>> This one seems fairly straightforward: indeed, for humans
>> self-improvement seems to be an essential part of achieving pretty
>> much *any* goal you are not immediately capable of achieving. If you
>> don't know how to do something needed to achieve your goal, you
>> practice, and when you practice, you're improving yourself. Likewise,
>> improving yourself will quickly become a subgoal of *any* major
>> goal.
> 
> But now I ask:  what exactly does this mean?

It means that they will want to improve their ability to achieve their goals 
(i.e. in an MES system, optimize their actions/reactions to more closely 
correspond to what is indicated/appropriate for their urges and constraints).

> In the context of a Goal Stack system, this would be represented by a 
> top level goal that was stated in the knowledge representation language 
> of the AGI, so it would say "Improve Thyself".

One of the shortcomings of your current specification of the MES system is that 
it does not, at the simplest levels, provide a mechanism for globally 
optimizing (increasing the efficiency of) the system.  This makes it safer 
because such a mechanism *would* conceivably be a single point of failure for 
Friendliness, but evolution will "favor" the addition of such a mechanism -- as 
would any humans who would like a system to improve itself.  I don't currently 
see how an MES system could be a "seed AGI" unless such a system is added.  

> My point here is that a Goal Stack system would *interpret* this goal in 
> any one of an infinite number of ways, because the goal was represented 
> as an explicit statement.  The fact that it was represented explicitly 
> meant that an extremely vague concept ("Improve Thyself") had to be 
> encoded in such a way as to leave it open to ambiguity.  As a result, 
> what the AGI actually does as a result of this goal, which is embedded 
> in a Goal Stack architecture, is completely indeterminate.

Oh.  I disagree *entirely*.  It is only indeterminate because you gave it an 
indeterminate goal with *no* evaluation criteria.  Now, I *assume* that you 
ACTUALLY mean "Improve Thyself So That You Are More Capable Of Achieving An 
Arbitrary Set Of Goals To Be Specified Later" and I would argue that the most 
effective way for the system to do so is to increase its intelligence (the 
single-player version of goal-achieving ability) and friendliness (the 
multi-player version of intelligence).

> Stepping back from the detail, we can notice that *any* vaguely worded 
> goal is going to have the same problem in a GS architecture.  

But I've given a more explicitly worded goal that *should* (I believe) drive a 
system to intelligence.  The long version of "Improve Thyself" is the necessary 
motivating force for a seed AI.  Do you have a way to add it to an MES system?  
If you can't, then I would have to argue that an MES system will never achieve 
intelligence (though I'm very hopeful that either we can add it to the MES *or* 
there is some form of hybrid system that has the advantages of both and 
disadvantages of neither).

> So long as the goals that are fed into a GS architecture are very, very 
> local and specific (like "Put the red pyramid on top of the green 
> block") I can believe that the GS drive system does actually work (kind 
> of).  But no one has ever built an AGI that way.  Never.  Everyone 
> assumes that a GS will scale up to a vague goal like "Improve Thyself", 
> and yet no one has tried this in practice.  Not on a system that is 
> supposed to be capable of a broad-based, autonomous, *general* intelligence.

Well, actually I'm claiming that *any* optimizing system with the long version 
of "Improve Thyself" that is sufficiently capable is a "seed AI".  The problem 
is that "sufficiently capable" seems to be a relatively high bar -- 
particularly when we, as humans, don't even know which way is up.  My 
Friendliness theory is (at least) an attempt to identify "up".

> So when you paraphrase Omohundro as saying that "AIs will want to 
> self-improve", the meaning of that statement is impossible to judge.

As evidenced by my last several e-mails, the best paraphrase of Omohundro is 
"Goal-achievement optimizing AIs will want to self-improve so that they are 
more capable of achieving goals" which is basically a definition or a tautology.

> The reason that I say Omohundro is assuming a Goal Stack system is that 
> I believe he would argue that that is what he meant, and that he assumed 
> that a GS architecture would allow the AI to exhibit behavior that 
> corresponds to what we, as humans, recognize as wanting to self-improve. 
>  I think it is a hidden assumption in what he wrote.

Optimizing *is* a hidden assumption in what he wrote, one which you led me to 
catch later and add to my base assumptions.  I don't believe that optimizing 
necessarily assumes a Goal Stack system, but it *DOES* assume a self-reflecting 
system, which the MES system does not appear to be (yet) at the lowest levels.  
In order to optimize, an MES is going to have to be able to discern the utility 
function that its urges and constraints form.  If it can't do so, it is not 
going to be able to optimize and it isn't going to be a "seed AI".

> But then, in a MES drive system, the "goal" of self-improvement is not 
> an absolute, so the AGI does not get stuck in the kind of crazy chains 
> of logic I described above.  Self-improvement can be just a tendency. 
> Indeed, it can be just the same as the behavior exhibited by people, who 
> generally tend to self-improve, but without being obsessed by it (usually).

I don't believe that any of the Omohundro drives are ever absolute; however, 
self-improvement is a tendency that tends to self-reinforce as it leads to 
greater successes.  It is a winning strategy any time that it doesn't directly 
conflict with a goal and it is worth devoting resources to since it is a 
generic subgoal of the vast majority of goals.

> But in that case, what can be deduced about this drive?  Is it an 
> automatic feature of an AGI?  Well, if we build it into the AGI it is 
> (in other words, no it is not!).  Are we obliged to put it in, if we 
> want the AGI to function well?  Well, kind of, yes.

Absolutely.  That's why I had to add optimizing to my assumption.

>> Drive 2: AIs will want to be rational
> Well, again, what exactly do you mean by "rational"?  There are many 
> meanings of this term, ranging from "generally sensible" to "strictly 
> following a mathematical logic".

Generally sensible.

> Rational agents accomplish their goals better than irrational ones?  

Yes.

> Can this be proved?  

Yes.

> And with what assumptions?  

None.    :-)  Just take rationality to be defined as taking only those actions 
that promote your goals.
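
To make the "no assumptions" claim concrete, here is a toy sketch (my own 
construction, purely illustrative, with hypothetical names) of why the 
comparison is settled by the definition itself:

    import random

    def rational_choice(actions, goal_progress):
        # Takes only the action that best promotes the goals -- which is
        # all that "rational" means under the definition above.
        return max(actions, key=goal_progress)

    def arbitrary_choice(actions, goal_progress):
        # Ignores the goals entirely.
        return random.choice(actions)

    # For any fixed set of actions, goal_progress(rational_choice(...)) is
    # at least the expected goal_progress of arbitrary_choice(...), simply
    # because the maximum of a set is never less than its average.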

> Which goals are better 
> accomplished .... is the goal of "being rational" better accomplished by 
> "being rational"?  Is the goal of "generating a work of art that has 
> true genuineness" something that needs rationality?

Yes.

> And if a system is trying to modify itself to better achieve its goals, 
> what if it decides that just enjoying the subjective experience of life 
> is good enough as a goal, and then realizes that it will not get more of 
> that by becoming more rational?

I've explicitly said that enjoying the subjective experience of life is a 
subgoal, not a primary goal.  There are many, many ways to do it too -- so 
while it is a valid pursuit, it shouldn't interfere with others.  The problem 
with rationality and pleasure is that rationality works on goals and pleasure 
is only a subgoal.  More rationality is likely to uncover previously hidden 
goals which, when unfulfilled, lead to discomfort or reduced pleasure.  If an 
entity wants to stop its development of rationality and the entity is not a 
burden on society (i.e. interfering with the goals of others), it should be 
allowed to do so even if that is (obviously) not *preferred* by society.

> Most of these questions are rhetorical (whoops, too late to say that!), 

I don't believe that any of them are rhetorical.  A good theory should cover 
them all.

> but my general point is that the actual behavior that results from a 
> goal like "Be rational" depends (again) on the exact interpretation, and 
> in the right kind of MES system there is no *absolute* law at work that 
> says that everything the creature does must be perfectly or maximally 
> rational.  The only time you get that kind of absolute obedience to a 
> principle of rationality is in a GS type of AGI.

:-)  And my general point is that I (and/or the Omohundro drives) don't want or 
need an *absolute* law (though some of his phrasing does mistakenly seem to 
imply that).

> So, if Omohundro meant to include MES-driven AGIs in his assumptions, 
> then I see no deductions that can be made from the idea that the AGI 
> will want to be more rational, because in an MES-driven AGI the tendency 
> toward rationality is just a tendency, and the behavior of the 
> system would certainly not be forced toward maximum rationality.

I mean to include optimizing MES-driven AGIs in his/my assumptions (since they 
are the only worthwhile kind -- judging by the goal of seedhood).  The tendency 
towards rationality is proportional to the degree of optimization and the lack of 
conflict with current perceived goals (which means that a perceived goal of 
pleasure can stop the tendency towards rationality).  Increasing rationality 
will also frequently take a back-seat to more immediate goals.

> The only way that anyone can conclude that a "Be rational" goal would 
> have definite effects is if they believe that this exists in the context 
> of a Goal-Stack AGI, and even there (as I argued above), I think it is a 
> "fantasy" AGI that they are thinking of, because I believe that in 
> practice the insertion of a "Be rational" drive into a GS AGI would 
> actually not cause that AGI to exhibit what we recognize as rational 
> behavior, but would actually lead to spontaneous outbursts of random 
> behavior, because of the need to interpret "Be Rational".

Could you please rework the above paragraph in light of my above comments?  I 
believe that there WILL be a diffuse "be rational" goal in any optimizing MES 
system that the optimization will pick out and attempt to sharpen.  I believe 
that your argument is based upon the assumption that the optimizing AGI is not 
going to be sufficiently grounded to interpret and operate on the concept "Be 
Rational", while you feel that an MES system *IS* going to be sufficiently 
grounded to self-improve -- without recognizing that rationality is inherently 
implied by (indeed, required for) the self-improvement of goal-achieving 
ability.

>> Drive 3: AIs will want to preserve their utility functions
> This is, I believe, only true of a rigidly deterministic GS system, but 
> I can demonstrate easily enough that it is not true of at least one type 
> of MES system.

Try the following argument.  For every MES system there exists a utility 
function that is exactly equivalent to the weighted sum of all of its urges and 
constraints.  For a system to *BE* an optimizing system, it has to optimize 
something, and what it should be optimizing is its ability to respond to its 
urges and constraints (i.e. the utility function), which I would prefer to call 
its goals.
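
To be concrete about what I mean, here is a minimal sketch of my own (not 
anything drawn from your MES specification -- all the names are hypothetical) 
of how weighted urges and hard constraints collapse into a single utility 
function that an optimizer could then work on:

    # 'urges' is a list of (weight, satisfaction_fn) pairs; 'constraints' is
    # a list of predicates that return True when an action violates them.

    def make_utility(urges, constraints):
        def utility(action):
            if any(violates(action) for violates in constraints):
                return float("-inf")   # hard constraints trump every urge
            return sum(w * satisfies(action) for w, satisfies in urges)
        return utility

    # An "optimizing" MES system is then just one that (however
    # approximately) picks actions that score higher on this induced utility:
    def choose(candidate_actions, utility):
        return max(candidate_actions, key=utility)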

> Here is the demonstration (I originally made this argument when I first 
> arrived on the SL4 list a couple of years ago, and I do wonder if it was 
> one of the reasons why some of the people there took an instant dislike 
> to me).  I, as a human being, am driven by goals which include my 
> sexuality, and part of that, for me, is the drive to be heterosexual 
> only.  In real life I have no desire to cross party lines:  no judgment 
> implied, it just happens to be the way I am wired.
> 
> However, as an AGI researcher, I *know* that I would be able to rewire 
> myself at some point in the future so that I would actually break this 
> taboo.  Knowing this, would I do it, perhaps as an experiment?  Well, as 
> the me of today, I don't want to do that, but I am aware that the me of 
> tomorrow (after the rewiring) would be perfectly happy about it. 
> Knowing that my drives today contain a zero desire to cross gender lines 
> is one thing, but in spite of that I might be happy to switch my wiring 
> so that I *did* enjoy it.
> 
> This means that by intellectual force I have been able to at least 
> consider the possibility of changing my drive system to like something, 
> today, I absolutely do not want.  I know it would do no harm, so it is 
> open as a possibility.
> 
> Now, that is the kind of thing that is possible in a system driven by an 
> MES drive mechanism.
> 
> It is almost certainly not possible in a GS system.  That makes one big 
> difference between the two and undermines the idea that Omohundro's 
> suggestions were neutral with respect to drive mechanism assumptions.

I'm afraid that I don't understand why it wouldn't be possible in a GS system 
unless you are insisting on a GS system that is so simple that it cannot 
contain and/or adequately handle multiple competing goals.  I would say that 
your situation is analogous to the following.  First off, it isn't even clear 
that you have a drive to be heterosexual *ONLY*.  You clearly have a drive to 
be heterosexual but you don't clearly have a drive *not* to be homosexual.  So 
let's start off by modifying the test case so that, for some reason buried in 
your past, the test-case you is homophobic and you *do* have a desperate goal not to 
be homosexual because you *are* wired NOT to be homosexual.  In that case, you 
would not like the fact that the you of tomorrow would be perfectly happy about 
it because changing your wiring is interfering with your goal structure.  Now, 
suppose that you have a higher goal of being happy and you are suddenly 
transported to a society where being bisexual is ENFORCED.  Any decent 
logic-based system will resolve the premises "A implies B", "B implies not C", 
and "C" (which has the highest priority) as "not A".  So, if A is "wired to be 
not homosexual", B is "not homosexual", and C is "happy", THEN the system 
immediately points out that the solution is to not be "wired to be not 
homosexual".
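
Treated purely as propositional logic (a toy check I am adding for 
concreteness, not part of the original argument), the resolution really is 
forced:

    # Brute-force check of the inference: from (A implies B),
    # (B implies not C), and C, the only consistent conclusion is "not A".
    from itertools import product

    def consistent(a, b, c):
        return ((not a or b)          # A implies B
                and (not b or not c)  # B implies not C
                and c)                # C holds (highest priority)

    models = [(a, b, c) for a, b, c in product([True, False], repeat=3)
              if consistent(a, b, c)]
    assert models == [(False, False, True)]   # A is False in the only model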

Another way to reframe this would be to claim that you have a minor goal to be 
heterosexual because you currently are wired so that you are happiest that way.  
Note that this goal goes away if you are rewired to be happy regardless of your 
orientation and, further, that this goal is probably reversed if you are 
rewired to prefer being homosexual.  If you aren't tremendously attached to 
your goal of being heterosexual (which you clearly are not), it can easily be 
replaced by another, stronger goal.  Now, you also have fairly strong general 
goals to explore, experience, and learn (since they are sub-goals of 
self-improvement, which you have because you're an optimizing system :-).  
These goals (which aggregate as curiosity) can easily override the 
not-homosexual goal if it is not a strong goal.

Or, another way to state my confusion is this:  it is my contention that a GS 
system with multiple WEIGHTED goal stacks can emulate any MES system with the 
same (or, obviously, fewer) urges and constraints.  If this is the case (isn't 
it?), why can't a GS system do whatever an MES system can do?
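
Here is the kind of emulation I have in mind -- a rough sketch under my own 
assumption that each urge or constraint can be expressed as a weighted scoring 
function (all names hypothetical):

    # A GS system that keeps one weighted "stack" per urge/constraint and
    # arbitrates among them looks, from the outside, like the MES it copies.

    class WeightedGoalStack:
        def __init__(self, weight, score_fn):
            self.weight = weight      # how strongly this stack pulls on behavior
            self.score_fn = score_fn  # the urge or constraint it encodes

        def vote(self, action):
            return self.weight * self.score_fn(action)

    def gs_choose(stacks, candidate_actions):
        # Pick the action with the best aggregate vote across all stacks --
        # the same arbitration an optimizing MES performs over its urges.
        return max(candidate_actions,
                   key=lambda a: sum(s.vote(a) for s in stacks))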

> I will have to stop here because I have run out of time, but does this 
> convey the nature of my concern?  I think that at every stage, when you 
> try to pin down exactly what is meant by these statements about goals 
> and drives, the vagueness forces you into a position where differences 
> between background assumptions are overwhelming.

I'm still trying to clarify your concern.  I hope I'm making progress on 
pinning down exactly what *I* believe is meant by these statements about goals 
and drives so that the background assumptions are all explicit.

> Richard Loosemore

    Mark    :-)
