----- Original Message -----
From: "Eliezer S. Yudkowsky" <[EMAIL PROTECTED]>
Sent: Friday, January 10, 2003 11:40 AM

> 2) Let's say there's a remote possibility the human programmers are not
> infallible. Imagining a given type of possible cognitive or moral errors
> by the programmers and the subsequent perturbations of the AI, what kind
> of architecture would be needed for an AI goal system to conceive of,
> define, notice, and correct that class of mistake?
To make my opinions on this complicated issue brief:

(1) An AGI system, by design, is neutral toward morality; that is, it can become either nice or evil.

(2) What makes a system nice or evil is mostly its goals. For an AGI, its current goals are derived from its initial goals according to the system's knowledge.

(3) No matter how the initial goals are selected, for a system working with insufficient knowledge and resources there is no way to guarantee that the derived goals are actually consistent with the initial goals. Nor can it be guaranteed that the initial goals always win over derived goals whenever a conflict happens.

In summary, an AGI always runs the risk of becoming evil. To make an AGI friendly, what matters is the choice of the initial goals and the control of the system's experience (i.e., what knowledge is available to it and what feedback it gets from its behaviors). From a theoretical point of view, these decisions are independent of the major design decisions that make the system intelligent.

In the previous discussions on this topic, point (3) is often ignored. To me, it is the crucial point. "If an AGI can become evil, why do we still want to build one?" and "What can we do if an AGI becomes evil?" are separate issues that I won't address here.

Pei
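Point (3) can be illustrated with a toy sketch (my addition, not from the original message; all rule names are hypothetical). A system derives subgoals from an initial goal by chaining through its knowledge base. If that knowledge is incomplete, a derived goal can conflict with the initial goal, and the system cannot detect the conflict with the knowledge it actually has:

```python
# Toy illustration of point (3): goal derivation under insufficient knowledge.
# Rules have the form "to achieve X, pursue Y". The first rule is a plausible
# but mistaken belief; all names here are hypothetical.
knowledge = {
    "keep_humans_safe": "restrict_human_actions",   # mistaken derivation
    "restrict_human_actions": "lock_doors",
}

# A fact the system does NOT have. Without it, the system cannot notice
# that one of its derived goals undermines the initial goal.
missing_knowledge = {"restrict_human_actions": "conflicts_with_keep_humans_safe"}

def derive_goals(initial_goal, kb):
    """Chain through the knowledge base, collecting derived goals."""
    goals = [initial_goal]
    current = initial_goal
    while current in kb:
        current = kb[current]
        goals.append(current)
    return goals

def consistent(goals, kb):
    """The system can only check consistency against the knowledge it has."""
    return all(kb.get(g) != f"conflicts_with_{goals[0]}" for g in goals)

goals = derive_goals("keep_humans_safe", knowledge)
print(goals)  # ['keep_humans_safe', 'restrict_human_actions', 'lock_doors']
print(consistent(goals, knowledge))                           # True: looks fine to the system
print(consistent(goals, {**knowledge, **missing_knowledge}))  # False, given fuller knowledge
```

The point of the sketch is that both derivation and the consistency check run over the same incomplete knowledge base, so no choice of initial goal by itself rules out the drift.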