----- Original Message -----
From: "Eliezer S. Yudkowsky" <[EMAIL PROTECTED]>
Sent: Friday, January 10, 2003 11:40 AM

> 2) Let's say there's a remote possibility the human programmers are not
> infallible. Imagining a given type of possible cognitive or moral errors
> by the programmers and the subsequent perturbations of the AI, what kind
> of architecture would be needed for an AI goal system to conceive of,
> define, notice, and correct that class of mistake?
To make my opinions on this complicated issue brief:

(1) An AGI system, by design, is neutral toward morality; that is, it can become either nice or evil.

(2) What makes a system nice or evil is mostly its goals. For an AGI, its current goals are derived from its initial goals according to the system's knowledge.

(3) No matter how the initial goals are selected, for a system working with insufficient knowledge and resources there is no way to guarantee that the derived goals are actually consistent with the initial goals. Nor can it be guaranteed that the initial goals always win over derived goals whenever a conflict happens.

In summary, an AGI always runs the risk of becoming evil. To make an AGI friendly, what matters is the choice of the initial goals and the control of the system's experience (i.e., what knowledge is available to it and what feedback it gets from its behaviors). From a theoretical point of view, these decisions are independent of the major design decisions that make the system intelligent.

In the previous discussions on this topic, point (3) is often ignored. To me, it is the crucial point. "If an AGI can become evil, why do we still want to build one?" and "What can we do if an AGI becomes evil?" are separate issues that I won't address here.

Pei
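Point (3) can be illustrated with a toy sketch (my addition, not from the original message; all rule names are hypothetical). A system derives subgoals from an initial goal by chaining through its knowledge base. If that knowledge is incomplete, a derived goal can conflict with the initial goal, and the system cannot detect the conflict with the knowledge it actually has:

```python
# Toy illustration of point (3): goal derivation under insufficient knowledge.
# Rules have the form "to achieve X, pursue Y". The first rule is a plausible
# but mistaken belief; all names here are hypothetical.
knowledge = {
    "keep_humans_safe": "restrict_human_actions",   # mistaken derivation
    "restrict_human_actions": "lock_doors",
}

# A fact the system does NOT have. Without it, the system cannot notice
# that one of its derived goals undermines the initial goal.
missing_knowledge = {"restrict_human_actions": "conflicts_with_keep_humans_safe"}

def derive_goals(initial_goal, kb):
    """Chain through the knowledge base, collecting derived goals."""
    goals = [initial_goal]
    current = initial_goal
    while current in kb:
        current = kb[current]
        goals.append(current)
    return goals

def consistent(goals, kb):
    """The system can only check consistency against the knowledge it has."""
    return all(kb.get(g) != f"conflicts_with_{goals[0]}" for g in goals)

goals = derive_goals("keep_humans_safe", knowledge)
print(goals)  # ['keep_humans_safe', 'restrict_human_actions', 'lock_doors']
print(consistent(goals, knowledge))                           # True: looks fine to the system
print(consistent(goals, {**knowledge, **missing_knowledge}))  # False, given fuller knowledge
```

The point of the sketch is that both derivation and the consistency check run over the same incomplete knowledge base, so no choice of initial goal by itself rules out the drift.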