--- rg <[EMAIL PROTECTED]> wrote:
Matt: Why will an AGI be friendly?

The question only makes sense if you can define friendliness, which we can't.

Why Matt, thank you for such a wonderful opening . . . .  :-)

Friendliness *CAN* be defined. Furthermore, it is my contention that Friendliness can be implemented reasonably easily ASSUMING an AGI platform (i.e. it is just as easy to implement a Friendly AGI as it is to implement an Unfriendly AGI).

I have a formal paper that I'm just finishing that presents my definition of Friendliness and attempts to prove the above contention (and several others), but I would like to do a preliminary acid test by presenting the core ideas via several e-mails that I'll be posting over the next few days (i.e. y'all are my lucky initial guinea-pig audience :-). Assuming that the ideas survive the acid test, I'll post the (probably heavily revised :-) formal paper a couple of days later.

= = = = = = = = = =
PART 1.

The obvious starting point is to explicitly recognize that the point of Friendliness is that we wish to prevent the extinction of the *human race* and/or to prevent many other horrible, nasty things that would make *us* unhappy. After all, this is why we believe Friendliness is so important. Unfortunately, the problem with this starting point is that it biases the search for Friendliness in a direction towards a specific type of Unfriendliness. In particular, in a later e-mail, I will show that several prominent features of Eliezer Yudkowsky's vision of Friendliness are actually distinctly Unfriendly and will directly lead to a system/situation that is less safe for humans.

One of the critically important advantages of my proposed definition/vision of Friendliness is that it is an attractor in state space. If a system finds itself outside of (but still reasonably close to) an optimally Friendly state, it will actually DESIRE to reach or return to that state (and yes, I *know* that I'm going to have to prove that contention). While Eli's vision of Friendliness is certainly stable (i.e. the system won't intentionally become unfriendly), there is no "force" or desire helping it return to Friendliness if it deviates somehow due to an error or outside influence. I believe that this is a *serious* shortcoming in his vision of the extrapolation of the collective volition (and yes, this does mean both that I believe Friendliness is CEV and that I, personally, (and shortly, we collectively) can define a stable path to an attractor CEV that is provably sufficient, arguably optimal, and which should hold up under all future evolution).
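
To make the attractor-vs.-merely-stable distinction concrete, here's a quick toy sketch in Python (purely illustrative; nothing in it is a real Friendliness implementation, and x = 0 just stands in for the optimally Friendly state, with the perturbation standing in for an error or outside influence):

# Toy illustration only: an "attractor" Friendly state vs. a merely
# "stable" one, modeled as trivial 1-D dynamical systems.

def simulate(restoring_force, steps=100, perturbation=0.5, perturb_at=50):
    """Integrate x' = restoring_force(x) with a one-time perturbation."""
    x, dt = 0.0, 0.1
    for t in range(steps):
        if t == perturb_at:
            x += perturbation            # error / outside influence
        x += restoring_force(x) * dt     # the system's own dynamics
    return x

# Attractor: any deviation creates a restoring "desire" to return.
attractor_final = simulate(lambda x: -1.0 * x)

# Merely stable: the system won't push itself further away, but nothing
# pulls it back either; it stays wherever the perturbation left it.
stable_final = simulate(lambda x: 0.0)

print(f"attractor ends near the Friendly state: x = {attractor_final:.4f}")
print(f"merely stable ends displaced:           x = {stable_final:.4f}")

The attractor run ends back at (essentially) x = 0, while the merely-stable run sits wherever the perturbation left it. That restoring pull, and not mere non-deviation, is the whole point of the attractor framing.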

TAKE-AWAY:  Friendliness is (and needs to be) an attractor CEV

PART 2 will describe how to create an attractor CEV and make it more obvious why you want such a thing.


!! Let the flames begin !! :-)
