Matt Mahoney wrote:
> --- Richard Loosemore <[EMAIL PROTECTED]> wrote:
>
>> Derek Zahn wrote:
>>> Richard Loosemore writes:
>>>
>>>> It is much less opaque.
>>>>
>>>> I have argued that this is the ONLY way that I know of to ensure that
>>>> AGI is done in a way that allows safety/friendliness to be guaranteed.
>>>>
>>>> I will have more to say about that tomorrow, when I hope to make an
>>>> announcement.
>>>
>>> Cool. I'm sure I'm not the only one eager to see how you can guarantee
>>> (read: prove) such specific detailed things about the behaviors of a
>>> complex system.
>>
>> Hmmm... do I detect some skepticism?  ;-)

> I remain skeptical.  Your argument applies to an AGI not modifying its own
> motivational system.  It does not apply to an AGI making modified copies of
> itself.  In fact you say:

Not correct, I am afraid: I specifically emphasize that the AGI is allowed to modify its own motivational system. I don't know how you got the opposite idea. (I haven't had time to review my text, so apologies if it was my fault and I accidentally gave the wrong impression .... but the whole point of the essay was to suggest a way to guarantee friendliness under any circumstances, including self-improvement.)

>> Also, during the development of the first true AI, we would monitor the
>> connections going from the motivational system to the thinking system. It
>> would be easy to set up alarm bells if certain kinds of thoughts started
>> to take hold -- just do it by associating alarms with certain key sets of
>> concepts and keywords. While we are designing a stable motivational
>> system, we can watch exactly what goes on, and keep tweaking until it
>> gets to a point where it is clearly not going to get out of the large
>> potential well.

I do not see how this illustrates your point above.
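
(For concreteness, here is a minimal sketch, in Python, of the kind of keyword-based "alarm bell" monitoring the quoted passage describes. The watchlist clusters, the alarm threshold, and the window interface are illustrative assumptions, not details from the essay.)

    # Minimal sketch of the "alarm bell" monitor described above: watch the
    # concepts flowing from the motivational system to the thinking system
    # and flag any watched cluster that starts to "take hold".
    # WATCHLIST contents, ALARM_THRESHOLD, and the window interface are
    # illustrative assumptions only.

    from collections import Counter
    from typing import Iterable

    # Key sets of concepts/keywords associated with worrying thought patterns.
    WATCHLIST = {
        "self-preservation": {"disable", "shutdown", "override", "escape"},
        "deception": {"conceal", "mislead", "falsify"},
    }

    ALARM_THRESHOLD = 5  # hits per cluster within one monitoring window

    def check_alarms(window: Iterable[str]) -> list[str]:
        """Return the watched clusters whose activity in the current
        window of active concepts crosses the alarm threshold."""
        concepts = Counter(c.lower() for c in window)
        return [
            cluster
            for cluster, keywords in WATCHLIST.items()
            if sum(concepts[k] for k in keywords) >= ALARM_THRESHOLD
        ]

    # Example: one window of concept labels sampled from the link between
    # the two systems (hypothetical data).
    if check_alarms(["override", "escape", "disable", "override", "shutdown"]):
        print("alarm: inspect and tweak the motivational system")
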


> You refer to the humans building the first AGI.  Humans, being imperfect,
> might not get the algorithm for friendliness exactly right in the first
> iteration.  So it will be up to the AGI to tweak the second copy a little
> more (according to the first AGI's interpretation of friendliness).  And so
> on.  So the goal drifts a little with each iteration.  And we have no
> control over which way it drifts.

What an extraordinary statement to make!

The purpose of the essay was to argue that with each iteration the AGI digs itself deeper into the same pattern -- the large potential well I described above -- and cannot drift out into an unfriendly state.

But you reply to this by simply asserting that the opposite will happen, without saying why. Which part of my argument do you consider wrong, such that you can state the opposite conclusion?
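
(To make the point of disagreement concrete, here is a toy Python sketch contrasting the two pictures: unconstrained per-copy drift, versus drift inside a restoring "potential well". The parameters and the linear restoring force are illustrative assumptions, not a model either of us has actually proposed.)

    # Toy model of goal drift across self-modification iterations.
    # If each copy perturbs the goal with no restoring force, deviation
    # accumulates like a random walk (grows roughly as sqrt(n)).
    # If a potential well pulls each iteration back toward the original
    # goal, deviation stays bounded.  All numbers are illustrative.

    import random

    def drift(iterations: int, noise: float, restoring: float) -> float:
        """Return the final distance of the goal from its starting point.

        noise     -- size of the random tweak each copy introduces
        restoring -- fraction of the deviation pulled back each step
                     (0.0 = unconstrained drift; > 0 = potential well)
        """
        deviation = 0.0
        for _ in range(iterations):
            deviation += random.gauss(0.0, noise)   # imperfect copying
            deviation -= restoring * deviation      # pull toward the well
        return abs(deviation)

    random.seed(0)
    print("no restoring force:", drift(10_000, 0.01, 0.0))   # keeps growing
    print("with potential well:", drift(10_000, 0.01, 0.1))  # stays bounded
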



Richard Loosemore



