On 03/07/2008 08:09 AM, Mark Waser wrote:
There is one unique attractor in state space.

No. I am not claiming that there is one unique attractor. I am merely saying that there is one describable, reachable, stable attractor that has the characteristics that we want. There are *clearly* other attractors. For starters, my attractor requires sufficient intelligence to recognize its benefits. There is certainly another very powerful attractor for simpler, brute-force approaches (which frequently have long-term disastrous consequences that aren't seen or are ignored).


Of course. An earlier version said "there is one unique attractor that <identify friendliness here>", and while editing, it somehow ended up in that obviously wrong form.

Since any sufficiently advanced species will eventually be drawn towards F, the CEV of all species is F.

While I believe this to be true, I am not convinced that it is necessary for my argument. I think that it would make my argument a lot easier if I could prove it to be true -- but I currently don't see a way to do that. Anyone want to chime in here?

Ah, okay. I thought you were going to argue this following on from Omohundro's paper about drives common to all sufficiently advanced AIs and extend it to all sufficiently advanced intelligences, but that's my hallucination.


Therefore F is not species-specific, and has nothing to do with any particular species or the characteristics of the first species that develops an AGI (AI).

I believe that the F that I am proposing is not species-specific. My problem is that there may be another attractor F' somewhere far off in state space, and some other species might start out close enough to it to be pulled into that attractor instead. In that case, there would be the question of how the species in the two different attractors interact. My belief is that it would be to the mutual benefit of both, but I am not able to prove that at this time.


For there to be another attractor F', it would of necessity have to be an attractor that is not desirable to us, since you said there is only one stable attractor for us that has the desired characteristics. I don't see how beings subject to these two different attractors would find mutual benefit in general, since if they did, then F' would have the desirable characteristics that we wish a stable attractor to have, but it doesn't.
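
To keep my own intuitions straight, here is a toy sketch of what I mean by two attractors (purely my own illustration, not anything from your framework; the double-well dynamics dx/dt = x - x^3 and the function names are made up for the example):

    # Toy 1-D "state space" with two stable attractors: F at x = +1 and
    # F' at x = -1.  Which attractor a trajectory settles into depends
    # only on which basin its starting point lies in.

    def step(x, dt=0.01):
        # gradient dynamics of a double well: dx/dt = x - x**3
        return x + dt * (x - x**3)

    def final_attractor(x0, steps=10_000):
        x = x0
        for _ in range(steps):
            x = step(x)
        return round(x, 3)

    for start in (-0.9, -0.1, 0.1, 0.9):
        print(f"start {start:+.1f} -> settles near {final_attractor(start):+.3f}")
    # starts left of 0 end up at F' (-1); starts right of 0 end up at F (+1)

The point of the toy is just that "which basin you start in" can matter more than anything about the species itself, which is exactly the worry about a distant F'.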

This means that genuine conflict between friendly species or between friendly individuals is not even possible, so there is no question of an AI needing to arbitrate between the conflicting interests of two friendly individuals or groups of individuals. Of course, there will still be conflicts between non-friendlies, and the AI may arbitrate and/or intervene.

Wherever/whenever there is a shortage of resources (i.e. not all goals can be satisfied), goals will conflict. Friendliness describes the behavior that should result when such conflicts arise. Friendly entities should not need arbitration or intervention but should welcome help in determining the optimal solution (which is *close* to arbitration but subtly different in that it is not adversarial). I would rephrase your general point as: a true adversarial relationship is not even possible.

That's a better way of putting it. Conflicts will be possible, but they'll always be resolved via an exchange of information rather than bullets.
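
To make "optimal solution rather than arbitration" concrete for myself, here's a toy sketch (my own numbers and names; utility_a/utility_b and their diminishing-returns shapes are hypothetical, not anything you've specified):

    import math

    R = 10.0  # total amount of the scarce resource

    def utility_a(x):      # hypothetical diminishing-returns benefit curves
        return math.sqrt(x)

    def utility_b(x):
        return 2 * math.sqrt(x)

    def best_split(n=1000):
        # brute-force search for the split that maximizes combined benefit
        return max((utility_a(a) + utility_b(R - a), a)
                   for a in (R * i / n for i in range(n + 1)))

    total, a_share = best_split()
    print(f"A gets {a_share:.1f}, B gets {R - a_share:.1f}, "
          f"combined benefit {total:.2f}")

The conflict is real (both parties want more than half of R), but the resolution is a joint computation over shared information, not a contest.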

The AI will not be empathetic towards homo sapiens sapiens in particular. It will be empathetic towards f-beings (friendly beings in the technical sense), whether they exist or not (since the AI might be the only being anywhere near the attractor).

Yes. It will also be empathic towards beings with the potential to become f-beings because f-beings are a tremendous resource/benefit.

You've said elsewhere that the constraints on how it deals with non-friendlies are rather minimal, so while it might be empathic/empathetic, it will still have no qualms about kicking ass and inflicting pain where necessary.


This means no specific acts of the AI towards any species or individuals are ruled out, since it might be part of their CEV (which is the CEV of all beings), even though they are not smart enough to realize it.

Absolutely correct and dead wrong at the same time. You could invent specific, incredibly low-probability but possible circumstances where *any* specific act is justified. I'm afraid that my vision of Friendliness certainly does permit the intentional destruction of the human race if that is the *only* way to preserve a hundred more intelligent, more advanced, more populous races. On the other hand, given the circumstance space that we are almost certain to occupy, the intentional destruction of the human race is most certainly ruled out. Or, in other words, there are no infinite guarantees, but we can reduce the dangers to infinitesimally small levels.

I think you're fudging a bit here. If we are only likely to occupy that circumstance space with probability less than 1, then the intentional destruction of the human race is not 'most certainly ruled out': it is ruled out with a very high probability that is still less than 1. I'm not trying to say it's likely; only that it's possible. I make this point to distinguish your approach from other approaches that purport to make absolute guarantees about certain things (as in some ethical systems where certain things are *always* wrong, regardless of context or circumstance).
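
A back-of-envelope version of what I mean (all numbers hypothetical, picked only to show the shape of the claim):

    # "Ruled out given the circumstance space we are likely to occupy"
    # is a conditional guarantee, not an absolute one.
    p_outside = 1e-9          # hypothetical chance we land outside that space
    p_bad_if_outside = 1e-3   # hypothetical chance destruction is justified even then

    p_destruction = p_outside * p_bad_if_outside
    print(f"P(intentional destruction) <= {p_destruction:.1e}")  # tiny, but not 0
    print(f"P(ruled out)               >= {1 - p_destruction}")  # < 1, not certain

Infinitesimally small, as you say, but still a probability rather than a theorem.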


Since the AI empathizes not with humanity but with f-beings in general, it is possible (likely) that some of humanity's most fundamental beliefs may be wrong from the perspective of an f-being.

Absolutely. Jihad is fundamentally wrong from the perspective of an f-being. A jihadist is *not* an f-being. Its actions are entirely contrary to the tenets of Friendly action.

And we are not yet f-beings in general, since our current location in state space is so far from F. Or do you believe that some (many?) of us are close to F?

Without getting into the debate over the merits of virtual-space versus meat-space and uploading, etc., it seems to follow that *if* everything of importance is preserved in virtual-space (no arguments about this; it is an assumption for the sake of this point only) and *if* turning the Earth into computronium and uploading humanity and all of Earth's beings would be a vastly more efficient use of the planet, *then* the AI should do this (perhaps would be morally obligated to do this) -- even if every human being pleads for this not to occur. The AI would have judged that if we were only smarter, faster, more the kind of people we would like to be, etc., we would actually prefer the computronium scenario.

The weak point of this argument lies in the phrase "the AI would have judged that if <any clause>, we would actually prefer <any clause>". Extrapolation is a tremendously error-prone process and what the AI is attempting to do here *absolutely requires* that it has a better knowledge of YOUR goals than you do for this to be a Friendly act. We justifiably do this all the time when we do unpleasant things for our child's health. But, the intelligent parent (or Friendly entity) does not do such things without a really high probability that they are correct.

Note: I realize that this is going to be a point of much unhappiness/contention/debate and there will be endless arguments as to exactly where the line is. This is all well and good but I hope that we don't lose the forest for the trees (this is why I'm not doing math at this point). This specific case ends up with an inflammatory conclusion because it starts out by ASSUMING an equally inflammatory premise (i.e. that all human beings are incorrect about their goals). I would argue that this is simply a case of garbage in, garbage out.

I don't think it's inflammatory or a case of garbage in to contemplate that all of humanity could be wrong. For much of our history, there have been things that *every single human was wrong about*. This is merely the assertion that we can't make guarantees about what vastly superior f-beings will find to be the case. We may one day outgrow our attachment to meatspace, and we may be wrong in our belief that what is essential can only be preserved in meatspace, but we might not be at that point yet when the AI has to make the decision.
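
On the "intelligent parent" standard itself, a toy expected-value check may show where the bar sits (entirely my own numbers; benefit_if_right and harm_if_wrong are hypothetical stand-ins, not anything from your framework):

    benefit_if_right = 10.0   # hypothetical gain when the extrapolation is correct
    harm_if_wrong = 100.0     # hypothetical harm when it is not

    def override_is_friendly(p_correct):
        # overriding a stated preference only pays off in expectation if
        # confidence in the extrapolation clears the threshold below
        expected = p_correct * benefit_if_right - (1 - p_correct) * harm_if_wrong
        return expected > 0

    threshold = harm_if_wrong / (benefit_if_right + harm_if_wrong)
    print(f"override only if P(correct) > {threshold:.3f}")  # ~0.909 with these numbers
    for p in (0.5, 0.9, 0.95):
        print(p, override_is_friendly(p))

With stakes as lopsided as "upload everyone against their stated wishes", the required confidence gets pushed very close to 1, which I take to be your point about needing a really high probability of being correct.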

It's become apparent to me in thinking about this that 'friendliness' is really not a good term for the attitude of an f-being that we are talking about: that of acting solely in the interest of f-beings (whether others exist or not) and consistently with the CEV of all sufficiently ... beings. It is really just acting rationally (according to a system that we do not understand and may vehemently disagree with).

Actually, I would argue that Friendliness is a good term because that is the net result to us if we are Friendly; however, a possibly better term is simply "enlightened self-interest" since that describes why an f-being would want to act that way (i.e. why Friendliness is an attractor).

Yes, when you talk about Friendliness as that distant attractor, it starts to sound an awful lot like "enlightenment", where self-interest is one aspect of that enlightenment, and friendly behavior is another aspect.


:-) I haven't addressed this question yet but the short answer is that there is no requirement for intervention (for a variety of reasons that I haven't yet laid the necessary groundwork on this forum to explain easily).


Looking forward to it. Thanks for the detailed response.

joseph
