Ben Goertzel wrote:

Hi

All who are interested in such topics and are willing to endure some raw speculative trains of thought may be interested in an essay I recently posted on goal-preservation in strongly self-modifying systems, which is linked to from this blog post

http://multiverseaccordingtoben.blogspot.com/2008/08/on-preservation-of-goals-in-self.html

(I've got another, shorter post on AGI systems with superhuman empathy coming up ... but probably
not this weekend; I've got some real work to get through first ;-)


The following commentary appears on my blog at susaro.com.

**************************************************************

Preserving Goals in AI: First You Sort Out The Definitions, Then You Do The Math (not vice versa).

This is a brief commentary on Goertzel's paper entitled "Toward an Understanding of the Preservation of Goals in Self-Modifying Cognitive Systems", which is to be found at http://www.goertzel.org/papers/PreservationOfGoals.pdf

The purpose of his paper is to ask what happens when AI systems are given goals (like "Make humans happy") and also given the ability to modify their own design ... including the goals themselves.

In the following analysis I want to focus on the opening definition offered in that paper, and its relationship to everything that comes after it.

So, Goertzel begins by asking "What does it mean for system S to possess goal G over time interval T?".

His suggested answer to this question makes the assumption that there exists an "...observer O, who is hopefully a smart guy". With this assumption in hand, he proposes the following definition, couched in terms of that observer:

"S possesses goal G over time interval T if, to O, it appears that the actions S takes during T are significantly more probable to lead to the maximization of G than random actions S would be able to take."

Let's analyze this.

The definition says the following: compare S's actions during time interval T - a set of actions I will henceforth call A[T] - with some randomly chosen set of alternative actions, which I will call A'[T]. If the comparison shows that A[T] is significantly more likely than A'[T] to lead to a maximization of G, then the system is deemed to have possessed goal G.
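Before listing the problems, it may help to see the bare skeleton of that test written out as code. The following Python sketch is my own paraphrase, not anything from Goertzel's paper; the names observed_actions, possible_actions and score_goal, and the values n_random=1000 and threshold=0.95, are all hypothetical placeholders chosen purely for illustration.

import random

def observer_says_S_has_goal(observed_actions, possible_actions, score_goal,
                             n_random=1000, threshold=0.95):
    # Toy paraphrase of the definition: S "possesses" G over T if the observed
    # actions A[T] score notably better on G than randomly chosen alternatives
    # A'[T].  score_goal maps a list of actions to a number measuring how well
    # they serve G; every argument here is a choice the observer O has to make.
    observed = score_goal(observed_actions)              # O decides how G is measured
    random_scores = [
        score_goal([random.choice(possible_actions)      # O decides what S "could have done"
                    for _ in observed_actions])
        for _ in range(n_random)
    ]
    # O decides what "significantly more probable" means:
    fraction_beaten = sum(observed > s for s in random_scores) / n_random
    return fraction_beaten > threshold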

But this entirely depends on the intellect and subjective judgement of the observer O. So much so that we might as well say that S has the goal G if the observer believes that S has the goal G - which would be an empty definition.

More specifically, there are at least five points on which the definition depends on the subjectivity of the observer:

1) The actions A[T] have to be enumerated and compared with a random set A'[T] ... but 'actions' are not neatly quantized, self-evident units, nor are they always obvious to an observer. What if the observer does not notice that a particular action occurred, either because it was too subtle or too abstract?

2) What is the meaning of a "significant" difference between the outcomes of the two sets of actions?

3) What exactly is the "maximization" of a goal? The best possible achievement of the goal in all possible worlds? Some goals can be stated only in qualitative terms, not quantitative terms, so the observer may be forced to make subtle judgements about the relative merits of different kinds of goal achievement. Is it better to maximize happiness by making absolutely sure that no person ever experiences a degree of happiness that deviates from the average by more than 0.01%, on a particular measure? Or would maximization of happiness allow greater variance with a higher average? The more abstract the goal, the more meaningless it is to speak of its maximization. (A toy numerical illustration of this point follows the list.)

4) And the actions must be judged, by the observer, to "lead" to the maximization of the goal. Who is to say how actions are causally connected to outcomes? Does a reduction in income tax for the wealthiest people lead to a 'trickle-down' increase in the overall wealth of the poor, or does it just lead to greater income disparity?

5) When exactly are the effects of the actions allowed to come into play? If the actions A[T] do not have a significant effect during time T, but then have a massive effect at [T + delta], does the observer ignore that fact and say that the system did not have goal G during time interval T (even though system S may verbally declare that it did have the goal, but did not expect it to have any result until later)?
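Here is the toy numerical illustration of point 3 promised above. The numbers are my own invention, nothing from the paper; the point is only that which of two hypothetical worlds "maximizes" happiness flips depending on which aggregate the observer picks.

# Two invented happiness distributions, rated on a 0-to-1 scale.
world_a = [0.70, 0.70, 0.70, 0.70, 0.70]   # everyone modestly happy, zero variance
world_b = [0.95, 0.95, 0.95, 0.95, 0.20]   # higher average, one miserable person

mean_a = round(sum(world_a) / len(world_a), 2)   # 0.7
mean_b = round(sum(world_b) / len(world_b), 2)   # 0.8
worst_a, worst_b = min(world_a), min(world_b)    # 0.7 vs 0.2

# Judged by the mean, world_b "maximizes" happiness (0.8 > 0.7); judged by the
# worst-off person, or by variance, world_a does.  The definition gives O no
# principled way to choose between these readings of "maximization".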

Now, if the subsequent analysis offered by Goertzel in this paper were to result in a clarification of all these subjective aspects of the definition, then we might hope that the subjectivity was on the way to being reduced or eliminated. It is okay, I think, to start off with some vagueness in your definition if the math that comes later is designed to eliminate that vagueness in a believable way.

But this is not what happens. Rather, the ideas embedded in the above definition (like what counts as a goal, and what counts as its maximization) are just left as primitives. The observer O is a crucial character in this paper, because O is supposed to be ... you and me. We are being asked to step into O's shoes and buy the idea that we all agree, roughly, on what it means for a system to have a goal.

But nothing could be further from the truth: primitive ideas like "goal" and "maximization" are a million miles away from being cast in objective terms, and so the mathematical analysis that comes later is built on nothing firmer than quicksand. The primitive terms in that opening definition beg so many questions that the later analysis cannot be said to go anywhere at all.

Now, to be fair, Goertzel is modest about how much he has achieved in this paper, saying quite honestly that "Essentially nothing has been resolved in the above discussion. What I hope is that I have raised some interesting questions."

However, he then goes on to say that "My central goal here has been to replace vague conceptual questions about goal preservation in self-modifying systems with semantically similar questions that are at least somewhat more precise". With this I respectfully but firmly disagree: I believe that he started with concepts that were so subjective and vague as to be of no use at all, then built a mathematical apparatus on top of that insecure foundation.

In my book the first thing you do is sort out your definitions. Then you do the math. Not the other way around.

Sadly, the very existence of the mathematical apparatus that Goertzel proposes will serve to disguise the fact that all of our attention, right now, should be directed at the insecure foundations. In the literature as a whole, the concept of a "goal" is bandied about as if everyone understood that this was a moderately well-defined concept. In fact it is anything but. It is all well and good to have philosophical discussions in which we take a kind of Turing-esque, hands-off approach and say that a goal is just what a "reasonably smart guy" would judge to be a goal, but this kind of philosophical handwaving is not going to cut the mustard if real systems need to be designed to do real intelligent things.

We still await a calculus of goals and motivation that is founded on basic concepts that are not defined in terms of subjective observers or homunculi.


Richard Loosemore





