Hi Russell,

OK, I'll try to specify my ideas in this regard more clearly. Bear in mind though that there are many ways to formalize an intuition, and the style of formalization I'm suggesting here may or may not be the "right" one. With this sort of thing, you only know if the formalization is right after you've proved some theorems using it....

Given

-- an agent acting in an environment, with a variety of actions to choose from at various points in time
-- a specific goal G, in the context of which one is evaluating that agent

one may define an "implicit expectation function" based on the agent's chosen actions as compared to the goal G.

To wit: If in a certain situation S the agent chooses A instead of B, and the agent is being evaluated as an achiever of goal G, then we may say that according to the agent's implicit expectation function e relative to goal-context G,

e( degree of achievement of G | taking action A in situation S) >
e( degree of achievement of G | taking action B in situation S)

For example, if I am being evaluated as an agent trying to create a benevolent AGI, and I choose to write this email rather than complete the edit of the Novamente design manuscript, this means that according to Ben's implicit expectation function e relative to goal-context "create benevolent AGI",

e( degree of achievement of "create benevolent AGI" | write this email) >= e( degree of achievement of "create benevolent AGI" | complete edit of Novamente design manuscript)

Now, for a given agent that takes many actions, one will be able to derive many such inequalities describing its implicit expectations relative to the goal G.
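
Just to make the bookkeeping concrete, here's a minimal sketch in Python of how a log of observed choices could be turned into such a set of inequalities. (All of the names and data structures here are invented purely for illustration; nothing about the formalization depends on them.)

    # Minimal sketch: turning a log of observed choices into implicit-expectation
    # inequalities of the form
    #   e( achievement of G | chosen action, situation ) > e( achievement of G | rejected action, situation )
    # All names (Choice, derive_inequalities, ...) are illustrative only.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Choice:
        situation: str      # description of the situation S
        chosen: str         # the action A the agent actually took
        rejected: tuple     # actions B1, B2, ... that were available but not taken

    def derive_inequalities(choices, goal):
        """For each observed choice, record one inequality per rejected alternative."""
        inequalities = []
        for c in choices:
            for b in c.rejected:
                inequalities.append((goal, c.situation, c.chosen, b))
        return inequalities

    log = [Choice("Sunday evening", "write this email",
                  ("complete edit of Novamente manuscript",))]
    for goal, s, a, b in derive_inequalities(log, "create benevolent AGI"):
        print(f'e( achievement of "{goal}" | {a}, in {s} ) > e( achievement of "{goal}" | {b}, in {s} )')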

Furthermore, the terms referred to in these inequalities are generally going to be expressible as logical combinations of simpler things. This allows one to abstract from them via probabilistic reasoning.

For instance, a study of many of Ben's actions might reveal that he likes writing more than editing, so that a general study of Ben's behavior would yield the following abstract conclusion regarding Ben's implicit expectation function:

e( degree of achievement of "create benevolent AGI" | write something) >
e( degree of achievement of "create benevolent AGI" | edit something)

Of course, this abstract conclusion may be wrong -- maybe Ben actually likes editing more than writing, but happens to have been in situations where he judged writing was the most important thing to do.

But, in this sort of manner, one can associate with a system a set of "implicit abstract expectations" regarding its behavior. This is a set of abstractions describing the agent's apparent pattern of judgments, obtained by analyzing the agent's observed action-selections in the context of its known action-possibilities and the specific goal G.
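
One crude way to picture the abstraction step, just as a toy illustration and not as a serious probabilistic reasoner: map each concrete action onto a more abstract category, and tally how often one category is chosen over another across the whole body of observations. The categorizer and the tallying scheme below are my own stand-ins.

    # Toy illustration of abstracting implicit expectations: concrete actions are
    # mapped to categories, and pairwise "chosen over" evidence is tallied.
    # A real treatment would use proper probabilistic inference; this just counts.

    from collections import Counter

    def category(action):
        # Hypothetical categorizer -- in practice this mapping would itself be learned.
        if "write" in action:
            return "write something"
        if "edit" in action:
            return "edit something"
        return "other"

    def abstract_expectations(inequalities):
        """Count how often an action in category X was chosen over one in category Y.
        Large counts are evidence for the abstract inequality e(G | X) > e(G | Y)."""
        counts = Counter()
        for goal, situation, chosen, rejected in inequalities:
            counts[(category(chosen), category(rejected))] += 1
        return counts

    obs = [("create benevolent AGI", "Sunday", "write this email", "edit Novamente manuscript"),
           ("create benevolent AGI", "Monday", "write a blog post", "edit a paper draft")]
    for (x, y), n in abstract_expectations(obs).items():
        print(f"evidence={n}:  e(G | {x}) > e(G | {y})")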

Granted, different observers of the agent might come up with different sets of implicit abstract expectations for the agent. But, for sake of argument, let's assume an ideal probabilistic observer: i.e., an observer who assesses the inequalities between the agent's implicit abstract expectations using correct probability theory. Naturally this may be a difficult computational problem, so we are assuming this theoretical ideal probabilistic observer is very, very smart.

Now, the implicit abstract expectations obtained by the ideal probabilistic observer for the agent may relate to each other in various ways. They may be completely consistent with each other, or they may be wildly inconsistent with each other. That is: in the view of the ideal probabilistic observer, the agent may behave according to consistent implicit principles, or according to wildly inconsistent implicit principles.

For instance, if Ben
-- always chooses writing in place of editing (when given a choice)
-- always chooses editing in place of golfing (when given a choice)
-- always chooses golfing in place of writing (when given a choice)

then Ben's implicit abstract expectations, as judged by an ideal probabilistic observer, are going to come out as inconsistent. However, there may be more specific information about what guides Ben's choices that makes the apparent contradiction go away.
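
Inconsistency of this crude kind shows up as a cycle in the directed graph of abstract preferences, so the check itself is easy to sketch (ignoring evidence weights for the moment; again, the names here are just illustrative):

    # Sketch: detecting the writing > editing > golfing > writing sort of
    # inconsistency as a cycle in the directed "preferred over" graph.

    def has_preference_cycle(preferences):
        """preferences: iterable of (preferred, dispreferred) pairs.
        Returns True if the implied 'preferred over' relation contains a cycle."""
        graph = {}
        for a, b in preferences:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set())

        WHITE, GRAY, BLACK = 0, 1, 2
        color = {node: WHITE for node in graph}

        def visit(node):
            color[node] = GRAY
            for nxt in graph[node]:
                if color[nxt] == GRAY:                 # back edge => cycle
                    return True
                if color[nxt] == WHITE and visit(nxt):
                    return True
            color[node] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in graph)

    prefs = [("writing", "editing"), ("editing", "golfing"), ("golfing", "writing")]
    print(has_preference_cycle(prefs))   # True -- these implicit expectations are inconsistent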

(Note that Ben could still be doing the right thing even if he were apparently acting inconsistently according to these observed implicit abstract expectations. But, if a sufficient amount of evidence has been gathered about Ben, then, if Ben is acting consistently, an ideal probabilistic observer should be able to create a consistent model of his behavior by abstracting from his actions in the way I've described.)

Now, suppose this ideal probabilistic observer is also given the job of making predictions of Ben's behaviors, based on the implicit abstract expectations it has collected. One may then define the **importance** of a particular implicit abstract expectation, in terms of the degree of its tendency to play a useful role in accurate predictions. (There are obvious formulas for quantifying this notion of importance. We have a lot of experience with this way of defining importance in our machine learning work in a bioinformatics context, BTW.)
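
One simple way to cash out "importance" (this is only an illustration of the general idea, not the particular formula we've used in the bioinformatics work): score each abstract expectation by how much held-out prediction accuracy drops when that expectation is deleted from the predictive model.

    # Illustrative leave-one-out importance measure. The predict() rule and the
    # data formats are hypothetical stand-ins for a real predictive model.

    def predict(expectations, situation, options):
        """Pick the option favored by the applicable 'X preferred over Y' rules
        (situation is ignored in this toy version); ties go to the first option."""
        scores = {o: 0 for o in options}
        for preferred, dispreferred in expectations:
            if preferred in options and dispreferred in options:
                scores[preferred] += 1
        return max(options, key=lambda o: scores[o])

    def accuracy(expectations, test_cases):
        hits = sum(predict(expectations, s, opts) == actual
                   for s, opts, actual in test_cases)
        return hits / len(test_cases)

    def importance(expectation, expectations, test_cases):
        """Accuracy with the expectation included, minus accuracy without it."""
        reduced = [e for e in expectations if e != expectation]
        return accuracy(expectations, test_cases) - accuracy(reduced, test_cases)

    exps = [("writing", "editing"), ("editing", "golfing")]
    tests = [("Mon", ("editing", "writing"), "writing"),
             ("Tue", ("editing", "golfing"), "editing")]
    print(importance(("writing", "editing"), exps, tests))   # 0.5 -- half the accuracy rests on it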

We may then ask: How consistent is the set of important implicit abstract expectations associated with the agent?

My desire in this context is to show that, for agents that are optimal or near-optimal at achieving the goal G under resource restrictions R, the set of important implicit abstract expectations associated with the agent (in goal-context G as assessed by an ideal probabilistic observer) should come close to being consistent.

Clearly, this will hold only under certain assumptions about the agent, the goal, and the resource restrictions, and I don't know what these assumptions are.

The definition of "close to being consistent" is going to be critical here, of course. Observed inconsistencies with little evidence underlying them are going to have to be counted less than observed inconsistencies with a lot of evidence underlying them, for example.

The crux of this result, if one were able to show it, would be that, under appropriate conditions, optimal goal-achieving systems behave in a way that makes sense to a sufficiently intelligent observer.

Now, I agree that this is all kind of obvious, intuitively. But "kind of obvious" doesn't mean "trivial to prove." Pretty much all of Marcus Hutter's results about AIXI are kind of obvious too, perhaps even more so than the hypotheses I've made above -- it's intuitively quite clear that AIXI can achieve an arbitrarily high level of intelligence, and that it can perform just as well as any other algorithm up to a (large) constant factor. Yet, to prove this rigorously turned out to be quite a pain, given the mathematical tools at our disposal, as you can see from the bulk and complexity of Hutter's papers.

-- Ben G
