Hi Russell,
OK, I'll try to specify my ideas in this regard more clearly. Bear
in mind though that there are many ways to formalize an intuition,
and the style of formalization I'm suggesting here may or may not be
the "right" one. With this sort of thing, you only know if the
formalization is right after you've proved some theorems using it....
Given
-- an agent acting in an environment, with a variety of actions to
choose at various points in time
-- a specific goal G, in the context of which one is evaluating that
agent
one may define an "implicit expectation function" based on the
agent's chosen actions as compared to the goal G.
To wit: If in a certain situation S the agent chooses A instead of B,
and the agent is being evaluated as an achiever of goal G, then we
may say that according to the agent's implicit expectation function e
relative to goal-context G,
e( degree of achievement of G | taking action A in situation S) >
e( degree of achievement of G | taking action B in situation S)
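To make this concrete, here is a minimal sketch in Python (the class
and variable names are purely illustrative, not part of any real
system) of how one might record these inequalities as choices are
observed:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Constraint:
    """Records that, in `situation`, e(G | chosen) > e(G | rejected)."""
    situation: str
    chosen: str
    rejected: str

@dataclass
class ImplicitExpectation:
    """A growing set of inequality constraints on an agent's implicit
    expectation function e, relative to a fixed goal-context G."""
    goal: str
    constraints: list = field(default_factory=list)

    def observe_choice(self, situation, chosen, alternatives):
        # Choosing `chosen` over each alternative yields one inequality.
        for rejected in alternatives:
            self.constraints.append(Constraint(situation, chosen, rejected))

e = ImplicitExpectation(goal="create benevolent AGI")
e.observe_choice("Tuesday morning", "write this email",
                 ["complete edit of Novamente design manuscript"])
```

Note that nothing here assigns numerical values to e; the data
gathered is purely a set of ordering constraints.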
For example, if I am being evaluated as an agent trying to create a
benevolent AGI, and I choose to write this email rather than complete
the edit of the Novamente design manuscript, this means that
according to Ben's implicit expectation function e relative to goal-
context "create benevolent AGI",
e( degree of achievement of "create benevolent AGI" | write this
email) >
e( degree of achievement of "create benevolent AGI" | complete edit
of Novamente design manuscript)
Now, for a given agent that takes many actions, one will be able to
derive many such inequalities describing its implicit expectations
relative to the goal G.
Furthermore, the terms referred to in these inequalities are
generally going to be expressible as logical combinations of simpler
things. This allows one to abstract from them, via probabilistic
reasoning.
For instance, a study of many of Ben's actions might reveal that he
likes writing more than editing, so that a general study of Ben's
behavior would yield the following abstract conclusion regarding
Ben's implicit expectation function:
e( degree of achievement of "create benevolent AGI" | write something) >
e( degree of achievement of "create benevolent AGI" | edit something)
Of course, this abstract conclusion may be wrong -- maybe Ben
actually likes editing more than writing, but happens to have been in
situations where he judged writing was the most important thing to do.
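The simplest version of this abstraction step could be sketched as
follows (Python again; the `category_of` mapping from concrete actions
to action types is hypothetical):

```python
from collections import Counter

def abstract_expectations(constraints, category_of):
    """Count, for each ordered pair of action categories (X, Y), how often
    an action in X was chosen over an action in Y.  A heavily one-sided
    count is evidence for the abstract expectation e(G | X) > e(G | Y)."""
    wins = Counter()
    for situation, chosen, rejected in constraints:
        wins[(category_of(chosen), category_of(rejected))] += 1
    return wins

# Toy data: two observed choices, both preferring a "write" action.
category_of = {"write email": "write",
               "write chapter": "write",
               "edit manuscript": "edit"}.get
observed = [("Mon", "write email", "edit manuscript"),
            ("Tue", "write chapter", "edit manuscript")]
wins = abstract_expectations(observed, category_of)
# wins[("write", "edit")] is now 2: evidence that the implicit e ranks
# writing above editing in this goal-context.
```

A real version would of course use probabilistic inference rather than
raw counts, but the counts illustrate where the evidence comes from.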
But, in this sort of manner, one can associate a set of "implicit
abstract expectations" with a system's behavior. This is a set
of abstractions describing the agent's apparent pattern of judgments,
obtained by analyzing the agent's observed action-selections in the
context of its known action-possibilities and the specific goal G.
Granted, different observers of the agent might come up with
different sets of implicit abstract expectations for the agent. But,
for sake of argument, let's assume an ideal probabilistic observer:
i.e., an observer who derives the agent's implicit abstract
expectations from the observed inequalities using correct probability
theory.
Naturally this may be a difficult computational problem, so we are
assuming this theoretical ideal probabilistic observer is very, very
smart.
Now, the implicit abstract expectations obtained by the ideal
probabilistic observer for the agent may relate to each other in
various ways. They may be completely consistent with each other, or
they may be wildly inconsistent with each other. That is: in the
view of the ideal probabilistic observer, the agent may behave
according to consistent implicit principles, or according to wildly
inconsistent implicit principles.
For instance, if Ben
-- always chooses writing in place of editing (when given a choice)
-- always chooses editing in place of golfing (when given a choice)
-- always chooses golfing in place of writing (when given a choice)
then Ben's implicit abstract expectations, as judged by an ideal
probabilistic observer, are going to come out as inconsistent.
However, there may be more specific information about what guides
Ben's choices that makes the apparent contradiction go away.
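Formally, a set of strict implicit preferences like this one is
inconsistent exactly when the "chosen over" relation contains a cycle,
since no single expectation function e can satisfy all the strict
inequalities around a cycle. A sketch of the check (illustrative code,
standard depth-first cycle detection):

```python
def has_preference_cycle(prefs):
    """prefs is a list of (better, worse) pairs.  Returns True iff the
    'chosen over' relation contains a cycle, i.e. no expectation
    function e can satisfy all the strict inequalities at once."""
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on stack / done
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and dfs(nxt)):
                return True           # back edge: a cycle exists
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

# The write/edit/golf example above is a three-cycle, hence inconsistent:
assert has_preference_cycle([("writing", "editing"),
                             ("editing", "golfing"),
                             ("golfing", "writing")])
assert not has_preference_cycle([("writing", "editing"),
                                 ("editing", "golfing")])
```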
(Note that Ben could still be doing the right thing even if he were
apparently acting inconsistently according to these observed implicit
abstract expectations. But, if a sufficient amount of evidence has
been gathered about Ben, then if Ben is acting consistently an ideal
probabilistic observer should be able to create a consistent model of
his behavior by abstracting from his actions in the way I've described.)
Now, suppose this ideal probabilistic observer is also given the job
of making predictions of Ben's behaviors, based on the implicit
abstract expectations it has collected. One may then define the
**importance** of a particular implicit abstract expectation, in
terms of the degree of its tendency to play a useful role in accurate
predictions. (There are obvious formulas for quantifying this notion
of importance. We have a lot of experience with this way of defining
importance in our machine learning work in a bioinformatics context,
BTW.)
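One such obvious formula, in toy Python form (all names hypothetical):
ablate an abstract expectation from the predictive model and measure
the resulting drop in accuracy on the record of the agent's actual
choices.

```python
def importance(rule_key, rules, history):
    """Importance of the abstract expectation stored under `rule_key`,
    quantified as the drop in prediction accuracy when that expectation
    is ablated from the rule set.  `rules` maps a situation type to the
    predicted choice; `history` is a list of (situation, actual-choice)
    pairs."""
    def accuracy(rs):
        return sum(rs.get(s) == a for s, a in history) / len(history)
    ablated = {k: v for k, v in rules.items() if k != rule_key}
    return accuracy(rules) - accuracy(ablated)

# Toy data: two abstract expectations about Ben, three observed choices.
rules = {"writing vs editing": "writing",
         "editing vs golfing": "editing"}
history = [("writing vs editing", "writing"),
           ("writing vs editing", "writing"),
           ("editing vs golfing", "golfing")]
```

Here the writing-over-editing expectation has importance 2/3 (removing
it costs two correct predictions out of three), while the
editing-over-golfing expectation has importance 0, since it never
predicted correctly in the first place.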
We may then ask: How consistent is the set of important implicit
abstract expectations associated with the agent?
My desire in this context is to show that, for agents that are
optimal or near-optimal at achieving the goal G under resource
restrictions R, the set of important implicit abstract expectations
associated with the agent (in goal-context G as assessed by an ideal
probabilistic observer) should come close to being consistent.
Clearly, this will hold only under certain assumptions about the
agent, the goal, and the resource restrictions, and I don't know what
these assumptions are.
The definition of "close to being consistent" is going to be critical
here, of course. Observed inconsistencies with little evidence
underlying them are going to have to be counted less than observed
inconsistencies with a lot of evidence underlying them, for example.
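One simple way to make this evidence-weighting concrete (an
illustrative sketch, not a definitive proposal): score each pair of
abstract expectations that point in opposite directions by the amount
of evidence on the weaker side.

```python
def inconsistency_score(wins):
    """Evidence-weighted inconsistency.  `wins[(x, y)]` is the number of
    times an action of category x was chosen over one of category y.
    For every pair observed in both directions, count the evidence on
    the weaker side, so a contradiction backed by lots of evidence
    counts for more than a one-off reversal."""
    score = 0
    for (x, y), n in wins.items():
        reverse = wins.get((y, x), 0)
        if reverse and x < y:          # count each unordered pair once
            score += min(n, reverse)
    return score

wins = {("write", "edit"): 10,   # strong evidence for write > edit...
        ("edit", "write"): 1,    # ...with one contrary observation
        ("edit", "golf"): 5}     # no contrary evidence at all
assert inconsistency_score(wins) == 1
```

A score of zero would then be full consistency, and "close to
consistent" becomes a small score relative to the total evidence.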
The crux of this result, if one were able to show it, would be that:
Under appropriate conditions, optimal goal-achieving systems behave
in a way that makes sense to a sufficiently intelligent observer.
Now, I agree that this is all kind of obvious, intuitively. But
"kind of obvious" doesn't mean "trivial to prove." Pretty much all
of Marcus Hutter's results about AIXI are kind of obvious too,
perhaps even more so than the hypotheses I've made above -- it's
intuitively quite clear that AIXI can achieve an arbitrarily high
level of intelligence, and that it can perform just as well as any
other algorithm up to a (large) constant factor. Yet, to prove this
rigorously turned out to be quite a pain, given the mathematical
tools at our disposal, as you can see from the bulk and complexity of
Hutter's papers.
-- Ben G
-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303