Ben Goertzel wrote:
> Eliezer,
>
>> A (selfish) human upload can engage in complex cooperative strategies
>> with an exact (selfish) clone, and this ability is not accessible to
>> AIXI-tl, since AIXI-tl itself is not tl-bounded and therefore cannot
>> be simulated by AIXI-tl, nor does AIXI-tl have any means of
>> abstractly representing the concept "a copy of myself". Similarly,
>> AIXI is not computable and therefore cannot be simulated by AIXI.
>> Thus both AIXI and AIXI-tl break down in dealing with a physical
>> environment that contains one or more copies of them. You might say
>> that AIXI and AIXI-tl can both do anything except recognize
>> themselves in a mirror.
>
> I disagree with the bit about 'nor does AIXI-tl have any means of
> abstractly representing the concept "a copy of myself".'
>
> It seems to me that AIXI-tl is capable of running programs that contain
> such an abstract representation. Why not? If the parameters are
> right, it can run programs vastly more complex than a human brain
> upload...
>
> For example, an AIXI-tl can run a program that contains the AIXI-tl
> algorithm, as described in Hutter's paper, with t and l left as free
> variables. This program can then carry out reasoning using predicate
> logic, about AIXI-tl in general, and about AIXI-tl for various values
> of t and l.
>
> Similarly, AIXI can run a program that contains a mathematical
> description of AIXI similar to the one in Hutter's paper. This program
> can then prove theorems about AIXI using predicate logic.
>
> For instance, if AIXI were rewarded for proving math theorems about
> AGI, eventually it would presumably learn to prove theorems about AIXI,
> extending Hutter's theorems and so forth.
Yes, AIXI can indeed prove theorems about AIXI better than any human.
AIXI-tl can prove theorems about AIXI-tl better than any tl-bounded human.
AIXI-tl can model AIXI-tl as well as any tl-bounded human. AIXI-tl can
model a tl-bounded human, say Lee Corbin, better than any tl-bounded
human; given deterministic physics it's possible AIXI-tl can model Lee
Corbin better than Lee Corbin (although I'm not quite as sure of this).
But AIXI-tl can't model an AIXI-tl in the same way that a Corbin-tl can
model a Corbin-tl. See below.
>> The simplest case is the one-shot Prisoner's Dilemma against your own
>> exact clone. It's pretty easy to formalize this challenge as a
>> computation that accepts either a human upload or an AIXI-tl. This
>> obviously breaks the AIXI-tl formalism. Does it break AIXI-tl? This
>> question is more complex than you might think. For simple problems,
>> there's a nonobvious way for AIXI-tl to stumble onto incorrect
>> hypotheses which imply cooperative strategies, such that these
>> hypotheses are stable under the further evidence then received. I
>> would expect there to be classes of complex cooperative problems in
>> which the chaotic attractor AIXI-tl converges to is suboptimal, but I
>> have not proved it. It is definitely true that the physical problem
>> breaks the AIXI formalism and that a human upload can
>> straightforwardly converge to optimal cooperative strategies based on
>> a model of reality which is more correct than any AIXI-tl is capable
>> of achieving.
>>
>> Ultimately AIXI's decision process breaks down in our physical
>> universe because AIXI models an environmental reality with which it
>> interacts, instead of modeling a naturalistic reality within which it
>> is embedded. It's one of two major formal differences between AIXI's
>> foundations and Novamente's. Unfortunately there is a third
>> foundational difference between AIXI and a Friendly AI.
>
> I don't agree at all.
>
> In a Prisoner's Dilemma between two AIXI-tl's, why can't each one run a
> program that:
>
> * uses an abstract mathematical representation of AIXI-tl, similar to
> the one given in the Hutter paper
> * uses predicate logic to prove theorems about the behavior of the other
> AIXI-tl
Because AIXI-tl is not an entity deliberately allocating computing power;
its control process is fixed. AIXI-tl will model a process that proves
theorems about AIXI-tl only if that process is the best predictor of the
environmental information seen so far.
Let's say the primary AIXI-tl, the one whose performance we're tracking,
is facing a complex cooperative problem. Within each round, the challenge
protocol is as follows.
1) The Primary testee is cloned - that is, the two testees are
resynchronized at the start of each new round. This is why Lee Corbin is
the human upload (i.e., to avoid moral issues). We will assume that the
Secondary testee, if a human upload, continues to attempt to maximize
rational reward despite impending doom; again, this is why we're using Lee
Corbin.
2) Each party, the Primary and the Secondary (the Secondary being
re-cloned on each round), is shown an identical map of the next
cooperative complex problem. For example, this might be a set of
billiards on a complex table with pockets that score different numbers of
points, the Primary's billiards colored green and the Secondary's
billiards colored blue. However, neither party is told which party they
are during this stage.
3) Each party is flashed a green or blue screen telling them whether they
are Primary or Secondary.
4) Each party has the opportunity to input a set of initial velocities
for their billiards.
5) Each party is shown the billiards problem playing out. (Strictly
speaking this step is optional, as the reward can act as the sole source
of feedback, but it simplifies the conceptual description of the scenario.)
6) Each party receives a reward proportional to the sum of the points
scored in the pockets their billiards sank into, bearing in mind that
different pockets score different numbers of points.
We'll assume that the table configuration is such as to require
cooperative collisions in order to reach the highest pockets. This
probably isn't really such a good example of a cooperative problem, but it
gives a general picture.
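The round protocol can be sketched in miniature, with the billiards game abstracted to a generic joint-reward function. This is a rough sketch only; the Agent interface, play_round, and joint_reward are illustrative names I've made up, not part of Hutter's formalism.

```python
# A minimal sketch of steps (1)-(6) above, one round. The billiards game
# is abstracted into joint_reward(a_p, a_s) -> (r_p, r_s).
import copy

def play_round(primary, problem, joint_reward):
    # Step 1: the Secondary is a fresh clone, resynchronized each round.
    secondary = copy.deepcopy(primary)

    # Step 2: both parties see the same map, not yet knowing which they are.
    primary.observe(problem)
    secondary.observe(problem)

    # Step 3: each is flashed its identity.
    primary.observe("green")    # "you are Primary"
    secondary.observe("blue")   # "you are Secondary"

    # Step 4: each inputs its action (its set of initial velocities).
    a_p = primary.act()
    a_s = secondary.act()

    # Steps 5-6: the problem plays out; each receives its own reward.
    r_p, r_s = joint_reward(a_p, a_s)
    primary.reward(r_p)
    # The Secondary is discarded; only the Primary persists across rounds.
    return r_p
```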
Lee Corbin can work out his entire policy in step (2), before step (3)
occurs, knowing that his synchronized other self - whichever one he is -
is doing the same. Thus (both of) Lee Corbin will tend to work out
policies which are fair but which maximize reward for both parties. Then
in step (4) Corbin just implements the actions already decided on, unless
he starts deciding to defect against himself, which is something he'll
have to work out on his own. There is a purely rational solution for this
philosophical problem which formalizes Hofstadterian (selfish)
superrationality, but we may assume Corbin just cooperates with himself on
instinct; AIXI-tl is supposed to outperform *any* tl-bounded algorithm.
Similarly, we will assume Corbin has been told the experimental setup
beforehand and that this forms part of his initial state. Presuming that
the billiard problems are tractable for Corbin, he should score very well.
What happens to AIXI-tl?
AIXI-tl first has the opportunity to produce an action in step (4).  At
this time the Primary and Secondary already have different information;
they saw a different screen flash and were shown a different set of
billiards awaiting input. AIXI-tl's reasoning now can be roughly
understood... if *I've* understood it correctly... as effectively (a)
taking the size 2^l set of l-bounded programs, (b) treating these programs
as probability measures over all possible inputs, (c) Bayesian-updating
the posterior probabilities of all programs given the actual observed
inputs, (d) using this posterior probability to weight those programs'
predictions of rewards given various possible outputs, and (e) choosing a
strategy which maximizes reward over [horizon] rounds.  I may have
misunderstood exactly what AIXI-tl is doing; if so, though, it seems
unlikely to affect the major point made below.
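In toy form, steps (a)-(e) amount to a Bayesian mixture update followed by expected-reward maximization. The sketch below shrinks the 2^l program class to two hand-written predictors; all names are illustrative, and this is not Hutter's actual construction.

```python
# (a)-(c): Bayes-update a posterior over a small "program class" given
# the inputs actually observed.  Each program is represented by a
# likelihood function over observations.
def bayes_update(prior, likelihood, history):
    post = dict(prior)
    for obs in history:
        post = {h: p * likelihood[h](obs) for h, p in post.items()}
        z = sum(post.values())
        post = {h: p / z for h, p in post.items()}
    return post

# (d)-(e): pick the action whose posterior-weighted predicted reward is
# highest (horizon of one round, for simplicity).
def choose_action(post, predicted_reward, actions):
    def expected(a):
        return sum(p * predicted_reward[h](a) for h, p in post.items())
    return max(actions, key=expected)
```

Note that on this scheme, a posterior concentrated on "the opponent cooperates" drives the chooser straight to defection, which is the instability discussed further below.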
The major point is as follows: AIXI-tl is unable to arrive at a valid
predictive model of reality because the sequence of inputs it sees, on
successive rounds, is being produced by AIXI-tl trying to model the
inputs using tl-bounded programs, while in fact those inputs are really
the outputs of the non-tl-bounded AIXI-tl. If a tl-bounded program
correctly predicts the inputs seen so far, it will be using some
inaccurate model of the actual reality, since no tl-bounded program can
model the actual computational process AIXI-tl uses to select outputs. A
tl-bounded program (like myself) can *reason abstractly about* properties
of AIXI-tl but not actually *simulate* AIXI-tl well enough to produce its
output as a prediction. This problem gets worse and worse as AIXI-tl
reasons harder and harder on its alternate selves' outputs-as-inputs and
produces future outputs which are compoundedly less and less computable
for tl-bounded programs.
This chaotic iterative process may have an attractor in which a predictive
model suggests an output strategy for Secondaries which confirms the model
when seen as inputs by the Primary. Note, however, that while Corbin's
cooperative strategy is self-confirming, not all self-confirming
strategies are cooperative - the Always-D strategy in the one-shot
Prisoner's Dilemma is also self-confirming.  Meanwhile, Always-C
strategies are self-confirming and produce rewards equalling or exceeding
Corbin's score, but this predictive model is not stable for AIXI-tl under
the test conditions - if AIXI-tl predicts that the opponent always
cooperates, it will attempt to defect! The chaotic process producing
AIXI-tl's strategy, if any, cannot be understood as analogous to Corbin
working out the optimal strategy with his synchronized other self.
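The stability claim can be checked directly against standard one-shot Prisoner's Dilemma payoffs (assuming the usual T=5, R=3, P=1, S=0; a toy check, not part of the formalism): a model predicting "opponent always cooperates" is self-refuting for a reward maximizer, while "opponent always defects" is self-confirming.

```python
# Row player's reward in the one-shot Prisoner's Dilemma.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}

def best_response(predicted_opponent_move):
    # the reward-maximizing reply to a fixed prediction of the opponent
    return max("CD", key=lambda a: PAYOFF[(a, predicted_opponent_move)])

def self_confirming(strategy):
    # a predicted strategy is stable iff the best response to it is itself
    return best_response(strategy) == strategy
```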
For very simple problems AIXI-tl may arrive at self-confirming *and*
stable *and* cooperative strategies, such as the one-shot Prisoner's
Dilemma and Tit for Tat.  I would expect that for any complex cooperative
problem, AIXI-tl's inaccurate modeling process, iterating over its own
feedback, eventually converges to an attractor, a (false) model which is
both self-confirming and game-theoretically stable. I would expect that
there are many such attractors in complex cooperative problems and no
reason why AIXI-tl would successfully hit the optimal cooperative strategy
(which may not even be stable given AIXI-tl's modeling process) - given
that stability and self-confirmation are both fundamentally different
criteria from cooperative optimality.
> How is this so different than what two humans do when reasoning about
> each others' behavior? A given human cannot contain within itself a
> detailed model of its own clone; in practice, when a human reasons
> about the behavior of its clone, it is going to use some abstract
> representation of that clone, and do some precise or uncertain
> reasoning based on this abstract representation.
Humans can use a naturalistic representation of a reality in which they
are embedded, rather than being forced like AIXI-tl to reason about a
separated environment; consequently humans are capable of rationally
reasoning about correlations between their internal mental processes and
other parts of reality, which is the key to the complex cooperation
problem with your own clone - the realization that you can actually
*decide* your clone's actions in step (2), if you make the right
agreements with yourself and keep them.
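The contrast can be compressed into a pair of decision rules: the embedded reasoner chooses over the diagonal of the payoff matrix, because its clone's move covaries with its own, while the Cartesian reasoner best-responds to a fixed external prediction. A toy sketch with standard Prisoner's Dilemma payoffs; both function names are illustrative.

```python
PAYOFF = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}  # row player's reward

def cartesian_choice(predicted_opponent):
    # environment-modeling view: the other agent's move is an external
    # fact to best-respond to
    return max("CD", key=lambda a: PAYOFF[(a, predicted_opponent)])

def naturalistic_choice():
    # embedded view: my clone runs my decision procedure, so choosing a
    # policy chooses it for both of us -- evaluate the diagonal outcomes
    return max("CD", key=lambda a: PAYOFF[(a, a)])
```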
To sum up:
(a) The fair, physically realizable challenge of cooperation with your
clone immediately breaks the AIXI and AIXI-tl formalisms.
(b) This happens because of a hidden assumption built into the formalism,
wherein AIXI devises a Cartesian model of a separated environmental
theatre, rather than devising a model of a naturalistic reality that
includes AIXI.
(c) There's no obvious way to repair the formalism. It's been
diagonalized, and diagonalization is usually fatal. The AIXI homunculus
relies on perfectly modeling the environment shown on its Cartesian
theatre; a naturalistic model includes the agent itself embedded in
reality, but the reflective part of the model is necessarily imperfect
(halting problem).
(d) It seems very likely (though I have not actually proven it) that in
addition to breaking the formalism, the physical challenge actually breaks
AIXI-tl in the sense that a tl-bounded human outperforms it on complex
cooperation problems.
(e) This conjectured outperformance reflects the human use of a type of
rational (Bayesian) reasoning apparently closed to AIXI, in that humans
can reason about correlations between their internal processes and distant
elements of reality, as a consequence of (b) above.
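The diagonalization in point (c) can be shown in miniature: hand any predictor to an agent that inverts the prediction about itself, and the predictor is wrong about that agent no matter how good it is elsewhere. A toy sketch; `contrarian` and `prediction_correct` are illustrative names.

```python
# Toy diagonalization: an agent that consults a predictor about itself
# and then does the opposite of whatever was predicted.
def contrarian(predictor):
    predicted = predictor(contrarian)
    return "D" if predicted == "C" else "C"

def prediction_correct(predictor):
    # does the predictor's guess match the agent's actual move?
    return predictor(contrarian) == contrarian(predictor)
```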
--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence