This paper (and the author's PhD thesis, which I've not yet located) was,
according to the author, motivated in part by Jan Leike's PhD thesis
"Nonparametric General Reinforcement Learning"
<https://arxiv.org/abs/1611.08944>, written under Marcus Hutter, which
(supposedly) undercuts lossless compression as the gold-standard
information criterion for predictive model selection.  In particular, the
abstract of Leike's thesis contains this passage:

> We establish negative results on Bayesian RL agents, in particular AIXI.
> We show that unlucky or adversarial choices of the prior cause the agent
> to misbehave drastically. Therefore Legg-Hutter intelligence and balanced
> Pareto optimality, which depend crucially on the choice of the prior, are
> entirely subjective.


Invoking the word "prior" here is confusing.  Both Bennett and Leike are
pursuing optimal reward (what I refer to as
decision/judgement/technology/engineering/etc.) rather than optimal
learning (natural science/research/etc.).  In the former, "prior" entails
the utility function that maps observations onto rewards.  In the latter,
"prior" entails only the choice of UTM/programming language.

While I understand that they are attempting to deal with the reality of
multi-agent environments -- including self-modeling -- and that it is
therefore necessary to have a "theory of mind" entailing not only
meta-modeling of other agents' world models but also of other agents'
utility functions, it seems to me that these papers have muddied the
waters by conflating the two senses of "prior" listed above.

This is particularly concerning in the case of Jan Leike, as he now
occupies a *very* prominent place among the industry's "alignment"
authorities.

On Fri, Jan 26, 2024 at 10:37 AM James Bowery <jabow...@gmail.com> wrote:

> The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest
> <https://arxiv.org/abs/2301.12987>
> Michael Timothy Bennett
> <https://arxiv.org/search/cs?searchtype=author&query=Bennett,+M+T>
>
> If A and B are sets such that A⊂B, generalisation may be understood as
> the inference from A of a hypothesis sufficient to construct B. One might
> infer any number of hypotheses from A, yet only some of those may
> generalise to B. How can one know which are likely to generalise? One
> strategy is to choose the shortest, equating the ability to compress
> information with the ability to generalise (a proxy for intelligence). We
> examine this in the context of a mathematical formalism of enactive
> cognition. We show that compression is neither necessary nor sufficient to
> maximise performance (measured in terms of the probability of a hypothesis
> generalising). We formulate a proxy unrelated to length or simplicity,
> called weakness. We show that if tasks are uniformly distributed, then
> there is no choice of proxy that performs at least as well as weakness
> maximisation in all tasks while performing strictly better in at least one.
> In experiments comparing maximum weakness and minimum description length in
> the context of binary arithmetic, the former generalised at between 1.1
> and 5 times the rate of the latter. We argue this demonstrates that
> weakness is a far better proxy, and explains why DeepMind's Apperception
> Engine is able to generalise effectively.
>
>
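
A toy sketch of the quoted comparison, entirely my own construction (the
pool of candidate hypotheses, the uniform-task assumption, and the zlib
stand-in for description length are arbitrary illustrations, not Bennett's
binary-arithmetic experiment): treat a hypothesis as the set of statements
it permits, call its size the "weakness", and compare picking the weakest
consistent hypothesis against picking the one with the shortest
description.

import random
import zlib

# Toy universe of "statements": pairs (x, y) with x in 0..7 and y in {0, 1}.
UNIVERSE = [(x, y) for x in range(8) for y in (0, 1)]

# A hypothesis is just the set of statements it permits (its extension).
# Candidate pool: random subsets standing in for an enumerable hypothesis
# language (hypothetical, purely for illustration).
random.seed(0)
POOL = [frozenset(s for s in UNIVERSE if random.random() < p)
        for p in (0.3, 0.5, 0.7) for _ in range(200)]

A = frozenset({(0, 0), (1, 1), (2, 1)})      # training statements

def consistent(h, a):
    return a <= h                            # h accounts for everything in A

def weakness(h):
    return len(h)                            # larger extension = weaker

def descr_len(h):
    # crude MDL stand-in: compressed length of a canonical listing of h
    blob = ",".join(f"{x}{y}" for x, y in sorted(h)).encode()
    return len(zlib.compress(blob))

def generalises(h, b):
    return b <= h                            # h is sufficient to construct B

def random_task():
    # B drawn uniformly among supersets of A, per the abstract's assumption
    rest = [s for s in UNIVERSE if s not in A]
    return A | {s for s in rest if random.random() < 0.5}

candidates = [h for h in POOL if consistent(h, A)]
h_weak = max(candidates, key=weakness)       # weakness maximisation
h_mdl = min(candidates, key=descr_len)       # minimum description length

trials = 2000
print("weakness-max generalised:",
      sum(generalises(h_weak, random_task()) for _ in range(trials)) / trials)
print("MDL-min generalised:",
      sum(generalises(h_mdl, random_task()) for _ in range(trials)) / trials)

The shape of the argument is what matters here: among hypotheses consistent
with A, the larger the extension, the more of the uniformly drawn tasks B
(with A as a subset of B) it can cover, whereas a short description can
correspond to a very specific -- strong -- hypothesis.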
