Re: Predictions & duplications
I comment again on a relatively old post by Juergen Schmidhuber.

>Juergen: The thing is: when you generate them all, and assume that all are
>equally likely, in the sense that all beginnings of all strings are
>uniformly distributed, then you cannot explain why the regular universes
>keep being regular. The random futures are then just as likely as the
>nonrandom ones.
>So you NEED something additional to explain the ongoing regularity.
>You need something like the Speed Prior, which greatly favors regular
>futures over others.
>
>> Bruno: Take for example the iterated duplication experience/experiment.
>> It can be automated by a very simple program. After 64 iterations
>> there will be 2^64 = 1.84467 x 10^19 agents, and most of them will have
>> an incompressible 01-history. With the comp indeterminism it is
>> much more likely you will get such a really random string
>> (independently of the fact that you will be unable to prove it is
>> really random). Those with computable strings will be exceptional, so
>> that, if those agents work together, they will consider (even with
>> Bayes) that the simple self-multiplying algorithm is the simplest
>> and shortest explanation for those appearances of randomness.
>
>Juergen: But don't you see? Why does a particular agent, say, yourself,
>with a nonrandom past, have a nonrandom future?

But we do have random futures. Just send a sequence of particles in the
state 1/sqrt(2)(up + down) through an "analyser", and write 0 or 1 each
time a particular particle goes through. Both QM+comp (Everett) and comp
alone (I will not elaborate!) explain this first person (plural) point of
view randomization by a relative self-multiplication/division/
differentiation. As I said, the simple self-multiplying algorithm is the
simplest and shortest explanation for those appearances of randomness.

>Why is your computer still there after one second, although in a truly
>random world it would immediately dissolve?
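The counting behind Bruno's claim can be sketched numerically (my own illustration, not code from the thread): after n binary self-duplications there are 2^n agent histories, and the standard counting argument bounds the fraction of them compressible by k or more bits, since there are fewer than 2^(n-k+1) binary programs shorter than n-k+1 bits.

```python
# Hypothetical counting sketch (my own illustration, not code from the
# thread): histories in the iterated self-duplication experiment.

n = 64                     # duplication steps
agents = 2 ** n            # one binary 01-history per agent
print(agents)              # 18446744073709551616, i.e. about 1.84467e19

# There are fewer than 2**(n-k+1) binary programs shorter than n-k+1 bits,
# so at most that many n-bit histories are compressible by k or more bits:
for k in (10, 20, 40):
    fraction = (2 ** (n - k + 1) - 1) / 2 ** n
    print(k, fraction)     # bounded by 2**(1-k): almost all are incompressible
```

The bound is machine-independent up to an additive constant, which is why "most agents see an incompressible history" does not depend on the choice of universal machine.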
No one says reality is *only* a truly random realm, especially when third
person reality is given by UD*, the trace of the Universal Dovetailer.
Arithmetically it is just the set of true \Sigma_1 sentences. That's our
(third person) atomic truth. Of course those truths are "UD" provable.

>Why do pencils keep falling down instead of up, when the futures where
>they fall up are just as likely?

Because those sets of machine-accessible arithmetical truths, as viewed by
the machines themselves (this introduces the modalities), are highly
structured. Indeed the UD has this nasty habit of dovetailing on the
reals, so that some randomization is at work. But some equilibrium between
randomization and local lawfulness has to be reached in the limit (where
first person points of view are eventually defined in Plato Heaven). Our
own stability could only rely on the randomization of the details of our
stories, a randomization which we should necessarily observe if we look at
ourselves *below* our (apparently common) substitution level. Although
this gives a sort of necessary prior (the need to quantize the "classical
stories", transforming H into exp(-iH)), I prefer, following my naive
idea, to interview directly the sound Universal Machine. I define
"probability of p = 1" in (Peano) Arithmetic by Bew('p) & Con('p). In this
way p is true in all consistent extensions (Bew('p)) and there is (the bet
part) at least one consistent extension (Con('p)). I restrict the
arithmetical interpretation of p to the arithmetical \Sigma_1 sentences
(the arithmetical UD). This gives an arithmetical quantization of p by
[]<>p (with []p = Bew('p) & Con('p)). It indeed obeys a sort of quantum
logic. As always, <>p (that is, Con('p)) is -Bew('-p) (that is, -[]-p).

But I fear you don't believe in any form of "strict" indeterminism,
neither comp nor QM, do you?

Bruno
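Bruno's definitions here can be set out in one display (my transcription into standard provability-logic notation; the restriction of p to \Sigma_1 sentences is his):

```latex
% Bruno's "probability of p = 1" box, and its dual:
\Box p \;\equiv\; \mathrm{Bew}(\ulcorner p \urcorner)
       \,\land\, \mathrm{Con}(\ulcorner p \urcorner),
\qquad
\Diamond p \;\equiv\; \lnot \Box \lnot p

% Plain consistency is the dual of plain provability:
\mathrm{Con}(\ulcorner p \urcorner) \;=\;
       \lnot \mathrm{Bew}(\ulcorner \lnot p \urcorner)

% The "arithmetical quantization" of a \Sigma_1 sentence p:
p \;\longmapsto\; \Box \Diamond p
```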
Re: Predictions & duplications
Juergen Schmidhuber wrote: > > > > > From: Russell Standish <[EMAIL PROTECTED]> > > > > The only reason for not accepting the simplest thing is if it can be > > > > shown to be logically inconsistent. This far, you have shown no such > > > > thing, but rather demonstrated an enormous confusion between measure > > > > and probability distribution. > > > > > > Russell, this is really too much; please read any intro to measure > > > theory, > > > e.g., the one by Halmos. Measures in general are _not_ probability > > > distributions. They are more than that; you can view them as _sets_ of > > > probability distributions on sets that are related to each other in a > > > specific way. Probability density functions define measures and > > > therefore > > > in general aren't probability distributions either. The uniform PDF on > > > [0..1] yields the perfect trivial example; that's exactly why I used it > > > before. In the computability context, compare LI/Vitanyi 1997, chapters > > > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 > > > (continuous sample space with (semi)measure M(x)). > > > > This is an excellent reference, thank you. Example 4.2.1 gives the > > construction of the uniform measure over the set of strings, precisely > > what you claim didn't exist when we started this discussion. > > You just want to annoy me, don't you? For the last time: There is a > uniform *measure* on the strings, but no uniform probability > distribution. Who is annoying who? I never claimed a uniform probability density (I actually looked up the difference between probability density and probability distribution yesterday in Li and Vitanyi) - only a uniform measure. In fact, my first comment to you was "Why do you insist on it being a PDF?". Remember? The simplest prior is the uniform measure on infinite strings. 
> > > > 1) Halting theorem implies the set of halting programs is of
> > > > measure zero within the set of all programs
> > >
> > > You mean, given a uniform measure? But why should you need the
> > > halting theorem for this trivial statement? In fact, what exactly do
> > > you mean by halting theorem? (Schoenfield's book provides a good
> > > intro to mathematical logic and computability theory.)
> >
> > The halting theorem is that there is no algorithm for predicting
> > whether any program will halt. Of course, the result I was referring
> > to is that the set of compressible strings is of measure zero. The set
> > of halting programs actually has finite measure, since \Omega > 0.
> > However, there is still finite probability of an input string
> > being nonhalting, and indeed the probability seems to be greater than
> > 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
> > presents a problem for computationalism.
>
> This is mixing up everything, in particular, different measures.
> 1) Infinite strings with finite program have *nonzero* measure, given
> the *universal* measure (Omega is a number derived from the universal
> measure). 2) They have measure zero, given the *uniform* measure (Omega
> has nothing to do with it).

The uniform measure and the universal measure are related. The uniform
prior is on the set of input strings to a UTM; the universal measure is
the induced measure on the output strings. No. 1) here is the relevant
statement, not 2).

> 3) To see 2) we do not need the halting
> theorem - we just compare the finite programs (countably many) to all
> strings (uncountably many) and we are done. 4) The halting theorem is
> no problem whatsoever when it comes to computable universes. Why care
> whether some computable universe stops or not? If it does, it does,
> otherwise it does not. In both cases we can compute it in the limit.
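Juergen's point 3) can be made concrete with a toy computation (my own illustration, not from the thread): under the uniform measure a single infinite string sits inside ever-smaller cylinders, so each singleton has measure 0, and the countably many program outputs together still have measure 0 by countable additivity.

```python
# Toy version of point 3): fixing the first n bits of an infinite string
# pins it inside a cylinder of uniform measure 2**-n, so each individual
# string has measure 0 in the limit; a countable set of strings - e.g. the
# outputs of the countably many finite programs - then has total measure 0.

def singleton_measure_upper_bound(n_fixed_bits: int) -> float:
    """Uniform measure of the cylinder fixing the first n bits."""
    return 2.0 ** -n_fixed_bits

for n in (10, 30, 60):
    print(n, singleton_measure_upper_bound(n))   # shrinks toward 0 as n grows
```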
> > > > > The GP program (aka Universal Dovetailer) is not the simplest > > > > thing. The simplest describable thing is the set of all descriptions, > > > > that simply exist without being generated. That has precisely zero > > > > information or complexity, whereas the UD has some complexity of the > > > > order of a few 100 bits. The simplest prior is the uniform one, ie > > > > there is no bias whatsoever in the selection. > > > > > > Exist without being generated? Precisely zero information or > > > complexity? But with respect to what? > > > > With respect to the observer. With the respect to anything in > > fact. The set of all possible descriptions has precisely zero > > information, by definition, regardless of what observer or context you > > employ. > > This is a vague informal statement, not a definition. Which are the > possible descriptions? What's the formal difference between a > description > and a non-description? What is your "anything" if there is no device > for describing it formally? If "anything" does not convey any bit of > info then what exactly is it that conveys one bit? Or two? > Choose an alphabet (a finite set of symbols). The set of all descripti
Re: Predictions & duplications
> > > From: Russell Standish <[EMAIL PROTECTED]> > > > The only reason for not accepting the simplest thing is if it can be > > > shown to be logically inconsistent. This far, you have shown no such > > > thing, but rather demonstrated an enormous confusion between measure > > > and probability distribution. > > > > Russell, this is really too much; please read any intro to measure > > theory, > > e.g., the one by Halmos. Measures in general are _not_ probability > > distributions. They are more than that; you can view them as _sets_ of > > probability distributions on sets that are related to each other in a > > specific way. Probability density functions define measures and > > therefore > > in general aren't probability distributions either. The uniform PDF on > > [0..1] yields the perfect trivial example; that's exactly why I used it > > before. In the computability context, compare LI/Vitanyi 1997, chapters > > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 > > (continuous sample space with (semi)measure M(x)). > > This is an excellent reference, thank you. Example 4.2.1 gives the > construction of the uniform measure over the set of strings, precisely > what you claim didn't exist when we started this discussion. You just want to annoy me, don't you? For the last time: There is a uniform *measure* on the strings, but no uniform probability distribution. > > > 1) Halting theorem implies the set of halting programs is of measure > > > zero within the set of all programs > > > > You mean, given a uniform measure? But why should you need the halting > > theorem for this trivial statement? In fact, what exactly do you mean > > by > > halting theorem? (Schoenfield's book provides a good intro to > > mathematical > > logic and computability theory.) > > > > The halting theorem is that there is no algorithm for predicting > whether any program will halt. 
> Of course, the result I was referring
> to is that the set of compressible strings is of measure zero. The set
> of halting programs actually has finite measure, since \Omega > 0.
> However, there is still finite probability of an input string
> being nonhalting, and indeed the probability seems to be greater than
> 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
> presents a problem for computationalism.

This is mixing up everything, in particular, different measures.
1) Infinite strings with finite program have *nonzero* measure, given
the *universal* measure (Omega is a number derived from the universal
measure). 2) They have measure zero, given the *uniform* measure (Omega
has nothing to do with it). 3) To see 2) we do not need the halting
theorem - we just compare the finite programs (countably many) to all
strings (uncountably many) and we are done. 4) The halting theorem is
no problem whatsoever when it comes to computable universes. Why care
whether some computable universe stops or not? If it does, it does,
otherwise it does not. In both cases we can compute it in the limit.

> > > The GP program (aka Universal Dovetailer) is not the simplest
> > > thing. The simplest describable thing is the set of all descriptions,
> > > that simply exist without being generated. That has precisely zero
> > > information or complexity, whereas the UD has some complexity of the
> > > order of a few 100 bits. The simplest prior is the uniform one, ie
> > > there is no bias whatsoever in the selection.
> >
> > Exist without being generated? Precisely zero information or
> > complexity? But with respect to what?
>
> With respect to the observer. With respect to anything in
> fact. The set of all possible descriptions has precisely zero
> information, by definition, regardless of what observer or context you
> employ.

This is a vague informal statement, not a definition. Which are the
possible descriptions?
What's the formal difference between a description and a non-description? What is your "anything" if there is no device for describing it formally? If "anything" does not convey any bit of info then what exactly is it that conveys one bit? Or two? > Any traditional definition of > > information/simplicity requires the specification of a formal > > description > > method, such as a Turing machine. You don't need one? Then how do you > > define information or complexity? How exactly do you define "simple" ? > > Actually, all it needs is an equivalencing procedure. See my "On > Complexity and Emergence" paper. A Turing machine will do the job for > you, but so will a neural network or an "expert system" following > heuristic rules. So you have to write down formally this equivalencing procedure, right? Why should this be different in principle from writing down the formal rules for a Turing machine? Why is a simplicity measure that depends on your equivalencing procedure superior to a simplicity measure depending on a Turing machine? (The TM procedure includes yours - because it includes all procedu
Re: Predictions & duplications
Schmidhuber wrote:
>Why care for the subset of provable sentences?
>Aren't we interested in the full set of all describable sentences?

We are interested in the true sentences, the provable ones and the
unprovable ones.

>We can generate it, without caring for proofs at all.

If you mean generate all describable *sentences*, this is trivial, but
then you generate only a language. If you mean generate all computations,
it is equivalent to generating the proofs of all \Sigma_1 sentences (= the
"arithmetical" UD). The provability predicate is just a universal \Sigma_1
sentence. (I recall that a \Sigma_1 sentence is a sentence equivalent to
(Ex)P(x) with P algorithmically decidable.)

>"Unspeakable" means "beyond formal definition."

Beyond Turing nameability? Beyond formal definition by whom? You know the
universal sound machine has no means to define its own truth predicate
(Tarski). Is it in that sense? The knowledge of p by the machine can be
defined (in a first approximation) by "p & Bew('p)", in which case,
although thoroughly clear at the metalevel, the knowledge *by* the machine
is beyond any formal definition *for* the machine. Bew = Godel's
provability predicate, 'p = the godel number of "p". Bew(x) = (Ey)B(y,x),
where B(y,x) means y is the godel number of a proof of the sentence with
godel number x.

>In particular, the
>way we humans communicate and agree on a common formal language is not
>formally defined.

OK. Most importantly, the very meaning of "finite", or of number, cannot
be formally defined. But I show that, as far as we are sound universal
machines, our own knowledgeability "predicate" cannot be formally defined
by us.

>You write "x" and I say, ok, it's an "x", but that's
>all based on some sort of informal agreement that involves complex
>pattern recognition and learning processes. Usually we do not question
>the validity of outcomes of such cognitive processes, and just go ahead
>with formal derivations.
Yes, but the situation is worse for the very *meaning* of our utterances.

>But there is this major informal step *before*
>formal reasoning can start, and good textbooks on logic acknowledge
>this.

There is *this* one, but there is a deeper one with the *meaning* of our
formal presentations. We really bet that we share a common intuition on
the notion of finitary procedure. Note that intuitionists and classicists
diverge beyond the natural numbers.

We are not on the same wavelength, Juergen. You believe there is a
computable universe. I am agnostic about that. But if I survive (or just
experience nothing) through a digital functional substitution made at some
level L, then my experience is defined by the set of all consistent
extensions defined by all consistent histories definable below L. With
comp, realities emerge from the way consistent machines' memories organise
themselves (through the many, many histories going through their states).
Your speed prior would be useful if it showed how it enhances the weight
of normal stories in the limit of first person experiences (due to
unawareness of the reconstitution delays). Note that the quantum
quantization of classical theories provides such an explanation. I show
that comp needs to extract that quantization from a measure on the
consistent extensions. I show the "probability one" obeys the main axioms
of quantum logic.

You ask me often why I give so much importance to the notion of
provability. The reason is that in some sense I just interview scientific
machines, as they can appear in the GP's work, about what they are able to
prove about their own consistent extensions, and how and why they can
arrive at probability considerations. And "provability" is non-trivial
enough to provide clues on the minimal necessary logical structures on
that set of consistent extensions.

Bruno
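The provability apparatus Bruno keeps invoking can be collected in one display (my transcription; the notation is standard):

```latex
% B(y,x): "y is the Goedel number of a proof of the sentence with Goedel
% number x" - algorithmically decidable.
\mathrm{Bew}(x) \;\equiv\; \exists y\, B(y,x)

% \Sigma_1 sentences: those equivalent to \exists x\, P(x) with P
% algorithmically decidable; Bew is itself a universal \Sigma_1 predicate.

% Bruno's first-approximation "knowledge of p by the machine":
\mathrm{Know}(p) \;\equiv\; p \,\land\, \mathrm{Bew}(\ulcorner p \urcorner)
```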
Re: Predictions & duplications
Juergen Schmidhuber wrote: > > From: Russell Standish <[EMAIL PROTECTED]> > > The only reason for not accepting the simplest thing is if it can be > > shown to be logically inconsistent. This far, you have shown no such > > thing, but rather demonstrated an enormous confusion between measure > > and probability distribution. > > Russell, this is really too much; please read any intro to measure > theory, > e.g., the one by Halmos. Measures in general are _not_ probability > distributions. They are more than that; you can view them as _sets_ of > probability distributions on sets that are related to each other in a > specific way. Probability density functions define measures and > therefore > in general aren't probability distributions either. The uniform PDF on > [0..1] yields the perfect trivial example; that's exactly why I used it > before. In the computability context, compare LI/Vitanyi 1997, chapters > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 > (continuous sample space with (semi)measure M(x)). This is an excellent reference, thank you. Example 4.2.1 gives the construction of the uniform measure over the set of strings, precisely what you claim didn't exist when we started this discussion. > > > 1) Halting theorem implies the set of halting programs is of measure > > zero within the set of all programs > > You mean, given a uniform measure? But why should you need the halting > theorem for this trivial statement? In fact, what exactly do you mean > by > halting theorem? (Schoenfield's book provides a good intro to > mathematical > logic and computability theory.) > The halting theorem is that there is no algorithm for predicting whether any program will halt. Of course, the result I was referring to is that the set of compressible strings is of measure zero. The set of halting programs actually has finite measure, since \Omega > 0. 
However, there is still finite probability of an input string being
nonhalting, and indeed the probability seems to be greater than 0.7 (ie
\Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still presents a
problem for computationalism.

> > The GP program (aka Universal Dovetailer) is not the simplest
> > thing. The simplest describable thing is the set of all descriptions,
> > that simply exist without being generated. That has precisely zero
> > information or complexity, whereas the UD has some complexity of the
> > order of a few 100 bits. The simplest prior is the uniform one, ie
> > there is no bias whatsoever in the selection.
>
> Exist without being generated? Precisely zero information or
> complexity? But with respect to what?

With respect to the observer. With respect to anything in fact. The set
of all possible descriptions has precisely zero information, by
definition, regardless of what observer or context you employ.

> Any traditional definition of
> information/simplicity requires the specification of a formal
> description method, such as a Turing machine. You don't need one? Then
> how do you define information or complexity? How exactly do you define
> "simple" ?

Actually, all it needs is an equivalencing procedure. See my "On
Complexity and Emergence" paper. A Turing machine will do the job for
you, but so will a neural network or an "expert system" following
heuristic rules. Perhaps the simplest formal approach is to say that the
equivalencing procedure is a function from the set of descriptions onto
the integers (labelling the equivalence classes). If that function is
partial recursive, then the equivalencing mechanism can be replaced by the
appropriate Turing machine. Computationalism is the creed that this
function must be partial recursive, but it is not necessary for the
definition of information or complexity.
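Russell's "equivalencing procedure" can be sketched as follows (my own hypothetical illustration; the classifier and its rule are invented for the example, not taken from his paper):

```python
# Hypothetical sketch of an "equivalencing procedure": a computable map from
# descriptions onto integers labelling equivalence classes. This toy
# classifier deems two descriptions equivalent when they agree on their
# first two bits, so all bitstrings collapse onto just four class labels.

def classify(description: str) -> int:
    """Label of the equivalence class of `description` (invented rule)."""
    return int(description[:2], 2)

# Descriptions differing only in later bits land in the same class:
print(classify("00101110") == classify("00111111"))         # True
# Exactly four classes are ever produced over all 8-bit descriptions:
print(sorted({classify(f"{i:08b}") for i in range(256)}))   # [0, 1, 2, 3]
```

Since `classify` is total recursive, this is the case where, as Russell notes, the equivalencing mechanism could be replaced by an appropriate Turing machine.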
I can see some subtle problems occurring if this equivalencing procedure
is stochastic in nature; however, by summing over the appropriate
distributions, one should still end up with meaningful results provided
the variance is not too large.

Cheers

Dr. Russell Standish                     Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052                         Fax 9385 6965, 0425 253119 (")
Australia                                [EMAIL PROTECTED]
Room 2075, Red Centre                    http://parallel.hpc.unsw.edu.au/rks
International prefix +612, Interstate prefix 02
Re: Predictions & duplications
Schmidhuber:
>>It's the simplest thing, given this use of mathematical
>>language we have agreed upon. But here the power of the
>>formal approach ends - unspeakable things remain unspoken.

Marchal:
>I disagree. I would even say that it is here that the serious formal
>approach begins. Take "unprovable" for "unspeakable", we can
>meta-reason (informally or formally) and study the logical structure
>of the set of unprovable sentences by sound universal machines.

Schmidhuber:
Unprovable does not mean unspeakable! You can spell out many unprovable
things. Why care for the subset of provable sentences? Aren't we
interested in the full set of all describable sentences? We can generate
it, without caring for proofs at all.

"Unspeakable" means "beyond formal definition." In particular, the way we
humans communicate and agree on a common formal language is not formally
defined. You write "x" and I say, ok, it's an "x", but that's all based on
some sort of informal agreement that involves complex pattern recognition
and learning processes. Usually we do not question the validity of
outcomes of such cognitive processes, and just go ahead with formal
derivations. But there is this major informal step *before* formal
reasoning can start, and good textbooks on logic acknowledge this.
Re: Predictions & duplications
> From: Juho Pennanen <[EMAIL PROTECTED]> > So there may be no 'uniform probability distribution' on the set of all > strings, but there is the natural probability measure, that is in many > cases exactly as useful. Sure, I agree, measures are useful; I'm using them all the time. But in general they are _not_ the same thing as probability distributions. > From: Russell Standish <[EMAIL PROTECTED]> > The only reason for not accepting the simplest thing is if it can be > shown to be logically inconsistent. This far, you have shown no such > thing, but rather demonstrated an enormous confusion between measure > and probability distribution. Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare LI/Vitanyi 1997, chapters 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)). > 1) Halting theorem implies the set of halting programs is of measure > zero within the set of all programs You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Schoenfield's book provides a good intro to mathematical logic and computability theory.) > The GP program (aka Universal Dovetailer) is not the simplest > thing. The simplest describable thing is the set of all descriptions, > that simply exist without being generated. 
That has precisely zero > information or complexity, whereas the UD has some complexity of the > order of a few 100 bits. The simplest prior is the uniform one, ie > there is no bias whatsoever in the selection. Exist without being generated? Precisely zero information or complexity? But with respect to what? Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple" ?
Re: Predictions & duplications
Juergen Schmidhuber wrote:
> Russell Standish wrote:
>> I never subscribed to computationalism at any time,
>> but at this stage do not reject it. I could conceive of us living in
>> a stupendous virtual reality system, which is in effect what your GP
>> religion Mark II is. However, as pointed out by others, it does suffer
>> from "turtle-itis", and should not be considered the null
>> hypothesis. It requires evidence for belief.
>
>By turtle-itis you mean: in which universe do the GP and his computer
>reside? Or the higher-level GP2 which programmed GP? And so on? But
>we cannot get rid of this type of circularity - computability and
>mathematical logic are simply taken as given things, without any
>further explanation, like a religion.

I disagree. Comp could not have any pretension with respect to the search
for a TOE without Church's thesis. And we can argue for Church's thesis.
The two main, and very different, types of argument are: a) the empirical
fact that all definitions of computable functions provably lead to the
same class of functions; b) the closure of that class under the "most
transcendent" operation in mathematics: diagonalisation. It is really the
closure of the set of computable functions under diagonalisation which
saves comp from any form of *turtle-itis*. That closure provides the
fixed points for all computable and self-referential transformations
(from which in particular G and G*, and G* minus G, can arise).

>The computable multiverse, or
>the set of logically possible or mathematically possible or computable
>universes, represents the simplest explanation we can write down formally.

Except that I still don't know what you mean by "universe".

>But what exactly does it mean to accept something as a formal statement?
>What does it mean to identify the messy retina patterns caused by this
>text with abstract mathematical symbols such as x and y? All formal
>explanations in our formal papers assume that we agree on how to read
>them.

I disagree.
Axiomatics was invented and developed so that we can check the validity of
our reasoning independently of our interpretations of it. Of course we bet
on the ability to distinguish things, 0 and 1 in particular, but that's
another level.

>But reading and understanding papers is a complex physical and
>cognitive process. So all our formal explanations are relative to this
>given process which we usually do not even question.

You are mixing levels of description. It is not because I take for granted
that you can read English that I will take for granted that you have a
brain made of waves/particles or *any* physical description. This confirms
that your *use* of the GP suffers from turtle-itis. I should have asked
you what you mean by universe much earlier. My feeling is that you are
presupposing some conception of the universe. The UDA gives at least a
road making the reasoning immune with respect to such a preconception
(with the exception of the preconception of the natural numbers).

>Essentially, the
>GP program is the simplest thing we can write down, relative to the
>unspoken assumption that it is clear what it means to write something
>down, and how to process it.

The miracle of Church's thesis is that the "how to process it" is defined
in the arithmetical relations themselves. That is what makes it possible
to study the possible discourses of the machines themselves, about their
most probable relative computational states and stories. I guess we agree
on the importance of intuition about the finitary, the finite things.

>It's the simplest thing, given this use
>of mathematical language we have agreed upon. But here the power of the
>formal approach ends - unspeakable things remain unspoken.

I disagree. I would even say that it is here that the serious formal
approach begins. Take "unprovable" for "unspeakable": we can meta-reason
(informally or formally) and study the logical structure of the set of
sentences unprovable by sound universal machines.
Although unprovable by the UTM, those sentences are anticipable. Note that "there is a universe around me" is probably such a (psycho) logical anticipable sentence, not a provable one. Bruno
Re: Predictions & duplications
[EMAIL PROTECTED] wrote:
> > From: Russell Standish <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> >
> > I think we got into this mess debating whether an infinite set could
> > support a uniform measure. I believe I have demonstrated this.
> > I've yet to see anything that disabuses me of the notion that a
> > probability distribution is simply a measure that has been normalised
> > to 1. Not all measures are even normalisable.
>
> Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
> uniform probability distribution. Why were measures invented in the first
> place? To deal with infinite sets. You cannot have a uniform probability
> distribution on infinitely many things. That's why measures isolate just
> finitely many things, say, every bitstring of size n, and for each x of
> size n look at the infinite set of strings starting with x. A uniform
> measure assigns equal probability to each such set. Of course, then you
> have a uniform probability distribution on those finitely many things
> which are sets. But that's not a uniform probability distribution on
> infinitely many things, e.g., on the bitstrings themselves! The measure
> above is _not_ a probability distribution; it is an infinite _set_ of
> _finite_ probability distributions, one for string size 0, one for string
> size 1, one for string size 2,...

Good grief, you've missed the point entirely! Measures do measure infinite
sets - consider the set [0,1]: it is an infinite set, and on it the
uniform probability distribution and the uniform measure are the same
thing. We can also consider the set [0,\infty), which has a uniform
measure on it (the same constant probability density as before, but
extended to the full set). Clearly it cannot have a uniform probability
distribution, as the set has infinite measure. You seem fixated on the
peculiar definition of measure on strings that you gave before, rather
than the more general notion.
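Russell's [0,1] versus [0,\infty) contrast comes down to normalisation, which a one-line computation illustrates (my own sketch, not from the thread):

```python
# My own illustration of the normalisation point: a constant density c on an
# interval of length L has total mass c*L. On [0,1] with c = 1 the mass is 1
# (a genuine uniform probability distribution); on [0,N] the mass grows
# without bound as N does, so no constant density normalises on [0, infinity).

def total_mass(density: float, length: float) -> float:
    """Integral of a constant density over an interval of given length."""
    return density * length

print(total_mass(1.0, 1.0))      # 1.0 -> proper uniform distribution on [0,1]
print(total_mass(1.0, 1e6))      # 1000000.0 -> diverges as the interval grows
```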
> > I realise that the Halting theorem gives problems for believers of > > computationalism. > > It does not. Why should it? Restating what I said before simply: 1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs 2) Computationalism is the assumption that the universe's observer is a universal Turing machine. 3) Uniform measure over the Plenitude of descriptions implies that observer will never see simple strings. > > > I never subscribed to computationalism at any time, > > but at this stage do not reject it. I could conceive of us living in > > a stupendous virtual reality system, which is in effect what your GP > > religion Mark II is. However, as pointed out by others, it does suffer > > from "turtle-itis", and should not be considered the null > > hypothesis. It requires evidence for belief. > > By turtle-itis you mean: in which universe do the GP and his computer > reside? Or the higher-level GP2 which programmed GP? And so on? But > we cannot get rid of this type of circularity - computability and > mathematical logic are simply taken as given things, without any > further explanation, like a religion. The computable multiverse, or > the set of logically possible or mathematically possible or computable > universes, represents the simplest explanation we can write down formally. > But what exactly does it mean to accept something as a formal statement? > What does it mean to identify the messy retina patterns caused by this > text with abstract mathematical symbols such as x and y? All formal > explanations in our formal papers assume that we agree on how to read > them. But reading and understanding papers is a complex physical and > cognitive process. So all our formal explanations are relative to this > given process which we usually do not even question. 
> Essentially, the GP program is the simplest thing we can write down, relative to the > unspoken assumption that it is clear what it means to write something > down, and how to process it. It's the simplest thing, given this use > of mathematical language we have agreed upon. But here the power of the > formal approach ends - unspeakable things remain unspoken. > The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, which simply exists without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few hundred bits. The simplest prior is the uniform one, i.e. there is no bias whatsoever in the selection. The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. Thus far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution. The set of all descriptions and the uniform measure do not suffer from turtle-itis. However, they are susceptible to criticism of the Church-Turing thesis - namely why should our idea of i
Re: Predictions & duplications
juergen wrote: > Russell, at the risk of beating a dead horse: a uniform measure is _not_ a > uniform probability distribution. Why were measures invented in the first > place? To deal with infinite sets. You cannot have a uniform probability > distribution on infinitely many things. The last sentence is trivially true when 'probability distributions' are defined as probability measures on discrete sets. But someone could also think this is just irrelevant word-play on definitions. There are uniform probability (density) functions on infinite sets, e.g. the uniform distribution on the interval [0,1], which gives the measure (probability) t for each interval [x,x+t]. Obviously, this distribution gives measure 0 for any singleton containing only one real number. Still it's uniform, though not a 'probability distribution' if one wishes to use the most restricted definition. Similarly, there is the natural probability measure on the set of all infinite strings of 0's and 1's. It is 'uniform' in the sense that no string or place in a string is in a privileged position. To define it, set n_0 as the set of all strings that have a 0 at the n'th place and n_1 as the set of all strings that have a 1 at the n'th place, for any n. Then set m(n_0)=m(n_1)=1/2, for all n. Using the standard definition of measure that has been cited on this list (Kolmogorov axioms), m has a unique extension which is a measure on the set of all strings. So there may be no 'uniform probability distribution' on the set of all strings, but there is the natural probability measure, which is in many cases exactly as useful. Juho
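Juho's coin-flip measure can be sketched in a few lines of Python. This is an illustration only: the function name and the representation of cylinder sets as (position, bit) constraint lists are ours, and we only handle finitely many constraints, which is all the extension theorem takes as input.

```python
def cylinder_measure(constraints):
    """Measure of the set of infinite 0/1 strings satisfying the given
    (position, bit) constraints, under the natural measure with
    m(n_0) = m(n_1) = 1/2 for every position n."""
    pinned = {}
    for n, bit in constraints:
        if pinned.get(n, bit) != bit:
            return 0.0  # contradictory constraints: the set is empty
        pinned[n] = bit
    # Pinning k distinct positions leaves an event of probability 2^-k.
    return 2.0 ** -len(pinned)

# m(n_0) = m(n_1) = 1/2 for any single position n:
print(cylinder_measure([(5, 0)]))          # 0.5
# Strings starting with '01' (two pinned positions): measure 1/4
print(cylinder_measure([(0, 0), (1, 1)]))  # 0.25
```

Note that every individual infinite string is the intersection of infinitely many cylinders and so gets measure 0, exactly as in the [0,1] analogy.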
Re: Predictions & duplications
> From: Russell Standish <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > > I think we got into this mess debating whether an infinite set could > support a uniform measure. I believe I have demonstrated this. > I've yet to see anything that disabuses me of the notion that a > probability distribtuion is simply a measure that has been normalised > to 1. Not all measures are even normalisable. Russell, at the risk of beating a dead horse: a uniform measure is _not_ a uniform probability distribution. Why were measures invented in the first place? To deal with infinite sets. You cannot have a uniform probability distribution on infinitely many things. That's why measures isolate just finitely many things, say, every bitstring of size n, and for each x of size n look at the infinite set of strings starting with x. A uniform measure assigns equal probability to each such set. Of course, then you have a uniform probability distribution on those finitely many things which are sets. But that's not a uniform probability distribution on infinitely many things, e.g., on the bitstrings themselves! The measure above is _not_ a probability distribution; it is an infinite _set_ of _finite_ probability distributions, one for string size 0, one for string size 1, one for string size 2,... > I realise that the Halting theorem gives problems for believers of > computationalism. It does not. Why should it? > I never subscribed to computationalism at any time, > but at this stage do not reject it. I could conceive of us living in > a stupendous virtual reality system, which is in effect what your GP > religion Mark II is. However, as pointed out by others, it does suffer > from "turtle-itis", and should not be considered the null > hypothesis. It requires evidence for belief. By turtle-itis you mean: in which universe do the GP and his computer reside? Or the higher-level GP2 which programmed GP? And so on? 
But we cannot get rid of this type of circularity - computability and mathematical logic are simply taken as given things, without any further explanation, like a religion. The computable multiverse, or the set of logically possible or mathematically possible or computable universes, represents the simplest explanation we can write down formally. But what exactly does it mean to accept something as a formal statement? What does it mean to identify the messy retina patterns caused by this text with abstract mathematical symbols such as x and y? All formal explanations in our formal papers assume that we agree on how to read them. But reading and understanding papers is a complex physical and cognitive process. So all our formal explanations are relative to this given process which we usually do not even question. Essentially, the GP program is the simplest thing we can write down, relative to the unspoken assumption that it is clear what it means to write something down, and how to process it. It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
> From: [EMAIL PROTECTED]: > [EMAIL PROTECTED] wrote: > > > From [EMAIL PROTECTED]: > > > [EMAIL PROTECTED] wrote: > > > > M measure: > > > > M(empty string)=1 > > > > M(x) = M(x0)+M(x1) nonnegative for all finite x. > > > > > > This sounds more like a probability distribution than a measure. In > > > the set of all descriptions, we only consider infinite length > > > bitstrings. Finite length bitstrings are not members. However, we can > > > identify finite length bitstrings with subsets of descriptions. The > > > empty string corresponds to the full set of all descriptions, so the > > > first line M(empty string)=1 implies that the measure is normalisable > > > (ie a probability distribution). > > > > Please check out definitions of measure and distribution! > > Normalisability is not the critical issue. > > > > Clearly: Sum_x M(x) is infinite. So M is not a probability > > distribution. M(x) is just measure of all strings starting with x: > > M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = > > For a measure to be normalisable, the sum over a *disjoint* set of > subsets must be finite. If the set of subsets is not disjoint (ie the > intersection are not empty) then the sum may well be infinite. > Bringing this to the case of finite strings. Each finite string is > actually a subset of the set of infinite strings, each containing the > same finite prefix. So the string 101010 is actually a subset of 10101 > and so on. The Sum_x M(x), where I assume x ranges over all strings > will of course be infinite. However, since the set of finite strings > is not disjoint, this doesn't imply M(x) is unnormalisable. > Now when you realise that every finite string x is a subset of the empty > string, it becomes clear that M(x) is normalised to precisely 1. The point is: prob dists and measures are different things. There is a good reason for giving them different names. Prob dists assign numbers to individual objects, not to sets. 
Traditional definitions, as well as those for semimeasures, are in http://www.idsia.ch/~juergen/toesv2/ > > Neglecting finite universes means loss of generality though. > > Hence measures mu(x) in the ATOE paper do not neglect finite x: > > > > mu(empty string)=1 > > mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). > > > > And here P is a probability distribution indeed! > > P(x)>0 possible only for x with finite description. > > > > The P(x)>0 case above actually breaks the countably subadditive > property, so mu(x) cannot be called a measure... I'm not entirely sure > what you're getting at here. The set of computable universes includes the finite ones which we may not ignore. How do they contribute to the measure of all universes? For convenience introduce a third symbol $ besides 0 and 1. Now what is the measure of the set of all universes starting with 00110? It's the sum of the measure of the set of universes starting with 001101 and the measure of the set of universes starting with 001100 and the measure of the single universe 00110$ that stops right after 00110. Once there is a $ symbol there won't be another 0 or 1. So mu(x) = P(x)+mu(x0)+mu(x1). Our favorite example is the universal Turing machine with random inputs. For finite x: P(x) is the probability that the TM output is x and nothing but x. mu(x) is something different, namely, the so-called universal measure of all strings _starting_ with x. Again mu(x) = P(x)+mu(x0)+mu(x1). Juergen Schmidhuber http://www.idsia.ch/~juergen/
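The recursion mu(x) = P(x) + mu(x0) + mu(x1) can be checked mechanically on a toy example. The particular P below is our own choice, not anything from the papers: P(y) = 2^(-2|y|-1) sums to 1 over all finite strings (there are 2^n strings of each length n), and the induced measure then has the closed form mu(x) = 4^(-|x|).

```python
def P(y):
    """Toy probability distribution over finite universes y (our choice,
    purely illustrative): P(y) = 2^(-2*len(y)-1).  Summing over the 2^n
    strings of each length n gives sum_n 2^n * 2^(-2n-1) = 1."""
    return 2.0 ** (-2 * len(y) - 1)

def mu(x):
    """Closed-form solution of mu(x) = P(x) + mu(x0) + mu(x1) for this
    P: mu(x) = 4^(-len(x)), so mu(empty string) = 1."""
    return 4.0 ** (-len(x))

assert mu("") == 1.0
# The recursion holds exactly (all quantities are dyadic, so floats are exact):
for x in ["", "0", "10", "110"]:
    assert mu(x) == P(x) + mu(x + "0") + mu(x + "1")
```

Since P(x) > 0 for every finite x here, this mu illustrates the disputed case: mass sits on the finite "terminating" universes as well as on the cylinder sets of their continuations.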
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > > > From [EMAIL PROTECTED]: > > [EMAIL PROTECTED] wrote: > > > M measure: > > > M(empty string)=1 > > > M(x) = M(x0)+M(x1) nonnegative for all finite x. > > > > This sounds more like a probability distribution than a measure. In > > the set of all descriptions, we only consider infinite length > > bitstrings. Finite length bitstrings are not members. However, we can > > identify finite length bitstrings with subsets of descriptions. The > > empty string corresponds to the full set of all descriptions, so the > > first line M(empty string)=1 implies that the measure is normalisable > > (ie a probability distribution). > > > Please check out definitions of measure and distribution! > Normalisability is not the critical issue. > > Clearly: Sum_x M(x) is infinite. So M is not a probability > distribution. M(x) is just measure of all strings starting with x: > M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = For a measure to be normalisable, the sum over a *disjoint* set of subsets must be finite. If the set of subsets is not disjoint (ie the intersection are not empty) then the sum may well be infinite. Bringing this to the case of finite strings. Each finite string is actually a subset of the set of infinite strings, each containing the same finite prefix. So the string 101010 is actually a subset of 10101 and so on. The Sum_x M(x), where I assume x ranges over all strings will of course be infinite. However, since the set of finite strings is not disjoint, this doesn't imply M(x) is unnormalisable. Now when you realise that every finite string x is a subset of the empty string, it becomes clear that M(x) is normalised to precisely 1. > > Neglecting finite universes means loss of generality though. > Hence measures mu(x) in the ATOE paper do not neglect finite x: > > mu(empty string)=1 > mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). > > And here P is a probability distribution indeed! 
> P(x)>0 possible only for x with finite description. The P(x)>0 case above actually breaks the countably subadditive property, so mu(x) cannot be called a measure... I'm not entirely sure what you're getting at here. Cheers > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/ Dr. Russell Standish, Director, High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile), Fax 9385 6965, 0425 253119 ("), UNSW SYDNEY 2052, Australia. [EMAIL PROTECTED] Room 2075, Red Centre. http://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02
Re: Predictions & duplications
> From [EMAIL PROTECTED]: > [EMAIL PROTECTED] wrote: > > M measure: > > M(empty string)=1 > > M(x) = M(x0)+M(x1) nonnegative for all finite x. > > This sounds more like a probability distribution than a measure. In > the set of all descriptions, we only consider infinite length > bitstrings. Finite length bitstrings are not members. However, we can > identify finite length bitstrings with subsets of descriptions. The > empty string corresponds to the full set of all descriptions, so the > first line M(empty string)=1 implies that the measure is normalisable > (ie a probability distribution). Please check out definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = Neglecting finite universes means loss of generality though. Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1 mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x)>0 possible only for x with finite description. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
Juergen wrote (on 12th Oct): > . . . In most possible futures your computer will > vanish within the next second. But it does not. This indicates that our > future is _not_ sampled from a uniform prior. I don't wish to comment directly on the computer-vanishing problem as it applies to Juergen's scheme (my own problem with this laudable scheme is that it appears to be vulnerable to the same 'turtle-itis' criticism as for all theistic religions - the (literal or abstract GP) 'turtle' responsible for the world seems to need another turtle to support it and so on - there is no full explanation), but I would like to say that certain other proposed solutions don't suffer from this computer-vanishing problem (also known as the WR/dragon problem), if one thinks of infinite length bit strings / formal system descriptions via Limit n -> infinity, where n is the relevant string/description length (see appendix below). It seems to me that in thinking in simple infinity terms one can lose essential information (for example integers cannot be determined to be more numerous than the odd numbers - both are of the lowest order of infinity - not a problem for limit analysis). Alastair Malcolm APPENDIX One might naively think that as there are at least hundreds of possible states (call them V1, V2... ) where some part of our computer clearly vanishes (and only one (N), where normality prevails), then even if one considers bit string or other types of formal description involving other variations in our universe or indeed other universes, one could still 'divide through' to find that we are most likely to be in a universe where our computer vanishes in whole or part. However, we note that in any minimal description of our universe (and the following argument does not depend on there having to *only* be a minimal description), deviations from the actual physical laws causing V1, V2... 
will involve additional ad-hoc rules/events, so we can say that in complexity terms (whether as a bit string or formal system description) we will have V1 = N + VE1, V2 = N + VE2 etc, where VE1, VE2 etc are the extra segments of description required to cater for the part-vanishings (strictly: Length(V1) = Length(N) + Length(VE1) etc, but hopefully this is clear). Moreover, if we are considering all possible descriptions, we also have to allow for extra descriptions corresponding to entities beyond our observability. For each V = N + VE we will have many DC = N + DCE, where DC is a 'don't care' description - it is a universe (or set of universes) indistinguishable by us from N (our own, non-computer-vanishing one), yet objectively different. Now, the key point is that in any objective descriptive framework (whether by bit string or formal system), one should take the Limit as the description length increases to infinity, not as our (humanly biased) visible universe is progressively and unobservably added to (say by other universes). As we do this, we are far more likely to be in a DC (= N + DCE) universe than a V (= N + VE) universe: computers don't normally vanish, in whole or in part. More details at: http://www.physica.freeserve.co.uk/p105.htm linking to: http://www.physica.freeserve.co.uk/p111.htm http://www.physica.freeserve.co.uk/p112.htm
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > Confusion about what's a measure? > What's a distribution? > Simple but important! > For bitstrings x: > > M measure: > M(empty string)=1 > M(x) = M(x0)+M(x1) nonnegative for all finite x. This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (i.e. a probability distribution). As I said before, non-normalisable measures are considered measures by the mathematical community - and the uniform measure over the set of all descriptions is well defined. > > P probability distribution: > Sum_x P(x) = 1; P(x) nonnegative > > --- > > M semimeasure - replace "=" by ">=": > M(x) >= M(x0)+M(x1) > > P semidistribution - replace: > Sum_x P(x) <= 1 > > --- > > Examples: > > 1. Distribution: E.g., integers n: P(n) = 6/(Pi^2 n^2) > 2. Semidistribution: m(x) = probability of guessing a halting program > for x (BTW, Hal, this was first published by Levin in 1974, not by > Chaitin in 1975) > 3. Measure: E.g., each x of size n gets weight 2^-n > 4. Semimeasure: E.g., mu^M(x) = probability of guessing a halting or > nonhalting monotone TM program whose output starts with x > > Check out: Measures and Probability Distributions > Section 4 of "Algorithmic TOEs" > http://www.idsia.ch/~juergen/toesv2/node15.html > > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
Confusion about what's a measure? What's a distribution? Simple but important! For bitstrings x:

M measure:
M(empty string)=1
M(x) = M(x0)+M(x1) nonnegative for all finite x.

P probability distribution:
Sum_x P(x) = 1; P(x) nonnegative

---

M semimeasure - replace "=" by ">=":
M(x) >= M(x0)+M(x1)

P semidistribution - replace:
Sum_x P(x) <= 1

---

Examples:

1. Distribution: E.g., integers n: P(n) = 6/(Pi^2 n^2)
2. Semidistribution: m(x) = probability of guessing a halting program for x (BTW, Hal, this was first published by Levin in 1974, not by Chaitin in 1975)
3. Measure: E.g., each x of size n gets weight 2^-n
4. Semimeasure: E.g., mu^M(x) = probability of guessing a halting or nonhalting monotone TM program whose output starts with x

Check out: Measures and Probability Distributions, Section 4 of "Algorithmic TOEs" http://www.idsia.ch/~juergen/toesv2/node15.html

Juergen Schmidhuber
http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/
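Two of these examples can be verified mechanically. The short Python sketch below (ours, purely illustrative) checks that the weight 2^-n of example 3 satisfies the measure recursion, and that example 1 is normalized; note the constant is 6/pi^2 because sum 1/n^2 = pi^2/6.

```python
import math

def M(x):
    """Example 3: each bitstring of length n gets weight 2^-n."""
    return 2.0 ** -len(x)

# Measure property M(x) = M(x0) + M(x1), including M(empty string) = 1:
assert M("") == 1.0
for x in ["", "0", "01", "1101"]:
    assert M(x) == M(x + "0") + M(x + "1")

# Example 1: P(n) = 6/(pi^2 n^2) on the positive integers; the partial
# sums approach 1 from below.
partial = sum(6.0 / (math.pi ** 2 * n ** 2) for n in range(1, 200_000))
assert 0.999 < partial < 1.0
```

The first check also makes the measure/distribution contrast concrete: Sum over all x of M(x) diverges (it is 1 per string length), while Sum over n of P(n) converges to 1.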
Re: Predictions & duplications
Juergen Schmidhuber wrote: >Bruno, there are so many misleading or unclear statements >in your post - I do not even know where to start. Rhetorical tricks, if not insults, as usual :( Have you read http://www.escribe.com/science/theory/m3241.html Well, G* tells me to remain mute here, but then I don't care. >[...] >You can be a GP. Write a program that computes all universes >computable in the limit. Add storage in case of storage overflow. >Still, you as a GP have a resource problem. At some point it will >get harder and harder for you to keep computing all the things >you'd like to compute. Are you using the GP as an analogy or what? What exactly is your ontology? You have not answered my question: how could the GP have resource problems? Is not resource a notion internal to some universe, and are not universes generated by the GP? You are presupposing physics. >Just like I cannot say much about the environment of the particular >GP who programmed us. Are you really serious? Do you believe we have a GP in a universe? Where does the universe of our GP come from? This is tortoises built on tortoises. Even infinite tortoise sequences embedded in a physical universe. That explains nothing. I knew you had not yet understood the comp reversal physico/psycho, but have you not even seen the reversal physico/computer-science? >Again: forget all the esoteric undecidability stuff, which does not >really matter here, and be pragmatic and try to look at things from the >perspective of some observer being computed as part of a virtual reality >that you programmed. The observer may get smart and correctly suspect >you have a resource problem, and can make nontrivial predictions from >there, using the Speed Prior. You are still postulating what I am seeking an explanation for. I am not (necessarily) pragmatic in advance. A concrete virtual reality on earth inherits the "normality" of the world in which it is embedded.
I claim that comp forces us to justify that normality. It is not obvious that that is possible. But the UDA forces us to take all computations into account, and undecidability stuff is unavoidable, if only because we can only *bet* on our consistent extensions (in the translation of the UDA into arithmetic, consistent extensions have the unprovable type "<>p"). Incompleteness phenomena put fundamental constraints on a TOE. You could call them logical and arithmetical priors. But you don't seem to be searching for a TOE, just a recipe for predicting some granted neighborhood. The arithmetical UD is just the set of \Sigma_1 sentences. Resource must be explained from it, not the inverse. I am not sure a TOE can be pragmatic, just as quantum mechanics is not useful for cooking. >No, that's exactly where the Speed Prior comes in. According to this prior, >the slow programs computing the same thing contribute less to its >probability mass. It is explained in the chapter on >Speed Prior-Based Inductive Inference >http://www.idsia.ch/~juergen/toesv2/node32.html I don't need to read it because it is in contradiction with the "invariance lemma", in particular by the fact that from a first person point of view the delays (between the many accesses by the UD of nearby computational states) do not matter. (BTW I have read it, and we discussed it before.) You know the GP will generate itself, but will also generate less and less efficient versions of itself. From the first person point of view they are indistinguishable. Are you telling me that the Juergen and Bruno appearing in those very slow GPs are zombies? >> It is indeed hard to write a little program which generates a long >> Kolmogorov-Chaitin incompressible 01 string. >> But, as I told you earlier, it is quite easy to write a little >> program which generates them all. (It is the essence of the >> everything idea imo). > >As you told me earlier? Huh? >That's one of the points of the 1997 book chapter.
I meant "as I reminded you earlier in some recent post"! (but look at my 1988 Toulouse paper ;-) >The thing is: when you generate them all, and assume that all are >equally likely, in the sense that all beginnings of all strings are >uniformly distributed, then you cannot explain why the regular universes >keep being regular. The random futures are then just as likely as the >nonrandom ones. > >So you NEED something additional to explain the ongoing regularity. >You need something like the Speed Prior, which greatly favors regular >futures over others. You speak as if I were proposing the iterated duplication as a universal explanation. It is only an illustration of the comp indeterminism. With comp it is the whole work of the UD, UD*, which constitutes the domain of indeterminism. The self-multiplication experience just shows how a 3-deterministic process consistent with comp explains a phenomenology of (in the limit) a sort of absolute randomness. The "real" experience is the infinite running of the UD. And that
Re: Predictions & duplications
Hal - that is not a uniform measure! [EMAIL PROTECTED] wrote: > > Juergen Schmidhuber writes: > > But there is no uniform prior over all programs! > > Just like there is no uniform prior over the integers. > > To see this, just try to write one down. > > I think there is. Given a program of length l, the prior probability > is 2^(-l). (That is 2 to the power of negative l.) The length of a > program is defined by interpreting it using self-delimiting rules as > is customary in the AIT analysis of Greg Chaitin. > > Hal Finney
Re: Predictions & duplications
Hal Finney wrote: > Juergen Schmidhuber writes: > > But there is no uniform prior over all programs! > > Just like there is no uniform prior over the integers. > > To see this, just try to write one down. > > I think there is. Given a program of length l, the prior probability > is 2^(-l). (That is 2 to the power of negative l.) The length of a > program is defined by interpreting it using self-delimiting rules as > is customary in the AIT analysis of Greg Chaitin. This doesn't seem to be very uniform to me. Maybe you mean that with the above prior the probability for a bit, drawn randomly from the set of all programs, to be 1 is 1/2? Saibal
Re: Predictions & duplications
Saibal wrote: > Hal Finney wrote: > > Juergen Schmidhuber writes: > > > But there is no uniform prior over all programs! > > > Just like there is no uniform prior over the integers. > > > To see this, just try to write one down. > > > > I think there is. Given a program of length l, the prior probability > > is 2^(-l). (That is 2 to the power of negative l.) The length of a > > program is defined by interpreting it using self-delimiting rules as > > is customary in the AIT analysis of Greg Chaitin. > > This doesn't seem to be very uniform to me. Maybe you mean that with the > above prior the probability for a bit, drawn randomly from the set of all > programs, to be 1 is 1/2 ? I should have used the proper name for this, the universal prior. However it is derived in the limit (of string length -> infinity) by giving the same probability to all strings, and then adopting the convention that we ignore the material past the end of a self-delimiting program. As we go to the limit, if we are looking at n bit strings, then a self delimiting program of length l will have 2^(n-l) instances among the 2^n possible strings, hence will have measure 2^(-l), which is the universal prior. So the universal prior on self-delimiting programs is the same as the uniform prior on all program strings, as we take the limit in program string length going to infinity. Hal
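Hal's limit argument can be checked by brute force for small n. The sketch below is ours, and the 3-bit "program" is a stand-in for any self-delimiting program of length l: among the 2^n strings of length n, exactly 2^(n-l) start with a given l-bit prefix, so the fraction is 2^(-l) independently of n.

```python
from itertools import product

def prefix_fraction(program, n):
    """Fraction of the 2^n bitstrings of length n whose first
    len(program) bits equal `program` (assumes n >= len(program))."""
    strings = ("".join(bits) for bits in product("01", repeat=n))
    hits = sum(1 for s in strings if s.startswith(program))
    return hits / 2 ** n

p = "101"  # a hypothetical self-delimiting program, l = 3
for n in range(4, 9):
    # 2^(n-l) matching strings out of 2^n: measure 2^-l for every n
    assert prefix_fraction(p, n) == 2.0 ** -len(p)
```

This is why the 2^(-l) prior on self-delimiting programs and the uniform prior on raw program strings agree in the limit of string length going to infinity.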
Re: Predictions & duplications
Juergen Schmidhuber writes: > But there is no uniform prior over all programs! > Just like there is no uniform prior over the integers. > To see this, just try to write one down. I think there is. Given a program of length l, the prior probability is 2^(-l). (That is 2 to the power of negative l.) The length of a program is defined by interpreting it using self-delimiting rules as is customary in the AIT analysis of Greg Chaitin. Hal Finney
Re: Predictions & duplications
Hal Finney wrote: >Isn't this fixed by saying that the uniform measure is not over all >universe histories, as you have it above, but over all programs that >generate universes? Now we have the advantage that short programs >generate more regular universes than long ones, and the WAP grows teeth. I agree with Juergen on this: there is no uniform measure over all programs (nor on all integers). But in any case you know I argue, unlike Juergen, that the measure should be put on all (relative) consistent computational extensions appearing in UD*. This indeed gives a sort of Weak Turing-tropic Principle, in which relative *speed* is a by-product, not a prior. Bruno
Re: Predictions & duplications
Saibal Mitra wrote: >John Mikes wrote: >`` If you say: a sequence defying all >rules, then it is not random, it is calculable. You have to consider all >rules and cut them out.´´ > >If you try to do that then you encounter the famous halting problem. Exactly. So why not define random as incompressible by any (universal) machine (Kolmogorov incompressibility)? In a similar sense one can prove that "most" finite and infinite sequences are incompressible. Here the Church Thesis gives some "absolute" epistemological meaning to that notion of randomness. Cf. Li and Vitanyi. Bruno
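The counting argument behind "most sequences are incompressible" is worth making explicit. The sketch below (ours) computes only the bound, with no reference to any particular machine: there are at most 2^(n-c) - 1 descriptions shorter than n - c bits, so fewer than a 2^(-c) fraction of the 2^n strings of length n can be compressed by c bits or more.

```python
def max_fraction_compressible(n, c):
    """Upper bound on the fraction of n-bit strings that admit a
    description shorter than n - c bits: there are only
    2^0 + 2^1 + ... + 2^(n-c-1) = 2^(n-c) - 1 such descriptions,
    and each describes at most one string."""
    return (2 ** (n - c) - 1) / 2 ** n

# Fewer than 1 in 1024 of the 20-bit strings can be compressed by 10 bits:
assert max_fraction_compressible(20, 10) < 2 ** -10
# Even saving a single bit is impossible for more than half the strings:
assert max_fraction_compressible(10, 1) < 0.5
```

The halting problem only bites when one tries to *exhibit* or *certify* a particular incompressible string; the counting bound itself is elementary.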
Re: Predictions & duplications
Juergen writes > But there is no uniform prior over all programs! > Just like there is no uniform prior over the integers. > To see this, just try to write one down. This is of course true (if a uniform measure is one that gives the same non-zero probability to each program; I have no idea what the official definition is). OTOH, there's of course the natural product measure over the set of all infinite strings. In my last post I tried to show that this is enough. I admit now I was cutting much too many corners. It is not even obvious that the set of all programs that produce a specific conscious experience is measurable. And even if it were, it now seems obvious to me that random programs produce random worlds. So some kind of 'resource limitation' must exist. One option would be to accept only programs that are shorter than some maximum length. This corresponds to only accepting worlds that have complexity lower than some maximum. But this is one type of 'resource limitation' and in fact a very inelegant one. Juho
Re: Predictions & duplications
But there is no uniform prior over all programs! Just like there is no uniform prior over the integers. To see this, just try to write one down. BTW, it's not Solomon-Levy but Solomonoff-Levin. And it has nothing to do with resource bounds! Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/ > From [EMAIL PROTECTED] Fri Oct 12 19:22:02 2001 > > Juergen writes: > > Some seem to think that the weak anthropic principle explains the > > regularity. The argument goes like this: "Let there be a uniform measure > > on all universe histories, represented as bitstrings. Now take the tiny > > subset of histories in which you appear. Although the measure of this > > subset is tiny, its conditional measure, given your very existence, > > is not: According to the weak anthropic principle, the conditional > > probability of finding yourself in a regular universe compatible with > > your existence equals 1." > > > > But it is essential to see that the weak anthropic principle does not > > have any predictive power at all. It does not tell you anything about > > the future. It cannot explain away futures in which you still exist > > but irregular things happen. Only a nonuniform prior can explain this. > > Isn't this fixed by saying that the uniform measure is not over all > universe histories, as you have it above, but over all programs that > generate universes? Now we have the advantage that short programs > generate more regular universes than long ones, and the WAP grows teeth. > From: Russell Standish <[EMAIL PROTECTED]> > > That is almost the correct solution, Hal. If we ask what an observer > will make of a random description chosen at random, then you get > regular universes with probability exponentially related to the > inferred complexity. 
It is far clearer to see what happen when the > observer is a UTM, forcibly terminating programs after a > certain number of steps (representing the observer's resource bound) > (thus all descriptions are halting programs). Then one obtains a > Solomon-Levy distribution or universal prior. However, this argument > also works when the observer is not a UTM, but simply a classification > device of some kind.
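Juergen's two claims here, that there is no uniform prior on the integers but a Solomonoff-Levin-style weighting works, can be sketched in a few lines. The Elias gamma code below is just one convenient prefix-free encoding (my choice, not anything from the thread); weighting n by 2^-codelength gives a convergent total, while any constant positive weight cannot be normalised:

```python
import math

def gamma_len(n):
    """Length in bits of the Elias gamma code for n >= 1 (a prefix-free,
    self-delimiting encoding): 2*floor(log2 n) + 1."""
    return 2 * int(math.log2(n)) + 1

# Weighting integer n by 2^-codelength gives a convergent (semi)measure:
kraft_sum = sum(2.0 ** -gamma_len(n) for n in range(1, 1_000_000))
assert kraft_sum < 1.0  # Kraft inequality: total mass bounded by 1

# A "uniform prior" would assign every integer the same weight c > 0,
# but then the partial sums c * N exceed any bound -- it cannot be normalised:
c = 1e-12
assert c * 10 ** 15 > 1.0  # already past total mass 1 after finitely many terms
```

This is the sense in which "just try to write one down" fails: whatever constant you pick, the sum diverges, whereas the length-based weighting sums to at most 1 no matter how far you enumerate.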
Re: Predictions & duplications
"According to whim or taste" implies a conscious entity performing choices according to a free will. This need not be the case. In my mind, random means selected without cause (or without procedure/algorithm). A lot has been written on randomness, and its problematic nature. I don't for one minute suggest I have anything new to say on the matter. Cheers jamikes wrote: > > Dear Russell and Hal, > this is a naive question, however it goes into the basics of your written > explanations. How would YOU define "random"? I had this problem for a long > long time and never got a satisfactory solution. In my (non Indo-European) > language, Hungarian, there is no exact word for it, it is used as a term > meaning "according to whim or taste" (tetszöleges) - which leaves open that > I may not LIKE the 'random' in question. If you say: a sequence defying all > rules, then it is not random, it is calculable. You have to consider all > rules and cut them out. Is a series of 1... random? of course it may be. > Is the pi-decimal random? no, because it is a result of calculation. If one > says: "just pick it" that would follow all circumstantial pressure of the > situation, maybe unconsciously, yet defying the 'random'. > > So what is the content you use behind that word? > > I would be surprised if both of you (and others, too) would agree . > Till then I wish you luck to use the word - at random. > > Best wishes > John Mikes > - Original Message - > From: "Russell Standish" <[EMAIL PROTECTED]> > To: <[EMAIL PROTECTED]> > Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> > Sent: Sunday, October 14, 2001 4:36 AM > Subject: Re: Predictions & duplications > >>>>>>>SNIP<<<<<<<<< > > > > That is almost the correct solution, Hal. If we ask what an observer > > will make of a random description chosen at random, then you get > > regular universes with probability exponentially related to the > > inferred complexity. 
It is far clearer to see what happen when the > > observer is a UTM, forcibly terminating programs after a > > certain number of steps (representing the observer's resource bound) > > (thus all descriptions are halting programs). Then one obtains a > > Solomon-Levy distribution or universal prior. However, this argument > > also works when the observer is not a UTM, but simply a classification > > device of some kind. > > > > The WAP has nothing to do with this issue, except inasmuch as > > universes can only be observed through the eyes of some observer. > > > > Again I reiterate that Juergen's resource-bounded "Great Programmer" > > religion need be nothing but a reflection of our conscious selves stamped > > upon our observations. > > > > Cheers > > > > [EMAIL PROTECTED] wrote: > > > > > > Juergen writes: > > > > Some seem to think that the weak anthropic principle explains the > > > > regularity. The argument goes like this: "Let there be a uniform > measure > > > > on all universe histories, represented as bitstrings. Now take the > tiny > > > > subset of histories in which you appear. Although the measure of this > > > > subset is tiny, its conditional measure, given your very existence, > > > > is not: According to the weak anthropic principle, the conditional > > > > probability of finding yourself in a regular universe compatible with > > > > your existence equals 1." > > > > > > > > But it is essential to see that the weak anthropic principle does not > > > > have any predictive power at all. It does not tell you anything about > > > > the future. It cannot explain away futures in which you still exist > > > > but irregular things happen. Only a nonuniform prior can explain this. > > > > > > Isn't this fixed by saying that the uniform measure is not over all > > > universe histories, as you have it above, but over all programs that > > > generate universes? 
Now we have the advantage that short programs > > > generate more regular universes than long ones, and the WAP grows teeth. > > > > > > Hal Finney > > > > > > > > > > > -- > -- > > Dr. Russell Standish Director > > High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 > (mobile) > > UNSW
Re: Predictions & duplications
John Mikes wrote: `` If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out.´´ If you try to do that then you encounter the famous halting problem. Saibal
Re: Predictions & duplications
Dear Russell and Hal, this is a naive question, however it goes into the basics of your written explanations. How would YOU define "random"? I had this problem for a long long time and never got a satisfactory solution. In my (non Indo-European) language, Hungarian, there is no exact word for it, it is used as a term meaning "according to whim or taste" (tetszöleges) - which leaves open that I may not LIKE the 'random' in question. If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out. Is a series of 1... random? of course it may be. Is the pi-decimal random? no, because it is a result of calculation. If one says: "just pick it" that would follow all circumstantial pressure of the situation, maybe unconsciously, yet defying the 'random'. So what is the content you use behind that word? I would be surprised if both of you (and others, too) would agree . Till then I wish you luck to use the word - at random. Best wishes John Mikes - Original Message - From: "Russell Standish" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Sunday, October 14, 2001 4:36 AM Subject: Re: Predictions & duplications >>>>>>>SNIP<<<<<<<<< > That is almost the correct solution, Hal. If we ask what an observer > will make of a random description chosen at random, then you get > regular universes with probability exponentially related to the > inferred complexity. It is far clearer to see what happen when the > observer is a UTM, forcibly terminating programs after a > certain number of steps (representing the observer's resource bound) > (thus all descriptions are halting programs). Then one obtains a > Solomon-Levy distribution or universal prior. However, this argument > also works when the observer is not a UTM, but simply a classification > device of some kind. 
> > The WAP has nothing to do with this issue, except inasmuch as > universes can only be observed through the eyes of some observer. > > Again I reiterate that Juergen's resource-bounded "Great Programmer" > religion need be nothing but a reflection of our conscious selves stamped > upon our observations. > > Cheers > > [EMAIL PROTECTED] wrote: > > > > Juergen writes: > > > Some seem to think that the weak anthropic principle explains the > > > regularity. The argument goes like this: "Let there be a uniform measure > > > on all universe histories, represented as bitstrings. Now take the tiny > > > subset of histories in which you appear. Although the measure of this > > > subset is tiny, its conditional measure, given your very existence, > > > is not: According to the weak anthropic principle, the conditional > > > probability of finding yourself in a regular universe compatible with > > > your existence equals 1." > > > > > > But it is essential to see that the weak anthropic principle does not > > > have any predictive power at all. It does not tell you anything about > > > the future. It cannot explain away futures in which you still exist > > > but irregular things happen. Only a nonuniform prior can explain this. > > > > Isn't this fixed by saying that the uniform measure is not over all > > universe histories, as you have it above, but over all programs that > > generate universes? Now we have the advantage that short programs > > generate more regular universes than long ones, and the WAP grows teeth. > > > > Hal Finney > > > > > > -- -- > Dr. Russell Standish Director > High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile) > UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (") > Australia [EMAIL PROTECTED] > Room 2075, Red Centre http://parallel.hpc.unsw.edu.au/rks > International prefix +612, Interstate prefix 02 > -- -- >
Re: Predictions & duplications
That is almost the correct solution, Hal. If we ask what an observer will make of a random description chosen at random, then you get regular universes with probability exponentially related to the inferred complexity. It is far clearer to see what happens when the observer is a UTM, forcibly terminating programs after a certain number of steps (representing the observer's resource bound) (thus all descriptions are halting programs). Then one obtains a Solomon-Levy distribution or universal prior. However, this argument also works when the observer is not a UTM, but simply a classification device of some kind. The WAP has nothing to do with this issue, except inasmuch as universes can only be observed through the eyes of some observer. Again I reiterate that Juergen's resource-bounded "Great Programmer" religion need be nothing but a reflection of our conscious selves stamped upon our observations. Cheers [EMAIL PROTECTED] wrote: > > Juergen writes: > > Some seem to think that the weak anthropic principle explains the > > regularity. The argument goes like this: "Let there be a uniform measure > > on all universe histories, represented as bitstrings. Now take the tiny > > subset of histories in which you appear. Although the measure of this > > subset is tiny, its conditional measure, given your very existence, > > is not: According to the weak anthropic principle, the conditional > > probability of finding yourself in a regular universe compatible with > > your existence equals 1." > > > > But it is essential to see that the weak anthropic principle does not > > have any predictive power at all. It does not tell you anything about > > the future. It cannot explain away futures in which you still exist > > but irregular things happen. Only a nonuniform prior can explain this. > > Isn't this fixed by saying that the uniform measure is not over all > universe histories, as you have it above, but over all programs that > generate universes?
Now we have the advantage that short programs > generate more regular universes than long ones, and the WAP grows teeth. > > Hal Finney > Dr. Russell Standish Director High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile) UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (") Australia [EMAIL PROTECTED] Room 2075, Red Centre http://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02
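Russell's construction, an observer as a UTM that forcibly terminates programs after a certain number of steps, can be imitated with a toy machine (my own crude construction, far simpler than anything in the thread): programs are 8-bit strings read as 2-bit opcodes, runs are step-bounded, and each program carries weight 2^-8. Looping lets short programs produce long regular outputs, so regular outputs accumulate most of the probability:

```python
from collections import Counter

EMIT0, EMIT1, JUMP, HALT = "00", "01", "10", "11"

def run(program, out_len=4, max_steps=64):
    """Run a bitstring program on a toy machine, forcibly terminating
    after max_steps (the observer's resource bound)."""
    ops = [program[i:i + 2] for i in range(0, len(program), 2)]
    pc, steps, out = 0, 0, []
    while pc < len(ops) and steps < max_steps and len(out) < out_len:
        op = ops[pc]
        steps += 1
        if op == EMIT0:
            out.append("0"); pc += 1
        elif op == EMIT1:
            out.append("1"); pc += 1
        elif op == JUMP:
            pc = 0            # loop back: short programs yield long regular outputs
        else:                 # HALT
            break
    return "".join(out)

# Uniform weight 2^-8 on each 8-bit program induces a distribution on outputs:
dist = Counter(run(format(p, "08b")) for p in range(256))
# "emit 0, jump" alone already yields "0000" from 16 programs, while the
# irregular "0110" arises from only two -- regular outputs dominate.
```

This is only a cartoon of the argument, but it shows the mechanism: once runs are step-bounded, the induced distribution over outputs is far from uniform, concentrating on the simple ones.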
Re: Predictions & duplications
Juergen writes: > Some seem to think that the weak anthropic principle explains the > regularity. The argument goes like this: "Let there be a uniform measure > on all universe histories, represented as bitstrings. Now take the tiny > subset of histories in which you appear. Although the measure of this > subset is tiny, its conditional measure, given your very existence, > is not: According to the weak anthropic principle, the conditional > probability of finding yourself in a regular universe compatible with > your existence equals 1." > > But it is essential to see that the weak anthropic principle does not > have any predictive power at all. It does not tell you anything about > the future. It cannot explain away futures in which you still exist > but irregular things happen. Only a nonuniform prior can explain this. Isn't this fixed by saying that the uniform measure is not over all universe histories, as you have it above, but over all programs that generate universes? Now we have the advantage that short programs generate more regular universes than long ones, and the WAP grows teeth. Hal Finney
Re: Predictions & duplications
In reply to Russell Standish and Juho Pennanen I'd just like to emphasize the main point, which is really trivial: by definition, a uniform measure on the possible futures makes all future beginnings of a given size equally likely. Then regular futures clearly are not any more likely than the irregular ones. I have no idea what makes Russell think this is debatable. In most possible futures your computer will vanish within the next second. But it does not. This indicates that our future is _not_ sampled from a uniform prior. Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: "Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1." But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this. Which nonuniform prior is plausible? My favorite is the "resource-optimal" Speed Prior. I am hoping and expecting that someone will confirm it soon by finding a rather fast pseudorandom generator responsible for apparently noisy events in our universe. But others may be more conservative and go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe even for the nonenumerable prior mu^E (which dominates mu^M while still being computable in the limit) or maybe even for the extreme prior mu^G. 
Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/ > From [EMAIL PROTECTED] Thu Oct 11 17:36:41 2001 > From: Juho Pennanen <[EMAIL PROTECTED]> > > > > I tried to understand the problem that doctors Schmidhuber > and Standish are discussing by describing it in the most > concrete terms I could, below. (I admit beforehand I couldn't > follow all the details and do not know all the papers and > theorems referred to, so this could be irrelevant.) > > So, say you are going to drop a pencil from your hand and > trying to predict if it's going to fall down or up this > time. Using what I understand with comp TOE, I would take > the set of all programs that at some state implement a > certain conscious state, namely the state in which you > remember starting your experiment of dropping the pencil, > and have already recorded the end result (I abreviate this > conscious state with CS. To be exact it is a set of states, > but that shouldn't make a difference). > > The space of all programs would be the set of all programs > in some language, coded as infinite numerable sequences of > 0's and 1's. (I do not know how much the chosen language + > coding has effect on the whole thing). > > Now for your prediction you need to divide the > implementations of CS into two sets: those in which the > pencil fell down and those in which it fell up. Then you > compare the measures of those sets. (You would need to > assume that each program is run just once or something of > the sort. Some programs obviously implement CS several > times when they run. So you would maybe just include those > programs that implement CS infinitely many times, and > weight them with the density of CS occurrences during > their running.) > > One way to derive the measure you need is to assume a > measure on the set of all infinite sequences (i.e. on > all programs). For this we have the natural measure, > i.e. 
the product measure of the uniform measure on > the set containing 0 and 1. And as far as my intuition > goes, this measure would lead to the empirically correct > prediction on the direction of the pencil falling. And > if I understood it right, this is not too far from what > Dr. Standish was claiming? And we wouldn't need any > speed priors. > > But maybe the need of speed prior would come to play if I > thought more carefully about the detailed assumptions > involved? E.g. that each program would be run just once, > with the same speed etc? I am not sure. > > Juho > > / > Juho Pennanen > Department of Forest Ecology, P.O.Box 24 > FIN-00014 University of Helsinki > tel. (09)191 58144 (+358-9-191 58144) > GSM 040 5455 845 (+358-40-5455 845) > http://www.helsinki.fi/people/juho.pennanen > */
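Juergen's conjecture that a "rather fast pseudorandom generator" could be responsible for apparently noisy events at least has easy toy illustrations. A linear congruential generator (the constants below are the common glibc-style choice, my assumption, not anything Juergen specified) emits a statistically balanced bit stream from a description only a few lines long:

```python
def lcg_bits(seed, n, a=1103515245, c=12345, m=2 ** 31):
    """Extract one bit per step from a linear congruential generator:
    a 'fast pseudorandom generator' with a very short description."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append((x >> 16) & 1)
    return out

bits = lcg_bits(seed=42, n=10_000)
ones = sum(bits)
# The stream passes a crude frequency test (roughly half ones), so it
# *looks* like noise -- yet the generator plus the seed is a tiny
# description, i.e. the stream is simple in Juergen's sense.
```

Finding such a generator behind real interference data is of course the hard (and so far unrealised) part of the proposal; the sketch only shows why "looks random" and "is simply generated, and fast" are compatible.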
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > > Huh? A PDF? You mean a probability density function? On a continuous set? Probability Distribution Function. And PDF's are defined on all measurable sets, not just continuous ones. > No! I am talking about probability distributions on describable objects. > On things you can program. Sure... > > Anyway, you write "...observer will expect to see regularities even with > the uniform prior" but that clearly cannot be true. > This is where I beg to differ. > > > > Since you've obviously barked up the wrong tree here, it's a little > > hard to know where to start. Once you understand that each observer > > must equivalence an infinite number of descriptions due to the > > boundedness of its resources, it becomes fairly obvious that the > > smaller, simpler descriptions correspond to larger equivalence classes > > (hence higher probability). > > Maybe you should write down formally what you mean? Which resource bounds? > On which machine? What exactly do you mean by "simple"? Are you just > referring to the traditional Solomonoff-Levin measure and the associated > old Occam's razor theorems, or do you mean something else? It is formally written in "Why Occam's Razor", and the extension to non-computational observers is a fairly obvious corollary of the contents of "On Complexity and Emergence". To summarise the results for this list, the basic idea is that an observer acts as a filter, or category machine. On being fed a description, the observer places it into a category, or equivalence class of descriptions. What we can say is that real observers will have a finite number of such equivalence classes they can categorise descriptions into (be they basins of attraction in a neural network, or output states in the form of books, etc).
This is important, because if we consider the case of UTMs being the classification device, and outputs from halting programs as being the categories, then we end up with the S-L distribution over the set of halting programs. Unfortunately, the set of halting programs has measure zero within the set of all programs (descriptions), so we end up with the White Rabbit paradox, namely that purely random descriptions have probability one. One solution is to ban the "Wabbit" universes (which you do by means of the speed prior), and recover the nicely behaved S-L measure, which, essentially, is your solution (ie S-L convolved with the Speed Prior). The other way is to realise that the only example of conscious observer we are aware of is actually quite resilient to noise - only a small number of bits in a random string will be considered significant. We are really very good at detecting patterns in all sorts of random noise. Hence every random string will be equivalenced to some regular description. Then one ends up with an _observer dependent_ S-L-like probability distribution over the set of categories made by that observer. The corresponding Occam's Razor theorem follows in the usual way. I hope this is clear enough for you to follow... > > You are talking falsifiability. I am talking verifiability. Sure, you > cannot prove randomness. But that's not the point of any inductive > science. The point is to find regularities if there are any. Occam's > razor encourages us to search for regularity, even when we do not know > whether there is any. Maybe some PhD student tomorrow will discover a > simple PRG of the kind I mentioned, and get famous. > > It is important to see that Popper's popular and frequently cited and > overrated concept of falsifiability does not really help much to explain > what inductive science such as physics is all about. 
E.g., physicists > accept Everett's ideas although most of his postulated parallel universes > will remain inaccessible forever, and therefore are _not_ falsifiable. > Clearly, what's convincing about the multiverse theory is its simplicity, > not its falsifiability, in line with Solomonoff's theory of inductive > inference and Occam's razor, which is not just a wishy-washy philosophical > framework like Popper's. > > Similarly, today's string physicists accept theories for their simplicity, > not their falsifiability. Just like nobody is able to test whether > gravity is the same on Sirius, but believing it makes things simpler. > > Again: the essential question is: which prior is plausible? Which > represents the correct notion of simplicity? Solomonoff's traditional > prior, which does not care for temporal complexity at all? Even more > general priors computable in the limit, such as those discussed in > the algorithmic TOE paper? Or the Speed Prior, which embodies a more > restricted concept of simplicity that differs from Kolmogorov complexity > because it takes runtime into account, in an optimal fashion? > > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/
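Russell's picture of the observer as a classification device with finitely many output states, and the White Rabbit problem that comes with the uniform measure, can both be seen in a toy enumeration (my construction, not his formalism): classify every 16-bit string as constant, periodic, or "other", and compute each category's measure:

```python
from collections import Counter

def category(s):
    """A toy observer: a classification device with three output states."""
    if len(set(s)) == 1:
        return "constant"
    for p in range(1, len(s) // 2 + 1):
        if s == (s[:p] * (len(s) // p + 1))[:len(s)]:
            return "periodic"
    return "other"

n = 16
counts = Counter(category(format(x, f"0{n}b")) for x in range(2 ** n))
# Under the uniform measure, P(category) = counts[category] / 2**n.
# The regular categories are vanishingly rare and the random-looking one
# dominates: the White Rabbit paradox, unless (as Russell argues) the
# observer's equivalencing of noisy strings to regular descriptions does
# further work.
```

The interesting move in Russell's argument is the second half: real observers do not treat "other" as one undifferentiated bin but equivalence each noisy string to some nearby regular description, which is what reshapes this uniform count into an Occam-like distribution.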
Re: Predictions & duplications
I tried to understand the problem that doctors Schmidhuber and Standish are discussing by describing it in the most concrete terms I could, below. (I admit beforehand I couldn't follow all the details and do not know all the papers and theorems referred to, so this could be irrelevant.) So, say you are going to drop a pencil from your hand and are trying to predict whether it's going to fall down or up this time. Using what I understand with comp TOE, I would take the set of all programs that at some state implement a certain conscious state, namely the state in which you remember starting your experiment of dropping the pencil, and have already recorded the end result (I abbreviate this conscious state as CS. To be exact it is a set of states, but that shouldn't make a difference). The space of all programs would be the set of all programs in some language, coded as infinite numerable sequences of 0's and 1's. (I do not know how much the chosen language + coding has an effect on the whole thing). Now for your prediction you need to divide the implementations of CS into two sets: those in which the pencil fell down and those in which it fell up. Then you compare the measures of those sets. (You would need to assume that each program is run just once or something of the sort. Some programs obviously implement CS several times when they run. So you would maybe just include those programs that implement CS infinitely many times, and weight them with the density of CS occurrences during their running.) One way to derive the measure you need is to assume a measure on the set of all infinite sequences (i.e. on all programs). For this we have the natural measure, i.e. the product measure of the uniform measure on the set containing 0 and 1. And as far as my intuition goes, this measure would lead to the empirically correct prediction on the direction of the pencil falling. And if I understood it right, this is not too far from what Dr. Standish was claiming?
And we wouldn't need any speed priors. But maybe the need for a speed prior would come into play if I thought more carefully about the detailed assumptions involved? E.g. that each program would be run just once, with the same speed, etc.? I am not sure. Juho / Juho Pennanen Department of Forest Ecology, P.O.Box 24 FIN-00014 University of Helsinki tel. (09)191 58144 (+358-9-191 58144) GSM 040 5455 845 (+358-40-5455 845) http://www.helsinki.fi/people/juho.pennanen */
Re: Predictions & duplications
> > > From [EMAIL PROTECTED] : > > > [EMAIL PROTECTED] wrote: > > > > > > > > So you NEED something additional to explain the ongoing regularity. > > > > You need something like the Speed Prior, which greatly favors regular > > > > futures over others. > > > > > > I take issue with this statement. In Occam's Razor I show how any > > > observer will expect to see regularities even with the uniform prior > > > (comes about because all observers have resource problems, > > > incidently). The speed prior is not necessary for Occam's Razor. It is > > > obviously consistent with it though. > > > > First of all: there is _no_ uniform prior on infinitely many things. > > Try to build a uniform prior on the integers. (Tegmark also wrote that > > "... all mathematical structures are a priori given equal statistical > > weight," but of course this does not make much sense because there is > > _no_ way of assigning equal nonvanishing probability to all - infinitely > > many - mathematical structures.) > > I don't know why you insist on the prior being a PDF. It is not > necessary. With the uniform prior, all finite sets have vanishing > probability. However, all finite descriptions correspond to infinite > sets, and these infinite sets have non-zero probability. Huh? A PDF? You mean a probability density function? On a continuous set? No! I am talking about probability distributions on describable objects. On things you can program. Anyway, you write "...observer will expect to see regularities even with the uniform prior" but that clearly cannot be true. > > There is at best a uniform measure on _beginnings_ of strings. Then > > strings of equal size have equal measure. > > > > But then regular futures (represented as strings) are just as likely > > as irregular ones. Therefore I cannot understand the comment: "(comes > > about because all observers have resource problems, incidently)." 
> > Since you've obviously barked up the wrong tree here, it's a little > hard to know where to start. Once you understand that each observer > must equivalence an infinite number of descriptions due to the > boundedness of its resources, it becomes fairly obvious that the > smaller, simpler descriptions correspond to larger equivalence classes > (hence higher probability). Maybe you should write down formally what you mean? Which resource bounds? On which machine? What exactly do you mean by "simple"? Are you just referring to the traditional Solomonoff-Levin measure and the associated old Occam's razor theorems, or do you mean something else? > > Of course, alternative priors lead to alternative variants of Occam's > > razor. That has been known for a long time - formal versions of Occam's > > razor go at least back to Solomonoff, 1964. The big question really > > is: which prior is plausible? The most general priors we can discuss are > > those computable in the limit, in the algorithmic TOE paper. They do not > > allow for computable optimal prediction though. But the more restrictive > > Speed Prior does, and seems plausible from any programmer's point of view. > > > > > The interesting thing is of course whether it is possible to > > > experimentally distinguish between the speed prior and the uniform > > > prior, and it is not at all clear to me that it is possible to > > > distinguish between these cases. > > > > I suggest to look at experimental data that seems to have Gaussian > > randomness in it, such as interference patterns in split experiments. > > The Speed Prior suggests the data cannot be really random, but that a > > fast pseudorandom generator PRG is responsible, e.g., by dividing some > > seed by 7 and taking some of the resulting digits as the new seed, or > > whatever. So it's verifiable - we just have to discover the PRG method. 
> > > > I can't remember which incompleteness result it is, but it is > impossible to prove the randomness of any sequence. In order to > falsify your theory one would need to prove a sequence to be > random. However, of course if all known sequences are provably > pseudo-random (ie compressible), then this would constitute pretty > good evidence. However, this is a tall order, as there is no algorithm > for generating the compression behind an arbitrary sequence. You are talking falsifiability. I am talking verifiability. Sure, you cannot prove randomness. But that's not the point of any inductive science. The point is to find regularities if there are any. Occam's razor encourages us to search for regularity, even when we do not know whether there is any. Maybe some PhD student tomorrow will discover a simple PRG of the kind I mentioned, and get famous. It is important to see that Popper's popular and frequently cited and overrated concept of falsifiability does not really help much to explain what inductive science such as physics is all about. E.g., physicists accept Everett's ideas although most of his postulated parallel universes will remain inaccessible forever, and therefore are _not_ falsifiable.
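The verifiability/falsifiability asymmetry both writers agree on, that compressing a sequence verifies its non-randomness while failing to compress it proves nothing, can be demonstrated with an off-the-shelf compressor (zlib here is just a stand-in for any search for regularity):

```python
import os
import zlib

regular = b"\x00" * 1000          # a highly regular sequence
noise   = os.urandom(1000)        # whatever the OS entropy source gives

# Finding a short description *verifies* non-randomness:
assert len(zlib.compress(regular)) < 50

# Failing to find one proves nothing: the sequence might still be
# pseudo-random, generated by something zlib cannot see (a PRG, say).
assert len(zlib.compress(noise)) > 900
```

This is Russell's point in miniature: a successful compression is a certificate, but there is no algorithm that finds the compression behind an arbitrary sequence, so the absence of one is never a proof of randomness.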
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > > > From [EMAIL PROTECTED] : > > [EMAIL PROTECTED] wrote: > > > > > > So you NEED something additional to explain the ongoing regularity. > > > You need something like the Speed Prior, which greatly favors regular > > > futures over others. > > > > I take issue with this statement. In Occam's Razor I show how any > > observer will expect to see regularities even with the uniform prior > > (comes about because all observers have resource problems, > > incidently). The speed prior is not necessary for Occam's Razor. It is > > obviously consistent with it though. > > First of all: there is _no_ uniform prior on infinitely many things. > Try to build a uniform prior on the integers. (Tegmark also wrote that > "... all mathematical structures are a priori given equal statistical > weight," but of course this does not make much sense because there is > _no_ way of assigning equal nonvanishing probability to all - infinitely > many - mathematical structures.) I don't know why you insist on the prior being a PDF. It is not necessary. With the uniform prior, all finite sets have vanishing probability. However, all finite descriptions correspond to infinite sets, and these infinite sets have non-zero probability. > > There is at best a uniform measure on _beginnings_ of strings. Then > strings of equal size have equal measure. > > But then regular futures (represented as strings) are just as likely > as irregular ones. Therefore I cannot understand the comment: "(comes > about because all observers have resource problems, incidently)." > Since you've obviously barked up the wrong tree here, it's a little hard to know where to start. Once you understand that each observer must equivalence an infinite number of descriptions due to the boundedness of its resources, it becomes fairly obvious that the smaller, simpler descriptions correspond to larger equivalence classes (hence higher probability). 
> Of course, alternative priors lead to alternative variants of Occam's > razor. That has been known for a long time - formal versions of Occam's > razor go at least back to Solomonoff, 1964. The big question really > is: which prior is plausible? The most general priors we can discuss are > those computable in the limit, in the algorithmic TOE paper. They do not > allow for computable optimal prediction though. But the more restrictive > Speed Prior does, and seems plausible from any programmer's point of view. > > > The interesting thing is of course whether it is possible to > > experimentally distinguish between the speed prior and the uniform > > prior, and it is not at all clear to me that it is possible to > > distinguish between these cases. > > I suggest to look at experimental data that seems to have Gaussian > randomness in it, such as interference patterns in split experiments. > The Speed Prior suggests the data cannot be really random, but that a > fast pseudorandom generator PRG is responsible, e.g., by dividing some > seed by 7 and taking some of the resulting digits as the new seed, or > whatever. So it's verifiable - we just have to discover the PRG method. > I can't remember which incompleteness result it is, but it is impossible to prove the randomness of any sequence. In order to falsify your theory one would need to prove a sequence to be random. However, of course if all known sequences are provably pseudo-random (ie compressible), then this would constitute pretty good evidence. However, this is a tall order, as there is no algorithm for generating the compression behind an arbitrary sequence. Unless someone else has some brilliant ideas, it's all looking a bit grim. > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/ > > > Dr.
Russell Standish Director High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile) UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (") Australia[EMAIL PROTECTED] Room 2075, Red Centrehttp://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02
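The point about the uniform measure on beginnings of strings can be made concrete with a small sketch (my own illustration, not part of the original exchange): under the uniform measure on infinite binary strings, a finite description of length L picks out the cylinder set of all its infinite continuations, with measure 2^-L, so shorter descriptions correspond to strictly larger, hence more probable, equivalence classes.

```python
# Illustrative sketch (not from the original posts): the uniform measure
# on infinite binary strings assigns each finite prefix p the measure of
# its cylinder set {infinite strings beginning with p}, namely 2**-len(p).
# Shorter descriptions therefore pick out larger equivalence classes.
from itertools import product

def cylinder_measure(prefix: str) -> float:
    """Uniform measure of all infinite binary strings extending `prefix`."""
    assert set(prefix) <= {"0", "1"}
    return 2.0 ** -len(prefix)

# A short (simple) description outweighs any longer refinement of it:
assert cylinder_measure("01") == 0.25
assert cylinder_measure("01") > cylinder_measure("010110")

# The measures of all prefixes of a fixed length sum to 1, as a
# probability measure should:
total = sum(cylinder_measure("".join(bits)) for bits in product("01", repeat=4))
assert abs(total - 1.0) < 1e-12
```

Nothing here depends on the prior being a density over individual infinite strings, which is the point of the "not a PDF" remark above: only sets of continuations carry probability.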
Re: Predictions & duplications
> From [EMAIL PROTECTED]:
> [EMAIL PROTECTED] wrote:
> >
> > So you NEED something additional to explain the ongoing regularity.
> > You need something like the Speed Prior, which greatly favors regular
> > futures over others.
>
> I take issue with this statement. In Occam's Razor I show how any
> observer will expect to see regularities even with the uniform prior
> (comes about because all observers have resource problems,
> incidentally). The speed prior is not necessary for Occam's Razor. It is
> obviously consistent with it though.

First of all: there is _no_ uniform prior on infinitely many things. Try to build a uniform prior on the integers. (Tegmark also wrote that "... all mathematical structures are a priori given equal statistical weight," but of course this does not make much sense because there is _no_ way of assigning equal nonvanishing probability to all - infinitely many - mathematical structures.)

There is at best a uniform measure on _beginnings_ of strings. Then strings of equal size have equal measure.

But then regular futures (represented as strings) are just as likely as irregular ones. Therefore I cannot understand the comment: "(comes about because all observers have resource problems, incidentally)."

Of course, alternative priors lead to alternative variants of Occam's razor. That has been known for a long time - formal versions of Occam's razor go at least back to Solomonoff, 1964. The big question really is: which prior is plausible? The most general priors we can discuss are those computable in the limit, in the algorithmic TOE paper. They do not allow for computable optimal prediction though. But the more restrictive Speed Prior does, and seems plausible from any programmer's point of view.

> The interesting thing is of course whether it is possible to
> experimentally distinguish between the speed prior and the uniform
> prior, and it is not at all clear to me that it is possible to
> distinguish between these cases.

I suggest to look at experimental data that seems to have Gaussian randomness in it, such as interference patterns in split experiments. The Speed Prior suggests the data cannot be really random, but that a fast pseudorandom generator PRG is responsible, e.g., by dividing some seed by 7 and taking some of the resulting digits as the new seed, or whatever. So it's verifiable - we just have to discover the PRG method.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/
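The hypothetical generator sketched above ("dividing some seed by 7 and taking some of the resulting digits as the new seed") can be written out as a toy Python generator. The choice of which digits to keep, and how to extract an output bit, is my own illustrative assumption; the post specifies no such details.

```python
# Toy pseudorandom generator in the spirit of the scheme quoted above:
# divide the seed by 7 and keep some digits of the expansion as the next
# seed. Keeping six fractional decimal digits, and emitting the parity of
# the seed as the output bit, are illustrative assumptions only.

def prg_step(seed: int) -> int:
    # First six decimal digits of the fractional part of seed/7,
    # computed with integer arithmetic to stay exact.
    return (seed * 10**6 // 7) % 10**6 or 1  # avoid getting stuck at zero

def prg_bits(seed: int, n: int) -> list[int]:
    bits = []
    for _ in range(n):
        seed = prg_step(seed)
        bits.append(seed % 2)  # parity of the new seed as the output bit
    return bits

# Deterministic, hence "explainable" in the Speed Prior sense: the same
# seed always reproduces the same sequence.
assert prg_bits(123456, 16) == prg_bits(123456, 16)
assert set(prg_bits(123456, 16)) <= {0, 1}
```

The verifiability claim amounts to this: if observed "random" data were the output of some such cheap deterministic rule, discovering the rule and the seed would compress the data and refute its true randomness.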
Re: Predictions & duplications
[EMAIL PROTECTED] wrote:
> So you NEED something additional to explain the ongoing regularity.
> You need something like the Speed Prior, which greatly favors regular
> futures over others.

I take issue with this statement. In Occam's Razor I show how any observer will expect to see regularities even with the uniform prior (comes about because all observers have resource problems, incidentally). The speed prior is not necessary for Occam's Razor. It is obviously consistent with it though.

The interesting thing is of course whether it is possible to experimentally distinguish between the speed prior and the uniform prior, and it is not at all clear to me that it is possible to distinguish between these cases.

Cheers

Dr. Russell Standish
Re: Predictions & duplications
Bruno, there are so many misleading or unclear statements in your post - I do not even know where to start. I'll insert a few comments below.

> Subject: Re: Predictions & duplications
> From: Marchal <[EMAIL PROTECTED]>
>
> Juergen Schmidhuber wrote:
>
> > We need a prior probability distribution on possible histories.
>
> OK. I agree with that. But of course we differ on the meaning of
> "possible histories". And we tackle also the "prior probability"
> in quite different ways.
>
> > Then, once we have observed a past history, we can use Bayes' rule to
> > compute the probability of various futures, given the observations. Then
> > we predict the most probable future.
>
> It seems to me that Bayes' rule is what we *need* to build a plausible
> *past*; concerning the future, and even the present, I prefer
> a theory, or a philosophy (or religion), hoping it can put light on
> the unknown.
>
> > The algorithmic TOE paper discusses several formally describable priors
> > computable in the limit. But for the moment let me focus on my favorite,
> > the Speed Prior.
>
> Mmh... You know that there exist classes of infinitely speedable
> programs (and theories), and even that those classes correspond to a
> superclass of universal machines. This entails that the UD,
> the Universal Dovetailer, which I like to call sometimes
> the 'splashed Universal Turing Machine'
> (which btw makes UD* a sort of denotational semantics of the UTM), could
> generate and execute quicker and quicker versions of itself.
> This means you need a non-universal ontology.
> For helping others to understand the point I
> propose to await discussing this point after I give the solution of the
> problem in "diagonalisation 1":
> http://www.escribe.com/science/theory/m3079.html
>
> > Assume the Great Programmer has a resource problem, just like any
> > other programmer.
>
> You told Wei Dai not to take the Great Programmer too seriously.
> I am not sure I can imagine the Great Programmer having a resource
> problem. Unless you are a finitist! Knowing that the UTM emulating
> the UD will use an unboundable part of its tape ...
> How could the GP have a resource problem when he is the one
> building universes in which resource problems can happen (apparently)?
> Are you not presupposing some physical structure in advance somewhere?

You can be a GP. Write a program that computes all universes computable in the limit. Add storage in case of storage overflow. Still, you as a GP have a resource problem. At some point it will get harder and harder for you to keep computing all the things you'd like to compute. The beings evolving in your universes won't be able to deduce much about the physics of your world in which your computer is built. Just like I cannot say much about the environment of the particular GP who programmed us. That's where religion starts, and where algorithmic TOEs stop making statements.

> > According to the Speed Prior, the harder it is to compute
> > something, the less likely it gets. More precisely, the cumulative prior
> > probability of all data whose computation costs at least O(n) time is
> > at most inversely proportional to n.
>
> Very interesting idea. I would say: the harder it is to compute
> something FROM here and now, the less likely it gets from here and now.

Same thing. The Speed Prior leads to conditional probabilities of futures, given the past.

> And I would
> add that this is only true if your neighborhood is less complex than
> yourself. I really believe your idea is interesting for some low-level
> computational physics. But when agents, including yourself, enter the
> play, it's another matter, because it is intrinsically harder to compute
> things then.

I do not understand. But please do _not_ elaborate.

> > To evaluate the plausibility of this, consider your own computer: most
> > processes just take a few microseconds, some take a few seconds, few
> > take hours, very few take days, etc...
>
> ?
> In the work of the GP, that is in UD*, most processes take billions
> of googolplex kalpa ...

Not if the GP has a resource problem, just like any programmer of virtual realities has a resource problem. Again: forget all the esoteric undecidability stuff, which does not really matter here, and be pragmatic: try to look at things from the perspective of some observer being computed as part of a virtual reality that you programmed. The observer may get smart and correctly suspect you have a resource problem, and can make nontrivial predictions from there, using the Speed Prior.

> A lot of the processes engaged by the UD simply do not stop.
> Some are interesting and perhaps Turing universal, like the process
> raising better and better approximations of the Mandelbrot set.
Re: Predictions & duplications
Juergen Schmidhuber wrote:

> We need a prior probability distribution on possible histories.

OK. I agree with that. But of course we differ on the meaning of "possible histories". And we tackle also the "prior probability" in quite different ways.

> Then, once we have observed a past history, we can use Bayes' rule to
> compute the probability of various futures, given the observations. Then
> we predict the most probable future.

It seems to me that Bayes' rule is what we *need* to build a plausible *past*; concerning the future, and even the present, I prefer a theory, or a philosophy (or religion), hoping it can put light on the unknown.

> The algorithmic TOE paper discusses several formally describable priors
> computable in the limit. But for the moment let me focus on my favorite,
> the Speed Prior.

Mmh... You know that there exist classes of infinitely speedable programs (and theories), and even that those classes correspond to a superclass of universal machines. This entails that the UD, the Universal Dovetailer, which I like to call sometimes the 'splashed Universal Turing Machine' (which btw makes UD* a sort of denotational semantics of the UTM), could generate and execute quicker and quicker versions of itself. This means you need a non-universal ontology. For helping others to understand the point I propose to await discussing this point after I give the solution of the problem in "diagonalisation 1":
http://www.escribe.com/science/theory/m3079.html

> Assume the Great Programmer has a resource problem, just like any
> other programmer.

You told Wei Dai not to take the Great Programmer too seriously. I am not sure I can imagine the Great Programmer having a resource problem. Unless you are a finitist! Knowing that the UTM emulating the UD will use an unboundable part of its tape ... How could the GP have a resource problem when he is the one building universes in which resource problems can happen (apparently)? Are you not presupposing some physical structure in advance somewhere?

> According to the Speed Prior, the harder it is to compute
> something, the less likely it gets. More precisely, the cumulative prior
> probability of all data whose computation costs at least O(n) time is
> at most inversely proportional to n.

Very interesting idea. I would say: the harder it is to compute something FROM here and now, the less likely it gets from here and now. And I would add that this is only true if your neighborhood is less complex than yourself. I really believe your idea is interesting for some low-level computational physics. But when agents, including yourself, enter the play, it's another matter, because it is intrinsically harder to compute things then.

> To evaluate the plausibility of this, consider your own computer: most
> processes just take a few microseconds, some take a few seconds, few
> take hours, very few take days, etc...

? In the work of the GP, that is in UD*, most processes take billions of googolplex kalpa ... A lot of the processes engaged by the UD simply do not stop. Some are interesting and perhaps Turing universal, like the process raising better and better approximations of the Mandelbrot set.

> We do not know the Great Programmer's computer and algorithm for computing
> universe histories and other things. Still, we know that he cannot do
> it more efficiently than the asymptotically fastest algorithm.

Are you really sure about that? If the GP is really *the* GP, existing by Church's Thesis, I think it follows from Blum's Speed-up Theorem that it has no asymptotically fastest algorithm. But perhaps you are talking of "our little program" generated and executed by the UD. If this one has an asymptotically fastest algorithm, are you sure it can still emulate a UTM? I bet the "universe" is universal. Also: how could we know and test whether we are in a quick or slow version of that little program? Is that not (absolutely) undecidable? Should that program really be run? If yes, are you not presupposing again a physical universe?

> The Speed Prior reflects properties of precisely this optimal algorithm.
> It turns out that the best way of predicting future y, given past x,
> is to minimize the (computable) Kt-complexity of (x,y). As observation
> size grows, the predictions converge to the optimal predictions.

As observation grows, knowledge grows, and we grow, making everything more complex than before. I guess you mean "optimal predictions" at some level. With comp, the more knowledge grows, the more ignorance grows.

> To address Bruno Marchal's questions: Many of my duplications in parallel
> universes with identical past x will experience different futures y.
> None of my copies knows a priori which one it is. But each does know:
> Those futures y with high Kt(x,y) will be highly unlikely; those with
> low Kt(x,y) won't.

It is indeed hard to write a little program which generates a long Kolmogorov-Chaitin incompressible 01 string. But, as I told you earlier, it is quite ea
Re: Predictions & duplications
In summary, it would appear that Juergen is not disputing the use of the UDA after all, but simply the use of the uniform measure as an initial prior over the everything of all descriptions. He argues that we should use the speed prior instead. Another way of putting it is that he doesn't dispute the role of indeterminism in generating observable complexity, only that it is necessarily limited in the amount of complexity that can be generated.

None of this is in conflict with Occam's razor, or observable evidence to date. In fact it is debatable whether it is experimentally falsifiable - no sequence can be proven to be truly random - there is always the possibility of some pseudo-random generator being found to be the source.

Cheers

[EMAIL PROTECTED] wrote:
> > Predictions & duplications
> >
> > From: Marchal <[EMAIL PROTECTED]> Thu Oct 4 11:58:13 2001
> > [...]
> > You have still not explained to me how you predict your reasonably next
> > experience in the simple WM duplication. [...]
> > So, how is it that you talk as if you have an algorithm
> > capable of telling you in advance what will be your personal
> > experience in a self-duplication experiment. [...]
> > Where am I wrong?
> > Bruno
>
> I will try again: how can we predict what's going to happen?
>
> We need a prior probability distribution on possible histories.
> Then, once we have observed a past history, we can use Bayes' rule to
> compute the probability of various futures, given the observations. Then
> we predict the most probable future.
>
> The algorithmic TOE paper discusses several formally describable priors
> computable in the limit. But for the moment let me focus on my favorite,
> the Speed Prior.
>
> Assume the Great Programmer has a resource problem, just like any other
> programmer. According to the Speed Prior, the harder it is to compute
> something, the less likely it gets. More precisely, the cumulative prior
> probability of all data whose computation costs at least O(n) time is
> at most inversely proportional to n.
>
> To evaluate the plausibility of this, consider your own computer: most
> processes just take a few microseconds, some take a few seconds, few
> take hours, very few take days, etc...
>
> We do not know the Great Programmer's computer and algorithm for computing
> universe histories and other things. Still, we know that he cannot do
> it more efficiently than the asymptotically fastest algorithm.
>
> The Speed Prior reflects properties of precisely this optimal algorithm.
> It turns out that the best way of predicting future y, given past x,
> is to minimize the (computable) Kt-complexity of (x,y). As observation
> size grows, the predictions converge to the optimal predictions.
>
> To address Bruno Marchal's questions: Many of my duplications in parallel
> universes with identical past x will experience different futures y.
> None of my copies knows a priori which one it is. But each does know:
> Those futures y with high Kt(x,y) will be highly unlikely; those with
> low Kt(x,y) won't.
>
> Under the Speed Prior it is therefore extremely unlikely that I will
> suddenly observe truly random, inexplicable, surprising events, simply
> because true randomness is very hard to compute, and thus has very high
> Kt complexity.
>
> Juergen Schmidhuber
>
> http://www.idsia.ch/~juergen/
> http://www.idsia.ch/~juergen/everything/html.html
> http://www.idsia.ch/~juergen/toesv2/
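Levin's Kt complexity, which the quoted message invokes, charges a program both for its length and for the logarithm of its runtime: Kt(x) = min over programs p printing x of (len(p) + log time(p)). The sketch below is my own toy, restricted to a drastically simplified "language" of repeat-a-pattern programs, so it only mimics that trade-off; but it shows the qualitative claim: a regular string gets far lower Kt, hence far higher probability under a 2^-Kt-style prior, than an irregular one of the same length.

```python
import math

def toy_kt(x: str) -> float:
    # Toy stand-in for Levin's Kt-complexity. The only "programs" searched
    # are (pattern, repeat) pairs that print x by cycling a pattern;
    # runtime is taken as the output length. A string with no short cyclic
    # structure falls back to a verbatim "program" of length len(x).
    # Real Kt minimizes over all programs; this is purely illustrative.
    best = len(x) + math.log2(len(x) + 1)          # verbatim fallback
    for plen in range(1, len(x) // 2 + 1):
        pattern = x[:plen]
        reps = -(-len(x) // plen)                  # ceiling division
        if (pattern * reps)[:len(x)] == x:
            best = min(best, plen + math.log2(len(x) + 1))
            break                                  # shortest pattern wins
    return best

regular = "01" * 32                                # highly compressible
# Thue-Morse bits: aperiodic, so no cyclic program exists in our toy language.
irregular = "".join(str(bin(i).count("1") & 1) for i in range(64))

assert toy_kt(regular) < toy_kt(irregular)
```

Under such a prior, the regular future is exponentially favored over the irregular one, which is exactly the sense in which the Speed Prior "greatly favors regular futures" in the thread above.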