Re: Predictions & duplications
I comment again on a relatively old post by Juergen Schmidhuber.

>Juergen: The thing is: when you generate them all, and assume that all are
>equally likely, in the sense that all beginnings of all strings are
>uniformly distributed, then you cannot explain why the regular universes
>keep being regular. The random futures are then just as likely as the
>nonrandom ones.
>So you NEED something additional to explain the ongoing regularity.
>You need something like the Speed Prior, which greatly favors regular
>futures over others.
>
>> Bruno: Take for example the iterated duplication experience/experiment.
>> It can be automated by a very simple program. After 64 iterations
>> there will be 2^64 = 1.84467 x 10^19 agents, and most of them will have
>> an incompressible 01-history. With the comp indeterminism it is
>> much more likely you will get such a really random string
>> (independently of the fact that you will be unable to prove it is
>> really random). Those with computable strings will be exceptional, so
>> that, if those agents work together, they will consider (even with
>> Bayes) that the simple self-multiplying algorithm is the simplest
>> and shortest explanation for those appearances of randomness.
>
>Juergen: But don't you see? Why does a particular agent, say, yourself,
>with a nonrandom past, have a nonrandom future?

But we do have random futures. Just send a sequence of particles in the
state 1/sqrt(2)(up + down) through an "analyser", and write 0 or 1 each
time a particular particle goes through. Both QM+comp (Everett) and comp
alone (I will not elaborate!) explain this first person (plural) point of
view randomization by a relative self-multiplication/division/
differentiation. As I said, the simple self-multiplying algorithm is the
simplest and shortest explanation for those appearances of randomness.

>Why is your computer still there after one second, although in a truly
>random world it would immediately dissolve?
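The counting behind Bruno's claim can be sketched numerically (my own illustration, not code from the thread): after n binary self-duplications there are 2^n agent histories, and the standard counting argument bounds the fraction of them compressible by k or more bits, since there are fewer than 2^(n-k+1) binary programs shorter than n-k+1 bits.

```python
# Hypothetical counting sketch (my own illustration, not code from the
# thread): histories in the iterated self-duplication experiment.

n = 64                     # duplication steps
agents = 2 ** n            # one binary 01-history per agent
print(agents)              # 18446744073709551616, i.e. about 1.84467e19

# There are fewer than 2**(n-k+1) binary programs shorter than n-k+1 bits,
# so at most that many n-bit histories are compressible by k or more bits:
for k in (10, 20, 40):
    fraction = (2 ** (n - k + 1) - 1) / 2 ** n
    print(k, fraction)     # bounded by 2**(1-k): almost all are incompressible
```

The bound is machine-independent up to an additive constant, which is why "most agents see an incompressible history" does not depend on the choice of universal machine.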
No one says reality is *only* a truly random realm, especially when third
person reality is given by UD*, the trace of the Universal Dovetailer.
Arithmetically it is just the set of true \Sigma_1 sentences. That's our
(third person) atomic truth. Of course those truths are "UD" provable.

>Why do pencils keep falling down instead of up, when the futures where
>they fall up are just as likely?

Because those sets of machine-accessible arithmetical truths, as viewed by
the machines themselves (this introduces the modalities), are highly
structured. Indeed the UD has this nasty habit of dovetailing on the
reals, so that some randomization is at work. But some equilibrium between
randomization and local lawfulness has to be reached in the limit (where
first person points of view are eventually defined in Plato Heaven). Our
own stability could only rely on the randomization of the details of our
stories, a randomization which we should necessarily observe if we look at
ourselves *below* our (apparently common) substitution level. Although
this gives a sort of necessary prior (the need to quantize the "classical
stories", transforming H into exp(-iH)), I prefer, following my naive
idea, to interview directly the sound Universal Machine. I define
"probability of p = 1" in (Peano) Arithmetic by Bew('p) & Con('p). In this
way p is true in all consistent extensions (Bew('p)) and there is (the bet
part) at least one consistent extension (Con('p)). I restrict the
arithmetical interpretation of p to the arithmetical \Sigma_1 sentences
(the arithmetical UD). This gives an arithmetical quantization of p by
[]<>p (with []p = Bew('p) & Con('p)). It indeed obeys a sort of quantum
logic. As always, <>p (that is, Con('p)) is -Bew('-p) (that is, -[]-p).

But I fear you don't believe in any form of "strict" indeterminism,
neither comp nor QM, do you?

Bruno
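Bruno's definitions here can be set out in one display (my transcription into standard provability-logic notation; the restriction of p to \Sigma_1 sentences is his):

```latex
% Bruno's "probability of p = 1" box, and its dual:
\Box p \;\equiv\; \mathrm{Bew}(\ulcorner p \urcorner)
       \,\land\, \mathrm{Con}(\ulcorner p \urcorner),
\qquad
\Diamond p \;\equiv\; \lnot \Box \lnot p

% Plain consistency is the dual of plain provability:
\mathrm{Con}(\ulcorner p \urcorner) \;=\;
       \lnot \mathrm{Bew}(\ulcorner \lnot p \urcorner)

% The "arithmetical quantization" of a \Sigma_1 sentence p:
p \;\longmapsto\; \Box \Diamond p
```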
Re: Predictions & duplications
Juergen Schmidhuber wrote: > > > > > From: Russell Standish <[EMAIL PROTECTED]> > > > > The only reason for not accepting the simplest thing is if it can be > > > > shown to be logically inconsistent. This far, you have shown no such > > > > thing, but rather demonstrated an enormous confusion between measure > > > > and probability distribution. > > > > > > Russell, this is really too much; please read any intro to measure > > > theory, > > > e.g., the one by Halmos. Measures in general are _not_ probability > > > distributions. They are more than that; you can view them as _sets_ of > > > probability distributions on sets that are related to each other in a > > > specific way. Probability density functions define measures and > > > therefore > > > in general aren't probability distributions either. The uniform PDF on > > > [0..1] yields the perfect trivial example; that's exactly why I used it > > > before. In the computability context, compare LI/Vitanyi 1997, chapters > > > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 > > > (continuous sample space with (semi)measure M(x)). > > > > This is an excellent reference, thank you. Example 4.2.1 gives the > > construction of the uniform measure over the set of strings, precisely > > what you claim didn't exist when we started this discussion. > > You just want to annoy me, don't you? For the last time: There is a > uniform *measure* on the strings, but no uniform probability > distribution. Who is annoying who? I never claimed a uniform probability density (I actually looked up the difference between probability density and probability distribution yesterday in Li and Vitanyi) - only a uniform measure. In fact, my first comment to you was "Why do you insist on it being a PDF?". Remember? The simplest prior is the uniform measure on infinite strings. 
> > > > 1) Halting theorem implies the set of halting programs is of
> > > > measure zero within the set of all programs
> > >
> > > You mean, given a uniform measure? But why should you need the
> > > halting theorem for this trivial statement? In fact, what exactly do
> > > you mean by halting theorem? (Schoenfield's book provides a good
> > > intro to mathematical logic and computability theory.)
> >
> > The halting theorem is that there is no algorithm for predicting
> > whether any program will halt. Of course, the result I was referring
> > to is that the set of compressible strings is of measure zero. The set
> > of halting programs actually has finite measure, since \Omega > 0.
> > However, there is still finite probability of an input string
> > being nonhalting, and indeed the probability seems to be greater than
> > 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
> > presents a problem for computationalism.
>
> This is mixing up everything, in particular, different measures.
> 1) Infinite strings with finite program have *nonzero* measure, given
> the *universal* measure (Omega is a number derived from the universal
> measure). 2) They have measure zero, given the *uniform* measure (Omega
> has nothing to do with it).

The uniform measure and the universal measure are related. The uniform
prior is on the set of input strings to a UTM; the universal measure is
the induced measure on the output strings. No. 1) here is the relevant
statement, not 2).

> 3) To see 2) we do not need the halting
> theorem - we just compare the finite programs (countably many) to all
> strings (uncountably many) and we are done. 4) The halting theorem is
> no problem whatsoever when it comes to computable universes. Why care
> whether some computable universe stops or not? If it does, it does,
> otherwise it does not. In both cases we can compute it in the limit.
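Juergen's point 3) can be made concrete with a toy computation (my own illustration, not from the thread): under the uniform measure a single infinite string sits inside ever-smaller cylinders, so each singleton has measure 0, and the countably many program outputs together still have measure 0 by countable additivity.

```python
# Toy version of point 3): fixing the first n bits of an infinite string
# pins it inside a cylinder of uniform measure 2**-n, so each individual
# string has measure 0 in the limit; a countable set of strings - e.g. the
# outputs of the countably many finite programs - then has total measure 0.

def singleton_measure_upper_bound(n_fixed_bits: int) -> float:
    """Uniform measure of the cylinder fixing the first n bits."""
    return 2.0 ** -n_fixed_bits

for n in (10, 30, 60):
    print(n, singleton_measure_upper_bound(n))   # shrinks toward 0 as n grows
```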
> > > > > The GP program (aka Universal Dovetailer) is not the simplest > > > > thing. The simplest describable thing is the set of all descriptions, > > > > that simply exist without being generated. That has precisely zero > > > > information or complexity, whereas the UD has some complexity of the > > > > order of a few 100 bits. The simplest prior is the uniform one, ie > > > > there is no bias whatsoever in the selection. > > > > > > Exist without being generated? Precisely zero information or > > > complexity? But with respect to what? > > > > With respect to the observer. With the respect to anything in > > fact. The set of all possible descriptions has precisely zero > > information, by definition, regardless of what observer or context you > > employ. > > This is a vague informal statement, not a definition. Which are the > possible descriptions? What's the formal difference between a > description > and a non-description? What is your "anything" if there is no device > for describing it formally? If "anything" does not convey any bit of > info then what exactly is it that conveys one bit? Or two? > Choose an alphabet (a finite set of symbols). The set of all descripti
Re: Predictions & duplications
> > > From: Russell Standish <[EMAIL PROTECTED]> > > > The only reason for not accepting the simplest thing is if it can be > > > shown to be logically inconsistent. This far, you have shown no such > > > thing, but rather demonstrated an enormous confusion between measure > > > and probability distribution. > > > > Russell, this is really too much; please read any intro to measure > > theory, > > e.g., the one by Halmos. Measures in general are _not_ probability > > distributions. They are more than that; you can view them as _sets_ of > > probability distributions on sets that are related to each other in a > > specific way. Probability density functions define measures and > > therefore > > in general aren't probability distributions either. The uniform PDF on > > [0..1] yields the perfect trivial example; that's exactly why I used it > > before. In the computability context, compare LI/Vitanyi 1997, chapters > > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 > > (continuous sample space with (semi)measure M(x)). > > This is an excellent reference, thank you. Example 4.2.1 gives the > construction of the uniform measure over the set of strings, precisely > what you claim didn't exist when we started this discussion. You just want to annoy me, don't you? For the last time: There is a uniform *measure* on the strings, but no uniform probability distribution. > > > 1) Halting theorem implies the set of halting programs is of measure > > > zero within the set of all programs > > > > You mean, given a uniform measure? But why should you need the halting > > theorem for this trivial statement? In fact, what exactly do you mean > > by > > halting theorem? (Schoenfield's book provides a good intro to > > mathematical > > logic and computability theory.) > > > > The halting theorem is that there is no algorithm for predicting > whether any program will halt. 
> Of course, the result I was referring
> to is that the set of compressible strings is of measure zero. The set
> of halting programs actually has finite measure, since \Omega > 0.
> However, there is still finite probability of an input string
> being nonhalting, and indeed the probability seems to be greater than
> 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
> presents a problem for computationalism.

This is mixing up everything, in particular, different measures.
1) Infinite strings with finite program have *nonzero* measure, given
the *universal* measure (Omega is a number derived from the universal
measure). 2) They have measure zero, given the *uniform* measure (Omega
has nothing to do with it). 3) To see 2) we do not need the halting
theorem - we just compare the finite programs (countably many) to all
strings (uncountably many) and we are done. 4) The halting theorem is
no problem whatsoever when it comes to computable universes. Why care
whether some computable universe stops or not? If it does, it does,
otherwise it does not. In both cases we can compute it in the limit.

> > > The GP program (aka Universal Dovetailer) is not the simplest
> > > thing. The simplest describable thing is the set of all descriptions,
> > > that simply exist without being generated. That has precisely zero
> > > information or complexity, whereas the UD has some complexity of the
> > > order of a few 100 bits. The simplest prior is the uniform one, ie
> > > there is no bias whatsoever in the selection.
> >
> > Exist without being generated? Precisely zero information or
> > complexity? But with respect to what?
>
> With respect to the observer. With respect to anything in
> fact. The set of all possible descriptions has precisely zero
> information, by definition, regardless of what observer or context you
> employ.

This is a vague informal statement, not a definition. Which are the
possible descriptions?
What's the formal difference between a description and a non-description? What is your "anything" if there is no device for describing it formally? If "anything" does not convey any bit of info then what exactly is it that conveys one bit? Or two? > Any traditional definition of > > information/simplicity requires the specification of a formal > > description > > method, such as a Turing machine. You don't need one? Then how do you > > define information or complexity? How exactly do you define "simple" ? > > Actually, all it needs is an equivalencing procedure. See my "On > Complexity and Emergence" paper. A Turing machine will do the job for > you, but so will a neural network or an "expert system" following > heuristic rules. So you have to write down formally this equivalencing procedure, right? Why should this be different in principle from writing down the formal rules for a Turing machine? Why is a simplicity measure that depends on your equivalencing procedure superior to a simplicity measure depending on a Turing machine? (The TM procedure includes yours - because it includes all procedu
Re: Predictions & duplications
Schmidhuber wrote:
>Why care for the subset of provable sentences?
>Aren't we interested in the full set of all describable sentences?

We are interested in the true sentences, the provable ones and the
unprovable ones.

>We can generate it, without caring for proofs at all.

If you mean generate all describable *sentences*, this is trivial, but
then you generate only a language. If you mean generate all computations,
it is equivalent to generating the proofs of all \Sigma_1 sentences (= the
"arithmetical" UD). The provability predicate is just a universal \Sigma_1
sentence. (I recall that a \Sigma_1 sentence is a sentence equivalent to
(Ex)P(x) with P algorithmically decidable.)

>"Unspeakable" means "beyond formal definition."

Beyond Turing nameability? Beyond formal definition by whom? You know the
universal sound machine has no means to define its own truth predicate
(Tarski). Is it in that sense? The knowledge of p by the machine can be
defined (in a first approximation) by "p & Bew('p)", in which case,
although thoroughly clear at the metalevel, the knowledge *by* the machine
is beyond any formal definition *for* the machine. Bew = Godel's
provability predicate, 'p = the godel number of "p". Bew(x) = (Ey)B(y,x),
where B(y,x) means y is the godel number of a proof of the sentence with
godel number x.

>In particular, the
>way we humans communicate and agree on a common formal language is not
>formally defined.

OK. Most importantly, the very meaning of "finite", or of number, cannot
be formally defined. But I show that, as far as we are sound universal
machines, our own knowledgeability "predicate" cannot be formally defined
by us.

>You write "x" and I say, ok, it's an "x", but that's
>all based on some sort of informal agreement that involves complex
>pattern recognition and learning processes. Usually we do not question
>the validity of outcomes of such cognitive processes, and just go ahead
>with formal derivations.
Yes, but the situation is worse for the very *meaning* of our utterances.

>But there is this major informal step *before*
>formal reasoning can start, and good textbooks on logic acknowledge
>this.

There is *this* one, but there is a deeper one with the *meaning* of our
formal presentations. We really bet that we share a common intuition on
the notion of finitary procedure. Note that intuitionists and classicists
diverge beyond the natural numbers.

We are not on the same wavelength, Juergen. You believe there is a
computable universe. I am agnostic about that. But if I survive (or just
experience nothing) through a digital functional substitution made at some
level L, then my experience is defined by the set of all consistent
extensions defined by all consistent histories definable below L. With
comp, realities emerge from the way consistent machines' memories organise
themselves (through the many, many histories going through their states).
Your speed prior would be useful if it showed how it enhances the weight
of normal stories in the limit of first person experiences (due to
unawareness of the reconstitution delays). Note that the quantum
quantization of classical theories provides such an explanation. I show
that comp needs to extract that quantization from a measure on the
consistent extensions. I show the "probability one" obeys the main axioms
of quantum logic.

You ask me often why I give so much importance to the notion of
provability. The reason is that in some sense I just interview scientific
machines, as they can appear in the GP's work, about what they are able to
prove about their own consistent extensions, and how and why they can
arrive at probability considerations. And "provability" is non-trivial
enough to provide clues on the minimal necessary logical structures on
that set of consistent extensions.

Bruno
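The provability apparatus Bruno keeps invoking can be collected in one display (my transcription; the notation is standard):

```latex
% B(y,x): "y is the Goedel number of a proof of the sentence with Goedel
% number x" - algorithmically decidable.
\mathrm{Bew}(x) \;\equiv\; \exists y\, B(y,x)

% \Sigma_1 sentences: those equivalent to \exists x\, P(x) with P
% algorithmically decidable; Bew is itself a universal \Sigma_1 predicate.

% Bruno's first-approximation "knowledge of p by the machine":
\mathrm{Know}(p) \;\equiv\; p \,\land\, \mathrm{Bew}(\ulcorner p \urcorner)
```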
Re: Predictions & duplications
Juergen Schmidhuber wrote: > > From: Russell Standish <[EMAIL PROTECTED]> > > The only reason for not accepting the simplest thing is if it can be > > shown to be logically inconsistent. This far, you have shown no such > > thing, but rather demonstrated an enormous confusion between measure > > and probability distribution. > > Russell, this is really too much; please read any intro to measure > theory, > e.g., the one by Halmos. Measures in general are _not_ probability > distributions. They are more than that; you can view them as _sets_ of > probability distributions on sets that are related to each other in a > specific way. Probability density functions define measures and > therefore > in general aren't probability distributions either. The uniform PDF on > [0..1] yields the perfect trivial example; that's exactly why I used it > before. In the computability context, compare LI/Vitanyi 1997, chapters > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 > (continuous sample space with (semi)measure M(x)). This is an excellent reference, thank you. Example 4.2.1 gives the construction of the uniform measure over the set of strings, precisely what you claim didn't exist when we started this discussion. > > > 1) Halting theorem implies the set of halting programs is of measure > > zero within the set of all programs > > You mean, given a uniform measure? But why should you need the halting > theorem for this trivial statement? In fact, what exactly do you mean > by > halting theorem? (Schoenfield's book provides a good intro to > mathematical > logic and computability theory.) > The halting theorem is that there is no algorithm for predicting whether any program will halt. Of course, the result I was referring to is that the set of compressible strings is of measure zero. The set of halting programs actually has finite measure, since \Omega > 0. 
However, there is still finite probability of an input string being
nonhalting, and indeed the probability seems to be greater than 0.7 (ie
\Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still presents a
problem for computationalism.

> > The GP program (aka Universal Dovetailer) is not the simplest
> > thing. The simplest describable thing is the set of all descriptions,
> > that simply exist without being generated. That has precisely zero
> > information or complexity, whereas the UD has some complexity of the
> > order of a few 100 bits. The simplest prior is the uniform one, ie
> > there is no bias whatsoever in the selection.
>
> Exist without being generated? Precisely zero information or
> complexity? But with respect to what?

With respect to the observer. With respect to anything in fact. The set
of all possible descriptions has precisely zero information, by
definition, regardless of what observer or context you employ.

> Any traditional definition of
> information/simplicity requires the specification of a formal
> description method, such as a Turing machine. You don't need one? Then
> how do you define information or complexity? How exactly do you define
> "simple" ?

Actually, all it needs is an equivalencing procedure. See my "On
Complexity and Emergence" paper. A Turing machine will do the job for
you, but so will a neural network or an "expert system" following
heuristic rules. Perhaps the simplest formal approach is to say that the
equivalencing procedure is a function from the set of descriptions onto
the integers (labelling the equivalence classes). If that function is
partial recursive, then the equivalencing mechanism can be replaced by the
appropriate Turing machine. Computationalism is the creed that this
function must be partial recursive, but it is not necessary for the
definition of information or complexity.
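Russell's "equivalencing procedure" can be sketched as follows (my own hypothetical illustration; the classifier and its rule are invented for the example, not taken from his paper):

```python
# Hypothetical sketch of an "equivalencing procedure": a computable map from
# descriptions onto integers labelling equivalence classes. This toy
# classifier deems two descriptions equivalent when they agree on their
# first two bits, so all bitstrings collapse onto just four class labels.

def classify(description: str) -> int:
    """Label of the equivalence class of `description` (invented rule)."""
    return int(description[:2], 2)

# Descriptions differing only in later bits land in the same class:
print(classify("00101110") == classify("00111111"))         # True
# Exactly four classes are ever produced over all 8-bit descriptions:
print(sorted({classify(f"{i:08b}") for i in range(256)}))   # [0, 1, 2, 3]
```

Since `classify` is total recursive, this is the case where, as Russell notes, the equivalencing mechanism could be replaced by an appropriate Turing machine.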
I can see some subtle problems occurring if this equivalencing procedure
is stochastic in nature; however, by summing over the appropriate
distributions, one should still end up with meaningful results provided
the variance is not too large.

Cheers

Dr. Russell Standish                     Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052                         Fax 9385 6965, 0425 253119 (")
Australia                                [EMAIL PROTECTED]
Room 2075, Red Centre                    http://parallel.hpc.unsw.edu.au/rks
International prefix +612, Interstate prefix 02
Re: Predictions & duplications
Schmidhuber:
>>It's the simplest thing, given this use of mathematical
>>language we have agreed upon. But here the power of the
>>formal approach ends - unspeakable things remain unspoken.

Marchal:
>I disagree. I would even say that it is here that the serious formal
>approach begins. Take "unprovable" for "unspeakable", we can
>meta-reason (informally or formally) and study the logical structure
>of the set of unprovable sentences by sound universal machines.

Schmidhuber:
Unprovable does not mean unspeakable! You can spell out many unprovable
things. Why care for the subset of provable sentences? Aren't we
interested in the full set of all describable sentences? We can generate
it, without caring for proofs at all.

"Unspeakable" means "beyond formal definition." In particular, the way we
humans communicate and agree on a common formal language is not formally
defined. You write "x" and I say, ok, it's an "x", but that's all based on
some sort of informal agreement that involves complex pattern recognition
and learning processes. Usually we do not question the validity of
outcomes of such cognitive processes, and just go ahead with formal
derivations. But there is this major informal step *before* formal
reasoning can start, and good textbooks on logic acknowledge this.
Re: Predictions & duplications
> From: Juho Pennanen <[EMAIL PROTECTED]> > So there may be no 'uniform probability distribution' on the set of all > strings, but there is the natural probability measure, that is in many > cases exactly as useful. Sure, I agree, measures are useful; I'm using them all the time. But in general they are _not_ the same thing as probability distributions. > From: Russell Standish <[EMAIL PROTECTED]> > The only reason for not accepting the simplest thing is if it can be > shown to be logically inconsistent. This far, you have shown no such > thing, but rather demonstrated an enormous confusion between measure > and probability distribution. Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare LI/Vitanyi 1997, chapters 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)). > 1) Halting theorem implies the set of halting programs is of measure > zero within the set of all programs You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Schoenfield's book provides a good intro to mathematical logic and computability theory.) > The GP program (aka Universal Dovetailer) is not the simplest > thing. The simplest describable thing is the set of all descriptions, > that simply exist without being generated. 
That has precisely zero > information or complexity, whereas the UD has some complexity of the > order of a few 100 bits. The simplest prior is the uniform one, ie > there is no bias whatsoever in the selection. Exist without being generated? Precisely zero information or complexity? But with respect to what? Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple" ?
Re: Predictions & duplications
Juergen Schmidhuber wrote:
> Russell Standish wrote:
>> I never subscribed to computationalism at any time,
>> but at this stage do not reject it. I could conceive of us living in
>> a stupendous virtual reality system, which is in effect what your GP
>> religion Mark II is. However, as pointed out by others, it does suffer
>> from "turtle-itis", and should not be considered the null
>> hypothesis. It requires evidence for belief.
>
>By turtle-itis you mean: in which universe do the GP and his computer
>reside? Or the higher-level GP2 which programmed GP? And so on? But
>we cannot get rid of this type of circularity - computability and
>mathematical logic are simply taken as given things, without any
>further explanation, like a religion.

I disagree. Comp could not have any pretension with respect to the search
for a TOE without Church's thesis. And we can argue for Church's thesis.
The two main, and very different, types of argument are: a) the empirical
fact that all definitions of computable functions provably lead to the
same class of functions; b) the closure of that class under the "most
transcendent" operation in mathematics: diagonalisation. It is really the
closure of the set of computable functions under diagonalisation which
saves comp from any form of *turtle-itis*. That closure provides the
fixed points for all computable and self-referential transformations
(from which in particular G and G*, and G* minus G, can arise).

>The computable multiverse, or
>the set of logically possible or mathematically possible or computable
>universes, represents the simplest explanation we can write down formally.

Except that I still don't know what you mean by "universe".

>But what exactly does it mean to accept something as a formal statement?
>What does it mean to identify the messy retina patterns caused by this
>text with abstract mathematical symbols such as x and y? All formal
>explanations in our formal papers assume that we agree on how to read
>them.

I disagree.
Axiomatics was invented and developed so that we can check the validity of
our reasoning independently of our interpretations of it. Of course we bet
on the ability to distinguish things, 0 and 1 in particular, but that's
another level.

>But reading and understanding papers is a complex physical and
>cognitive process. So all our formal explanations are relative to this
>given process which we usually do not even question.

You are mixing levels of description. It is not because I take for granted
that you can read English that I will take for granted that you have a
brain made of waves/particles or *any* physical description. This confirms
that your *use* of the GP suffers from turtle-itis. I should have asked
you what you mean by universe much earlier. My feeling is that you are
presupposing some conception of the universe. The UDA gives at least a
road making the reasoning immune with respect to such a preconception
(with the exception of the preconception of the natural numbers).

>Essentially, the
>GP program is the simplest thing we can write down, relative to the
>unspoken assumption that it is clear what it means to write something
>down, and how to process it.

The miracle of Church's thesis is that the "how to process it" is defined
in the arithmetical relations themselves. That is what makes it possible
to study the possible discourses of the machines themselves, about their
most probable relative computational states and stories. I guess we agree
on the importance of intuition about the finitary, the finite things.

>It's the simplest thing, given this use
>of mathematical language we have agreed upon. But here the power of the
>formal approach ends - unspeakable things remain unspoken.

I disagree. I would even say that it is here that the serious formal
approach begins. Take "unprovable" for "unspeakable": we can meta-reason
(informally or formally) and study the logical structure of the set of
sentences unprovable by sound universal machines.
Although unprovable by the UTM, those sentences are anticipable. Note that "there is a universe around me" is probably such a (psycho) logical anticipable sentence, not a provable one. Bruno
Re: Predictions & duplications
[EMAIL PROTECTED] wrote:
> > From: Russell Standish <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> >
> > I think we got into this mess debating whether an infinite set could
> > support a uniform measure. I believe I have demonstrated this.
> > I've yet to see anything that disabuses me of the notion that a
> > probability distribution is simply a measure that has been normalised
> > to 1. Not all measures are even normalisable.
>
> Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
> uniform probability distribution. Why were measures invented in the first
> place? To deal with infinite sets. You cannot have a uniform probability
> distribution on infinitely many things. That's why measures isolate just
> finitely many things, say, every bitstring of size n, and for each x of
> size n look at the infinite set of strings starting with x. A uniform
> measure assigns equal probability to each such set. Of course, then you
> have a uniform probability distribution on those finitely many things
> which are sets. But that's not a uniform probability distribution on
> infinitely many things, e.g., on the bitstrings themselves! The measure
> above is _not_ a probability distribution; it is an infinite _set_ of
> _finite_ probability distributions, one for string size 0, one for string
> size 1, one for string size 2,...

Good grief, you've missed the point entirely! Measures do measure infinite
sets - consider the set [0,1]: it is an infinite set, and on it the
uniform probability distribution and the uniform measure are the same
thing. We can also consider the set [0,\infty), which has a uniform
measure on it (the same constant probability density as before, but
extended to the full set). Clearly it cannot have a uniform probability
distribution, as the set has infinite measure. You seem fixated on the
peculiar definition of measure on strings that you gave before, rather
than the more general notion.
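Russell's [0,1] versus [0,\infty) contrast comes down to normalisation, which a one-line computation illustrates (my own sketch, not from the thread):

```python
# My own illustration of the normalisation point: a constant density c on an
# interval of length L has total mass c*L. On [0,1] with c = 1 the mass is 1
# (a genuine uniform probability distribution); on [0,N] the mass grows
# without bound as N does, so no constant density normalises on [0, infinity).

def total_mass(density: float, length: float) -> float:
    """Integral of a constant density over an interval of given length."""
    return density * length

print(total_mass(1.0, 1.0))      # 1.0 -> proper uniform distribution on [0,1]
print(total_mass(1.0, 1e6))      # 1000000.0 -> diverges as the interval grows
```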
> > I realise that the Halting theorem gives problems for believers of > > computationalism. > > It does not. Why should it? Restating what I said before simply: 1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs 2) Computationalism is the assumption that the universe's observer is a universal Turing machine. 3) Uniform measure over the Plenitude of descriptions implies that observer will never see simple strings. > > > I never subscribed to computationalism at any time, > > but at this stage do not reject it. I could conceive of us living in > > a stupendous virtual reality system, which is in effect what your GP > > religion Mark II is. However, as pointed out by others, it does suffer > > from "turtle-itis", and should not be considered the null > > hypothesis. It requires evidence for belief. > > By turtle-itis you mean: in which universe do the GP and his computer > reside? Or the higher-level GP2 which programmed GP? And so on? But > we cannot get rid of this type of circularity - computability and > mathematical logic are simply taken as given things, without any > further explanation, like a religion. The computable multiverse, or > the set of logically possible or mathematically possible or computable > universes, represents the simplest explanation we can write down formally. > But what exactly does it mean to accept something as a formal statement? > What does it mean to identify the messy retina patterns caused by this > text with abstract mathematical symbols such as x and y? All formal > explanations in our formal papers assume that we agree on how to read > them. But reading and understanding papers is a complex physical and > cognitive process. So all our formal explanations are relative to this > given process which we usually do not even question. 
> Essentially, the GP program is the simplest thing we can write down, relative to the > unspoken assumption that it is clear what it means to write something > down, and how to process it. It's the simplest thing, given this use > of mathematical language we have agreed upon. But here the power of the > formal approach ends - unspeakable things remain unspoken. > The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, which simply exists without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few hundred bits. The simplest prior is the uniform one, i.e. there is no bias whatsoever in the selection. The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. Thus far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution. The set of all descriptions and the uniform measure do not suffer from turtle-itis. However, they are susceptible to criticism of the Church-Turing thesis - namely why should our idea of i
Re: Predictions & duplications
juergen wrote: > Russell, at the risk of beating a dead horse: a uniform measure is _not_ a > uniform probability distribution. Why were measures invented in the first > place? To deal with infinite sets. You cannot have a uniform probability > distribution on infinitely many things. The last sentence is trivially true when 'probability distributions' are defined as probability measures on discrete sets. But someone could also think this is just irrelevant word-play on definitions. There are uniform probability (density) functions on infinite sets, e.g. the uniform distribution on the interval [0,1], which gives the measure (probability) t for each interval [x,x+t]. Obviously, this distribution gives measure 0 for any singleton containing only one real number. Still it's uniform, though not a 'probability distribution' if one wishes to use the most restricted definition. Similarly, there is the natural probability measure on the set of all infinite strings of 0's and 1's. It is 'uniform' in the sense that no string or place in a string is in a privileged position. To define it, set n_0 as the set of all strings that have a 0 at the n'th place and n_1 as the set of all strings that have a 1 at the n'th place, for any n. Then set m(n_0)=m(n_1)=1/2, for all n. Using the standard definition of measure that has been cited on this list (Kolmogorov axioms), m has a unique extension which is a measure on the set of all strings. So there may be no 'uniform probability distribution' on the set of all strings, but there is the natural probability measure, which is in many cases exactly as useful. Juho
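Juho's coin-flip measure can be sketched in a few lines of Python. This is an illustration only: the function name and the representation of cylinder sets as (position, bit) constraint lists are ours, and we only handle finitely many constraints, which is all the extension theorem takes as input.

```python
def cylinder_measure(constraints):
    """Measure of the set of infinite 0/1 strings satisfying the given
    (position, bit) constraints, under the natural measure with
    m(n_0) = m(n_1) = 1/2 for every position n."""
    pinned = {}
    for n, bit in constraints:
        if pinned.get(n, bit) != bit:
            return 0.0  # contradictory constraints: the set is empty
        pinned[n] = bit
    # Pinning k distinct positions leaves an event of probability 2^-k.
    return 2.0 ** -len(pinned)

# m(n_0) = m(n_1) = 1/2 for any single position n:
print(cylinder_measure([(5, 0)]))          # 0.5
# Strings starting with '01' (two pinned positions): measure 1/4
print(cylinder_measure([(0, 0), (1, 1)]))  # 0.25
```

Note that every individual infinite string is the intersection of infinitely many cylinders and so gets measure 0, exactly as in the [0,1] analogy.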
Re: Predictions & duplications
> From: Russell Standish <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > > I think we got into this mess debating whether an infinite set could > support a uniform measure. I believe I have demonstrated this. > I've yet to see anything that disabuses me of the notion that a > probability distribtuion is simply a measure that has been normalised > to 1. Not all measures are even normalisable. Russell, at the risk of beating a dead horse: a uniform measure is _not_ a uniform probability distribution. Why were measures invented in the first place? To deal with infinite sets. You cannot have a uniform probability distribution on infinitely many things. That's why measures isolate just finitely many things, say, every bitstring of size n, and for each x of size n look at the infinite set of strings starting with x. A uniform measure assigns equal probability to each such set. Of course, then you have a uniform probability distribution on those finitely many things which are sets. But that's not a uniform probability distribution on infinitely many things, e.g., on the bitstrings themselves! The measure above is _not_ a probability distribution; it is an infinite _set_ of _finite_ probability distributions, one for string size 0, one for string size 1, one for string size 2,... > I realise that the Halting theorem gives problems for believers of > computationalism. It does not. Why should it? > I never subscribed to computationalism at any time, > but at this stage do not reject it. I could conceive of us living in > a stupendous virtual reality system, which is in effect what your GP > religion Mark II is. However, as pointed out by others, it does suffer > from "turtle-itis", and should not be considered the null > hypothesis. It requires evidence for belief. By turtle-itis you mean: in which universe do the GP and his computer reside? Or the higher-level GP2 which programmed GP? And so on? 
But we cannot get rid of this type of circularity - computability and mathematical logic are simply taken as given things, without any further explanation, like a religion. The computable multiverse, or the set of logically possible or mathematically possible or computable universes, represents the simplest explanation we can write down formally. But what exactly does it mean to accept something as a formal statement? What does it mean to identify the messy retina patterns caused by this text with abstract mathematical symbols such as x and y? All formal explanations in our formal papers assume that we agree on how to read them. But reading and understanding papers is a complex physical and cognitive process. So all our formal explanations are relative to this given process which we usually do not even question. Essentially, the GP program is the simplest thing we can write down, relative to the unspoken assumption that it is clear what it means to write something down, and how to process it. It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
> From: [EMAIL PROTECTED]: > [EMAIL PROTECTED] wrote: > > > From [EMAIL PROTECTED]: > > > [EMAIL PROTECTED] wrote: > > > > M measure: > > > > M(empty string)=1 > > > > M(x) = M(x0)+M(x1) nonnegative for all finite x. > > > > > > This sounds more like a probability distribution than a measure. In > > > the set of all descriptions, we only consider infinite length > > > bitstrings. Finite length bitstrings are not members. However, we can > > > identify finite length bitstrings with subsets of descriptions. The > > > empty string corresponds to the full set of all descriptions, so the > > > first line M(empty string)=1 implies that the measure is normalisable > > > (ie a probability distribution). > > > > Please check out definitions of measure and distribution! > > Normalisability is not the critical issue. > > > > Clearly: Sum_x M(x) is infinite. So M is not a probability > > distribution. M(x) is just measure of all strings starting with x: > > M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = > > For a measure to be normalisable, the sum over a *disjoint* set of > subsets must be finite. If the set of subsets is not disjoint (ie the > intersection are not empty) then the sum may well be infinite. > Bringing this to the case of finite strings. Each finite string is > actually a subset of the set of infinite strings, each containing the > same finite prefix. So the string 101010 is actually a subset of 10101 > and so on. The Sum_x M(x), where I assume x ranges over all strings > will of course be infinite. However, since the set of finite strings > is not disjoint, this doesn't imply M(x) is unnormalisable. > Now when you realise that every finite string x is a subset of the empty > string, it becomes clear that M(x) is normalised to precisely 1. The point is: prob dists and measures are different things. There is a good reason for giving them different names. Prob dists assign numbers to individual objects, not to sets. 
Traditional definitions, as well as those for semimeasures, are in http://www.idsia.ch/~juergen/toesv2/ > > Neglecting finite universes means loss of generality though. > > Hence measures mu(x) in the ATOE paper do not neglect finite x: > > > > mu(empty string)=1 > > mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). > > > > And here P is a probability distribution indeed! > > P(x)>0 possible only for x with finite description. > > > > The P(x)>0 case above actually breaks the countably subadditive > property, so mu(x) cannot be called a measure... I'm not entirely sure > what you're getting at here. The set of computable universes includes the finite ones which we may not ignore. How do they contribute to the measure of all universes? For convenience introduce a third symbol $ besides 0 and 1. Now what is the measure of the set of all universes starting with 00110? It's the sum of the measure of the set of universes starting with 001101 and the measure of the set of universes starting with 001100 and the measure of the single universe 00110$ that stops right after 00110. Once there is a $ symbol there won't be another 0 or 1. So mu(x) = P(x)+mu(x0)+mu(x1). Our favorite example is the universal Turing machine with random inputs. For finite x: P(x) is the probability that the TM output is x and nothing but x. mu(x) is something different, namely, the so-called universal measure of all strings _starting_ with x. Again mu(x) = P(x)+mu(x0)+mu(x1). Juergen Schmidhuber http://www.idsia.ch/~juergen/
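The recursion mu(x) = P(x) + mu(x0) + mu(x1) can be checked mechanically on a toy example. The particular P below is our own choice, not anything from the papers: P(y) = 2^(-2|y|-1) sums to 1 over all finite strings (there are 2^n strings of each length n), and the induced measure then has the closed form mu(x) = 4^(-|x|).

```python
def P(y):
    """Toy probability distribution over finite universes y (our choice,
    purely illustrative): P(y) = 2^(-2*len(y)-1).  Summing over the 2^n
    strings of each length n gives sum_n 2^n * 2^(-2n-1) = 1."""
    return 2.0 ** (-2 * len(y) - 1)

def mu(x):
    """Closed-form solution of mu(x) = P(x) + mu(x0) + mu(x1) for this
    P: mu(x) = 4^(-len(x)), so mu(empty string) = 1."""
    return 4.0 ** (-len(x))

assert mu("") == 1.0
# The recursion holds exactly (all quantities are dyadic, so floats are exact):
for x in ["", "0", "10", "110"]:
    assert mu(x) == P(x) + mu(x + "0") + mu(x + "1")
```

Since P(x) > 0 for every finite x here, this mu illustrates the disputed case: mass sits on the finite "terminating" universes as well as on the cylinder sets of their continuations.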
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > > > From [EMAIL PROTECTED]: > > [EMAIL PROTECTED] wrote: > > > M measure: > > > M(empty string)=1 > > > M(x) = M(x0)+M(x1) nonnegative for all finite x. > > > > This sounds more like a probability distribution than a measure. In > > the set of all descriptions, we only consider infinite length > > bitstrings. Finite length bitstrings are not members. However, we can > > identify finite length bitstrings with subsets of descriptions. The > > empty string corresponds to the full set of all descriptions, so the > > first line M(empty string)=1 implies that the measure is normalisable > > (ie a probability distribution). > > > Please check out definitions of measure and distribution! > Normalisability is not the critical issue. > > Clearly: Sum_x M(x) is infinite. So M is not a probability > distribution. M(x) is just measure of all strings starting with x: > M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = For a measure to be normalisable, the sum over a *disjoint* set of subsets must be finite. If the set of subsets is not disjoint (ie the intersection are not empty) then the sum may well be infinite. Bringing this to the case of finite strings. Each finite string is actually a subset of the set of infinite strings, each containing the same finite prefix. So the string 101010 is actually a subset of 10101 and so on. The Sum_x M(x), where I assume x ranges over all strings will of course be infinite. However, since the set of finite strings is not disjoint, this doesn't imply M(x) is unnormalisable. Now when you realise that every finite string x is a subset of the empty string, it becomes clear that M(x) is normalised to precisely 1. > > Neglecting finite universes means loss of generality though. > Hence measures mu(x) in the ATOE paper do not neglect finite x: > > mu(empty string)=1 > mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). > > And here P is a probability distribution indeed! 
> P(x)>0 possible only for x with finite description. The P(x)>0 case above actually breaks the countably subadditive property, so mu(x) cannot be called a measure... I'm not entirely sure what you're getting at here. Cheers > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/ Dr. Russell Standish, Director, High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile), Fax 9385 6965, 0425 253119 ("), UNSW SYDNEY 2052, Australia. [EMAIL PROTECTED] Room 2075, Red Centre. http://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02
Re: Predictions & duplications
> From [EMAIL PROTECTED]: > [EMAIL PROTECTED] wrote: > > M measure: > > M(empty string)=1 > > M(x) = M(x0)+M(x1) nonnegative for all finite x. > > This sounds more like a probability distribution than a measure. In > the set of all descriptions, we only consider infinite length > bitstrings. Finite length bitstrings are not members. However, we can > identify finite length bitstrings with subsets of descriptions. The > empty string corresponds to the full set of all descriptions, so the > first line M(empty string)=1 implies that the measure is normalisable > (ie a probability distribution). Please check out definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = Neglecting finite universes means loss of generality though. Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1 mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x)>0 possible only for x with finite description. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
Juergen wrote (on 12th Oct): > . . . In most possible futures your computer will > vanish within the next second. But it does not. This indicates that our > future is _not_ sampled from a uniform prior. I don't wish to comment directly on the computer-vanishing problem as it applies to Juergen's scheme (my own problem with this laudable scheme is that it appears to be vulnerable to the same 'turtle-itis' criticism as for all theistic religions - the (literal or abstract GP) 'turtle' responsible for the world seems to need another turtle to support it and so on - there is no full explanation), but I would like to say that certain other proposed solutions don't suffer from this computer-vanishing problem (also known as the WR/dragon problem), if one thinks of infinite length bit strings / formal system descriptions via Limit n -> infinity, where n is the relevant string/description length (see appendix below). It seems to me that in thinking in simple infinity terms one can lose essential information (for example integers cannot be determined to be more numerous than the odd numbers - both are of the lowest order of infinity - not a problem for limit analysis). Alastair Malcolm APPENDIX One might naively think that as there are at least hundreds of possible states (call them V1, V2... ) where some part of our computer clearly vanishes (and only one (N), where normality prevails), then even if one considers bit string or other types of formal description involving other variations in our universe or indeed other universes, one could still 'divide through' to find that we are most likely to be in a universe where our computer vanishes in whole or part. However, we note that in any minimal description of our universe (and the following argument does not depend on there having to *only* be a minimal description), deviations from the actual physical laws causing V1, V2... 
will involve additional ad-hoc rules/events, so we can say that in complexity terms (whether as a bit string or formal system description) we will have V1 = N + VE1, V2 = N + VE2 etc, where VE1, VE2 etc are the extra segments of description required to cater for the part-vanishings (strictly: Length(V1) = Length(N) + Length(VE1) etc, but hopefully this is clear). Moreover, if we are considering all possible descriptions, we also have to allow for extra descriptions corresponding to entities beyond our observability. For each V = N + VE we will have many DC = N + DCE, where DC is a 'don't care' description - it is a universe (or set of universes) indistinguishable by us from N (our own, non-computer-vanishing one), yet objectively different. Now, the key point is that in any objective descriptive framework (whether by bit string or formal system), one should take the Limit as the description length increases to infinity, not as our (humanly biased) visible universe is progressively and unobservably added to (say by other universes). As we do this, we are far more likely to be in a DC (= N + DCE) universe than a V (= N + VE) universe: computers don't normally vanish, in whole or in part. More details at: http://www.physica.freeserve.co.uk/p105.htm linking to: http://www.physica.freeserve.co.uk/p111.htm http://www.physica.freeserve.co.uk/p112.htm
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > Confusion about what's a measure? > What's a distribution? > Simple but important! > For bitstrings x: > > M measure: > M(empty string)=1 > M(x) = M(x0)+M(x1) nonnegative for all finite x. This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (i.e. a probability distribution). As I said before, non-normalisable measures are considered measures by the mathematical community - and the uniform measure over the set of all descriptions is well defined. > > P probability distribution: > Sum_x P(x) = 1; P(x) nonnegative > > --- > > M semimeasure - replace "=" by ">=": > M(x) >= M(x0)+M(x1) > > P semidistribution - replace: > Sum_x P(x) <= 1 > > --- > > Examples: > > 1. Distribution: E.g., integers n: P(n) = 6/(Pi^2 n^2) > 2. Semidistribution: m(x) = probability of guessing a halting program > for x (BTW, Hal, this was first published by Levin in 1974, not by > Chaitin in 1975) > 3. Measure: E.g., each x of size n gets weight 2^-n > 4. Semimeasure: E.g., mu^M(x) = probability of guessing a halting or > nonhalting monotone TM program whose output starts with x > > Check out: Measures and Probability Distributions > Section 4 of "Algorithmic TOEs" > http://www.idsia.ch/~juergen/toesv2/node15.html > > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
Confusion about what's a measure? What's a distribution? Simple but important! For bitstrings x:

M measure:
M(empty string)=1
M(x) = M(x0)+M(x1) nonnegative for all finite x.

P probability distribution:
Sum_x P(x) = 1; P(x) nonnegative

---

M semimeasure - replace "=" by ">=":
M(x) >= M(x0)+M(x1)

P semidistribution - replace:
Sum_x P(x) <= 1

---

Examples:

1. Distribution: E.g., integers n: P(n) = 6/(Pi^2 n^2)
2. Semidistribution: m(x) = probability of guessing a halting program for x (BTW, Hal, this was first published by Levin in 1974, not by Chaitin in 1975)
3. Measure: E.g., each x of size n gets weight 2^-n
4. Semimeasure: E.g., mu^M(x) = probability of guessing a halting or nonhalting monotone TM program whose output starts with x

Check out: Measures and Probability Distributions, Section 4 of "Algorithmic TOEs" http://www.idsia.ch/~juergen/toesv2/node15.html

Juergen Schmidhuber
http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/
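Two of these examples can be verified mechanically. The short Python sketch below (ours, purely illustrative) checks that the weight 2^-n of example 3 satisfies the measure recursion, and that example 1 is normalized; note the constant is 6/pi^2 because sum 1/n^2 = pi^2/6.

```python
import math

def M(x):
    """Example 3: each bitstring of length n gets weight 2^-n."""
    return 2.0 ** -len(x)

# Measure property M(x) = M(x0) + M(x1), including M(empty string) = 1:
assert M("") == 1.0
for x in ["", "0", "01", "1101"]:
    assert M(x) == M(x + "0") + M(x + "1")

# Example 1: P(n) = 6/(pi^2 n^2) on the positive integers; the partial
# sums approach 1 from below.
partial = sum(6.0 / (math.pi ** 2 * n ** 2) for n in range(1, 200_000))
assert 0.999 < partial < 1.0
```

The first check also makes the measure/distribution contrast concrete: Sum over all x of M(x) diverges (it is 1 per string length), while Sum over n of P(n) converges to 1.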
Re: Predictions & duplications
Juergen Schmidhuber wrote: >Bruno, there are so many misleading or unclear statements >in your post - I do not even know where to start. Rhetorical tricks, if not insults, as usual :( Have you read http://www.escribe.com/science/theory/m3241.html Well, G* tells me to remain mute here, but then I don't care. >[...] >You can be a GP. Write a program that computes all universes >computable in the limit. Add storage in case of storage overflow. >Still, you as a GP have a resource problem. At some point it will >get harder and harder for you to keep computing all the things >you'd like to compute. Are you using the GP as an analogy or what? What exactly is your ontology? You have not answered my question: how could the GP have resource problems? Is not resource a notion internal to some universe, and are not universes generated by the GP? You are presupposing physics. >Just like I cannot say much about the environment of the particular >GP who programmed us. Are you really serious? Do you believe we have a GP in a universe? Where does the universe of our GP come from? This is tortoises built on tortoises. Even infinite tortoise sequences embedded in a physical universe. That explains nothing. I knew you had not yet understood the comp reversal physico/psycho, but have you not even seen the reversal physico/computer-science? >Again: forget all the esoteric undecidability stuff, which does not >really matter here, and be pragmatic and try to look at things from the >perspective of some observer being computed as part of a virtual reality >that you programmed. The observer may get smart and correctly suspect >you have a resource problem, and can make nontrivial predictions from >there, using the Speed Prior. You are still postulating what I am seeking an explanation for. I am not (necessarily) pragmatic in advance. A concrete virtual reality on earth inherits the "normality" of the world in which it is embedded.
I claim that comp forces us to justify that normality. It is not obvious that that is possible. But the UDA forces us to take all computations into account, and undecidability stuff is unavoidable, if only because we can only *bet* on our consistent extensions (in the translation of the UDA into arithmetic, consistent extensions have the unprovable type "<>p"). Incompleteness phenomena put fundamental constraints on a TOE. You could call them logical and arithmetical priors. But you don't seem to be searching for a TOE, just a recipe for predicting some granted neighborhood. The arithmetical UD is just the set of \Sigma_1 sentences. Resource must be explained from it, not the inverse. I am not sure a TOE can be pragmatic, just as quantum mechanics is not useful for cooking. >No, that's exactly where the Speed Prior comes in. According to this prior, >the slow programs computing the same thing contribute less to its >probability mass. It is explained in the chapter on >Speed Prior-Based Inductive Inference >http://www.idsia.ch/~juergen/toesv2/node32.html I don't need to read it because it is in contradiction with the "invariance lemma", in particular by the fact that from a first person point of view the delays (between the many accesses by the UD of nearby computational states) do not matter. (BTW I have read it, and we discussed it before.) You know the GP will generate itself, but will also generate less and less efficient versions of itself. From the first person point of view they are indistinguishable. Are you telling me that the Juergen and Bruno appearing in those very slow GPs are zombies? >> It is indeed hard to write a little program which generates a long >> Kolmogorov-Chaitin incompressible 01 string. >> But, as I told you earlier, it is quite easy to write a little >> program which generates them all. (It is the essence of the >> everything idea imo). > >As you told me earlier? Huh? >That's one of the points of the 1997 book chapter.
I meant "as I reminded you earlier in some recent post"! (but look at my 1988 Toulouse paper ;-) >The thing is: when you generate them all, and assume that all are >equally likely, in the sense that all beginnings of all strings are >uniformly distributed, then you cannot explain why the regular universes >keep being regular. The random futures are then just as likely as the >nonrandom ones. > >So you NEED something additional to explain the ongoing regularity. >You need something like the Speed Prior, which greatly favors regular >futures over others. You speak as if I were proposing the iterated duplication as a universal explanation. It is only an illustration of the comp indeterminism. With comp it is the whole work of the UD, UD*, which constitutes the domain of indeterminism. The self-multiplication experience just shows how a 3-deterministic process consistent with comp explains a phenomenology of (in the limit) a sort of absolute randomness. The "real" experience is the infinite running of the UD. And that
Re: Predictions & duplications
Hal - that is not a uniform measure! [EMAIL PROTECTED] wrote: > > Juergen Schmidhuber writes: > > But there is no uniform prior over all programs! > > Just like there is no uniform prior over the integers. > > To see this, just try to write one down. > > I think there is. Given a program of length l, the prior probability > is 2^(-l). (That is 2 to the power of negative l.) The length of a > program is defined by interpreting it using self-delimiting rules as > is customary in the AIT analysis of Greg Chaitin. > > Hal Finney
Re: Predictions & duplications
Hal Finney wrote: > Juergen Schmidhuber writes: > > But there is no uniform prior over all programs! > > Just like there is no uniform prior over the integers. > > To see this, just try to write one down. > > I think there is. Given a program of length l, the prior probability > is 2^(-l). (That is 2 to the power of negative l.) The length of a > program is defined by interpreting it using self-delimiting rules as > is customary in the AIT analysis of Greg Chaitin. This doesn't seem to be very uniform to me. Maybe you mean that with the above prior the probability for a bit, drawn randomly from the set of all programs, to be 1 is 1/2? Saibal
Re: Predictions & duplications
Saibal wrote: > Hal Finney wrote: > > Juergen Schmidhuber writes: > > > But there is no uniform prior over all programs! > > > Just like there is no uniform prior over the integers. > > > To see this, just try to write one down. > > > > I think there is. Given a program of length l, the prior probability > > is 2^(-l). (That is 2 to the power of negative l.) The length of a > > program is defined by interpreting it using self-delimiting rules as > > is customary in the AIT analysis of Greg Chaitin. > > This doesn't seem to be very uniform to me. Maybe you mean that with the > above prior the probability for a bit, drawn randomly from the set of all > programs, to be 1 is 1/2 ? I should have used the proper name for this, the universal prior. However it is derived in the limit (of string length -> infinity) by giving the same probability to all strings, and then adopting the convention that we ignore the material past the end of a self-delimiting program. As we go to the limit, if we are looking at n bit strings, then a self delimiting program of length l will have 2^(n-l) instances among the 2^n possible strings, hence will have measure 2^(-l), which is the universal prior. So the universal prior on self-delimiting programs is the same as the uniform prior on all program strings, as we take the limit in program string length going to infinity. Hal
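Hal's limit argument can be checked by brute force for small n. The sketch below is ours, and the 3-bit "program" is a stand-in for any self-delimiting program of length l: among the 2^n strings of length n, exactly 2^(n-l) start with a given l-bit prefix, so the fraction is 2^(-l) independently of n.

```python
from itertools import product

def prefix_fraction(program, n):
    """Fraction of the 2^n bitstrings of length n whose first
    len(program) bits equal `program` (assumes n >= len(program))."""
    strings = ("".join(bits) for bits in product("01", repeat=n))
    hits = sum(1 for s in strings if s.startswith(program))
    return hits / 2 ** n

p = "101"  # a hypothetical self-delimiting program, l = 3
for n in range(4, 9):
    # 2^(n-l) matching strings out of 2^n: measure 2^-l for every n
    assert prefix_fraction(p, n) == 2.0 ** -len(p)
```

This is why the 2^(-l) prior on self-delimiting programs and the uniform prior on raw program strings agree in the limit of string length going to infinity.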
Re: Predictions & duplications
Juergen Schmidhuber writes: > But there is no uniform prior over all programs! > Just like there is no uniform prior over the integers. > To see this, just try to write one down. I think there is. Given a program of length l, the prior probability is 2^(-l). (That is 2 to the power of negative l.) The length of a program is defined by interpreting it using self-delimiting rules as is customary in the AIT analysis of Greg Chaitin. Hal Finney
Re: Predictions & duplications
Hal Finney wrote: >Isn't this fixed by saying that the uniform measure is not over all >universe histories, as you have it above, but over all programs that >generate universes? Now we have the advantage that short programs >generate more regular universes than long ones, and the WAP grows teeth. I agree with Juergen on this: there is no uniform measure over all programs (nor on all integers). But in any case you know I argue, unlike Juergen, that the measure should be put on all (relative) consistent computational extensions appearing in UD*. This indeed gives a sort of Weak Turing-tropic Principle, in which relative *speed* is a by-product, not a prior. Bruno
Re: Predictions & duplications
Saibal Mitra wrote: >John Mikes wrote: >`` If you say: a sequence defying all >rules, then it is not random, it is calculable. You have to consider all >rules and cut them out.´´ > >If you try to do that then you encounter the famous halting problem. Exactly. So why not define random as incompressible by any (universal) machine (Kolmogorov incompressibility)? In a similar sense one can prove that "most" finite and infinite sequences are incompressible. Here the Church Thesis gives some "absolute" epistemological meaning to that notion of randomness. Cf. Li and Vitanyi. Bruno
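The counting argument behind "most sequences are incompressible" is worth making explicit. The sketch below (ours) computes only the bound, with no reference to any particular machine: there are at most 2^(n-c) - 1 descriptions shorter than n - c bits, so fewer than a 2^(-c) fraction of the 2^n strings of length n can be compressed by c bits or more.

```python
def max_fraction_compressible(n, c):
    """Upper bound on the fraction of n-bit strings that admit a
    description shorter than n - c bits: there are only
    2^0 + 2^1 + ... + 2^(n-c-1) = 2^(n-c) - 1 such descriptions,
    and each describes at most one string."""
    return (2 ** (n - c) - 1) / 2 ** n

# Fewer than 1 in 1024 of the 20-bit strings can be compressed by 10 bits:
assert max_fraction_compressible(20, 10) < 2 ** -10
# Even saving a single bit is impossible for more than half the strings:
assert max_fraction_compressible(10, 1) < 0.5
```

The halting problem only bites when one tries to *exhibit* or *certify* a particular incompressible string; the counting bound itself is elementary.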
Re: Predictions & duplications
Juergen writes > But there is no uniform prior over all programs! > Just like there is no uniform prior over the integers. > To see this, just try to write one down. This is of course true (if a uniform measure is one that gives the same non-zero probability to each program; I have no idea what the official definition is). OTOH, there's of course the natural product measure over the set of all infinite strings. In my last post I tried to show that this is enough. I admit now I was cutting much too many corners. It is not even obvious that the set of all programs that produce a specific conscious experience is measurable. And even if it were, it now seems obvious to me that random programs produce random worlds. So some kind of 'resource limitation' must exist. One option would be to accept only programs that are shorter than some maximum length. This corresponds to only accepting worlds that have complexity lower than some maximum. But this is one type of 'resource limitation' and in fact a very inelegant one. Juho
Re: Predictions & duplications
But there is no uniform prior over all programs! Just like there is no uniform prior over the integers. To see this, just try to write one down. BTW, it's not Solomon-Levy but Solomonoff-Levin. And it has nothing to do with resource bounds! Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/ > From [EMAIL PROTECTED] Fri Oct 12 19:22:02 2001 > > Juergen writes: > > Some seem to think that the weak anthropic principle explains the > > regularity. The argument goes like this: "Let there be a uniform measure > > on all universe histories, represented as bitstrings. Now take the tiny > > subset of histories in which you appear. Although the measure of this > > subset is tiny, its conditional measure, given your very existence, > > is not: According to the weak anthropic principle, the conditional > > probability of finding yourself in a regular universe compatible with > > your existence equals 1." > > > > But it is essential to see that the weak anthropic principle does not > > have any predictive power at all. It does not tell you anything about > > the future. It cannot explain away futures in which you still exist > > but irregular things happen. Only a nonuniform prior can explain this. > > Isn't this fixed by saying that the uniform measure is not over all > universe histories, as you have it above, but over all programs that > generate universes? Now we have the advantage that short programs > generate more regular universes than long ones, and the WAP grows teeth. > From: Russell Standish <[EMAIL PROTECTED]> > > That is almost the correct solution, Hal. If we ask what an observer > will make of a random description chosen at random, then you get > regular universes with probability exponentially related to the > inferred complexity. 
It is far clearer to see what happen when the > observer is a UTM, forcibly terminating programs after a > certain number of steps (representing the observer's resource bound) > (thus all descriptions are halting programs). Then one obtains a > Solomon-Levy distribution or universal prior. However, this argument > also works when the observer is not a UTM, but simply a classification > device of some kind.
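Juergen's two claims here, that there is no uniform prior on the integers but a Solomonoff-Levin-style weighting works, can be sketched in a few lines. The Elias gamma code below is just one convenient prefix-free encoding (my choice, not anything from the thread); weighting n by 2^-codelength gives a convergent total, while any constant positive weight cannot be normalised:

```python
import math

def gamma_len(n):
    """Length in bits of the Elias gamma code for n >= 1 (a prefix-free,
    self-delimiting encoding): 2*floor(log2 n) + 1."""
    return 2 * int(math.log2(n)) + 1

# Weighting integer n by 2^-codelength gives a convergent (semi)measure:
kraft_sum = sum(2.0 ** -gamma_len(n) for n in range(1, 1_000_000))
assert kraft_sum < 1.0  # Kraft inequality: total mass bounded by 1

# A "uniform prior" would assign every integer the same weight c > 0,
# but then the partial sums c * N exceed any bound -- it cannot be normalised:
c = 1e-12
assert c * 10 ** 15 > 1.0  # already past total mass 1 after finitely many terms
```

This is the sense in which "just try to write one down" fails: whatever constant you pick, the sum diverges, whereas the length-based weighting sums to at most 1 no matter how far you enumerate.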
Re: Predictions & duplications
"According to whim or taste" implies a conscious entity performing choices according to a free will. This need not be the case. In my mind, random means selected without cause (or without procedure/algorithm). A lot has been written on randomness, and its problematic nature. I don't for one minute suggest I have anything new to say on the matter. Cheers jamikes wrote: > > Dear Russell and Hal, > this is a naive question, however it goes into the basics of your written > explanations. How would YOU define "random"? I had this problem for a long > long time and never got a satisfactory solution. In my (non Indo-European) > language, Hungarian, there is no exact word for it, it is used as a term > meaning "according to whim or taste" (tetszöleges) - which leaves open that > I may not LIKE the 'random' in question. If you say: a sequence defying all > rules, then it is not random, it is calculable. You have to consider all > rules and cut them out. Is a series of 1... random? of course it may be. > Is the pi-decimal random? no, because it is a result of calculation. If one > says: "just pick it" that would follow all circumstantial pressure of the > situation, maybe unconsciously, yet defying the 'random'. > > So what is the content you use behind that word? > > I would be surprised if both of you (and others, too) would agree . > Till then I wish you luck to use the word - at random. > > Best wishes > John Mikes > - Original Message - > From: "Russell Standish" <[EMAIL PROTECTED]> > To: <[EMAIL PROTECTED]> > Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> > Sent: Sunday, October 14, 2001 4:36 AM > Subject: Re: Predictions & duplications > >>>>>>>SNIP<<<<<<<<< > > > > That is almost the correct solution, Hal. If we ask what an observer > > will make of a random description chosen at random, then you get > > regular universes with probability exponentially related to the > > inferred complexity. 
It is far clearer to see what happen when the > > observer is a UTM, forcibly terminating programs after a > > certain number of steps (representing the observer's resource bound) > > (thus all descriptions are halting programs). Then one obtains a > > Solomon-Levy distribution or universal prior. However, this argument > > also works when the observer is not a UTM, but simply a classification > > device of some kind. > > > > The WAP has nothing to do with this issue, except inasmuch as > > universes can only be observed through the eyes of some observer. > > > > Again I reiterate that Juergen's resource-bounded "Great Programmer" > > religion need be nothing but a reflection of our conscious selves stamped > > upon our observations. > > > > Cheers > > > > [EMAIL PROTECTED] wrote: > > > > > > Juergen writes: > > > > Some seem to think that the weak anthropic principle explains the > > > > regularity. The argument goes like this: "Let there be a uniform > measure > > > > on all universe histories, represented as bitstrings. Now take the > tiny > > > > subset of histories in which you appear. Although the measure of this > > > > subset is tiny, its conditional measure, given your very existence, > > > > is not: According to the weak anthropic principle, the conditional > > > > probability of finding yourself in a regular universe compatible with > > > > your existence equals 1." > > > > > > > > But it is essential to see that the weak anthropic principle does not > > > > have any predictive power at all. It does not tell you anything about > > > > the future. It cannot explain away futures in which you still exist > > > > but irregular things happen. Only a nonuniform prior can explain this. > > > > > > Isn't this fixed by saying that the uniform measure is not over all > > > universe histories, as you have it above, but over all programs that > > > generate universes? 
Now we have the advantage that short programs > > > generate more regular universes than long ones, and the WAP grows teeth. > > > > > > Hal Finney > > > > > > > > > > > -- > -- > > Dr. Russell Standish Director > > High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 > (mobile) > > UNSW
Re: Predictions & duplications
John Mikes wrote: `` If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out.´´ If you try to do that then you encounter the famous halting problem. Saibal
Re: Predictions & duplications
Dear Russell and Hal, this is a naive question, however it goes into the basics of your written explanations. How would YOU define "random"? I had this problem for a long long time and never got a satisfactory solution. In my (non Indo-European) language, Hungarian, there is no exact word for it, it is used as a term meaning "according to whim or taste" (tetszöleges) - which leaves open that I may not LIKE the 'random' in question. If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out. Is a series of 1... random? of course it may be. Is the pi-decimal random? no, because it is a result of calculation. If one says: "just pick it" that would follow all circumstantial pressure of the situation, maybe unconsciously, yet defying the 'random'. So what is the content you use behind that word? I would be surprised if both of you (and others, too) would agree . Till then I wish you luck to use the word - at random. Best wishes John Mikes - Original Message - From: "Russell Standish" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Sunday, October 14, 2001 4:36 AM Subject: Re: Predictions & duplications >>>>>>>SNIP<<<<<<<<< > That is almost the correct solution, Hal. If we ask what an observer > will make of a random description chosen at random, then you get > regular universes with probability exponentially related to the > inferred complexity. It is far clearer to see what happen when the > observer is a UTM, forcibly terminating programs after a > certain number of steps (representing the observer's resource bound) > (thus all descriptions are halting programs). Then one obtains a > Solomon-Levy distribution or universal prior. However, this argument > also works when the observer is not a UTM, but simply a classification > device of some kind. 
> > The WAP has nothing to do with this issue, except inasmuch as > universes can only be observed through the eyes of some observer. > > Again I reiterate that Juergen's resource-bounded "Great Programmer" > religion need be nothing but a reflection of our conscious selves stamped > upon our observations. > > Cheers > > [EMAIL PROTECTED] wrote: > > > > Juergen writes: > > > Some seem to think that the weak anthropic principle explains the > > > regularity. The argument goes like this: "Let there be a uniform measure > > > on all universe histories, represented as bitstrings. Now take the tiny > > > subset of histories in which you appear. Although the measure of this > > > subset is tiny, its conditional measure, given your very existence, > > > is not: According to the weak anthropic principle, the conditional > > > probability of finding yourself in a regular universe compatible with > > > your existence equals 1." > > > > > > But it is essential to see that the weak anthropic principle does not > > > have any predictive power at all. It does not tell you anything about > > > the future. It cannot explain away futures in which you still exist > > > but irregular things happen. Only a nonuniform prior can explain this. > > > > Isn't this fixed by saying that the uniform measure is not over all > > universe histories, as you have it above, but over all programs that > > generate universes? Now we have the advantage that short programs > > generate more regular universes than long ones, and the WAP grows teeth. > > > > Hal Finney > > > > > > -- -- > Dr. Russell Standish Director > High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile) > UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (") > Australia [EMAIL PROTECTED] > Room 2075, Red Centre http://parallel.hpc.unsw.edu.au/rks > International prefix +612, Interstate prefix 02 > -- -- >
Re: Predictions & duplications
That is almost the correct solution, Hal. If we ask what an observer will make of a random description chosen at random, then you get regular universes with probability exponentially related to the inferred complexity. It is far clearer to see what happens when the observer is a UTM, forcibly terminating programs after a certain number of steps (representing the observer's resource bound) (thus all descriptions are halting programs). Then one obtains a Solomon-Levy distribution or universal prior. However, this argument also works when the observer is not a UTM, but simply a classification device of some kind. The WAP has nothing to do with this issue, except inasmuch as universes can only be observed through the eyes of some observer. Again I reiterate that Juergen's resource-bounded "Great Programmer" religion need be nothing but a reflection of our conscious selves stamped upon our observations. Cheers [EMAIL PROTECTED] wrote: > > Juergen writes: > > Some seem to think that the weak anthropic principle explains the > > regularity. The argument goes like this: "Let there be a uniform measure > > on all universe histories, represented as bitstrings. Now take the tiny > > subset of histories in which you appear. Although the measure of this > > subset is tiny, its conditional measure, given your very existence, > > is not: According to the weak anthropic principle, the conditional > > probability of finding yourself in a regular universe compatible with > > your existence equals 1." > > > > But it is essential to see that the weak anthropic principle does not > > have any predictive power at all. It does not tell you anything about > > the future. It cannot explain away futures in which you still exist > > but irregular things happen. Only a nonuniform prior can explain this. > > Isn't this fixed by saying that the uniform measure is not over all > universe histories, as you have it above, but over all programs that > generate universes?
Now we have the advantage that short programs > generate more regular universes than long ones, and the WAP grows teeth. > > Hal Finney > Dr. Russell Standish Director High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile) UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (") Australia [EMAIL PROTECTED] Room 2075, Red Centre http://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02
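Russell's construction, an observer as a UTM that forcibly terminates programs after a certain number of steps, can be imitated with a toy machine (my own crude construction, far simpler than anything in the thread): programs are 8-bit strings read as 2-bit opcodes, runs are step-bounded, and each program carries weight 2^-8. Looping lets short programs produce long regular outputs, so regular outputs accumulate most of the probability:

```python
from collections import Counter

EMIT0, EMIT1, JUMP, HALT = "00", "01", "10", "11"

def run(program, out_len=4, max_steps=64):
    """Run a bitstring program on a toy machine, forcibly terminating
    after max_steps (the observer's resource bound)."""
    ops = [program[i:i + 2] for i in range(0, len(program), 2)]
    pc, steps, out = 0, 0, []
    while pc < len(ops) and steps < max_steps and len(out) < out_len:
        op = ops[pc]
        steps += 1
        if op == EMIT0:
            out.append("0"); pc += 1
        elif op == EMIT1:
            out.append("1"); pc += 1
        elif op == JUMP:
            pc = 0            # loop back: short programs yield long regular outputs
        else:                 # HALT
            break
    return "".join(out)

# Uniform weight 2^-8 on each 8-bit program induces a distribution on outputs:
dist = Counter(run(format(p, "08b")) for p in range(256))
# "emit 0, jump" alone already yields "0000" from 16 programs, while the
# irregular "0110" arises from only two -- regular outputs dominate.
```

This is only a cartoon of the argument, but it shows the mechanism: once runs are step-bounded, the induced distribution over outputs is far from uniform, concentrating on the simple ones.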
Re: Predictions & duplications
Juergen writes: > Some seem to think that the weak anthropic principle explains the > regularity. The argument goes like this: "Let there be a uniform measure > on all universe histories, represented as bitstrings. Now take the tiny > subset of histories in which you appear. Although the measure of this > subset is tiny, its conditional measure, given your very existence, > is not: According to the weak anthropic principle, the conditional > probability of finding yourself in a regular universe compatible with > your existence equals 1." > > But it is essential to see that the weak anthropic principle does not > have any predictive power at all. It does not tell you anything about > the future. It cannot explain away futures in which you still exist > but irregular things happen. Only a nonuniform prior can explain this. Isn't this fixed by saying that the uniform measure is not over all universe histories, as you have it above, but over all programs that generate universes? Now we have the advantage that short programs generate more regular universes than long ones, and the WAP grows teeth. Hal Finney
Re: Predictions & duplications
In reply to Russell Standish and Juho Pennanen I'd just like to emphasize the main point, which is really trivial: by definition, a uniform measure on the possible futures makes all future beginnings of a given size equally likely. Then regular futures clearly are not any more likely than the irregular ones. I have no idea what makes Russell think this is debatable. In most possible futures your computer will vanish within the next second. But it does not. This indicates that our future is _not_ sampled from a uniform prior. Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: "Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1." But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this. Which nonuniform prior is plausible? My favorite is the "resource-optimal" Speed Prior. I am hoping and expecting that someone will confirm it soon by finding a rather fast pseudorandom generator responsible for apparently noisy events in our universe. But others may be more conservative and go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe even for the nonenumerable prior mu^E (which dominates mu^M while still being computable in the limit) or maybe even for the extreme prior mu^G. 
Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/ > From [EMAIL PROTECTED] Thu Oct 11 17:36:41 2001 > From: Juho Pennanen <[EMAIL PROTECTED]> > > > > I tried to understand the problem that doctors Schmidhuber > and Standish are discussing by describing it in the most > concrete terms I could, below. (I admit beforehand I couldn't > follow all the details and do not know all the papers and > theorems referred to, so this could be irrelevant.) > > So, say you are going to drop a pencil from your hand and > trying to predict if it's going to fall down or up this > time. Using what I understand with comp TOE, I would take > the set of all programs that at some state implement a > certain conscious state, namely the state in which you > remember starting your experiment of dropping the pencil, > and have already recorded the end result (I abreviate this > conscious state with CS. To be exact it is a set of states, > but that shouldn't make a difference). > > The space of all programs would be the set of all programs > in some language, coded as infinite numerable sequences of > 0's and 1's. (I do not know how much the chosen language + > coding has effect on the whole thing). > > Now for your prediction you need to divide the > implementations of CS into two sets: those in which the > pencil fell down and those in which it fell up. Then you > compare the measures of those sets. (You would need to > assume that each program is run just once or something of > the sort. Some programs obviously implement CS several > times when they run. So you would maybe just include those > programs that implement CS infinitely many times, and > weight them with the density of CS occurrences during > their running.) > > One way to derive the measure you need is to assume a > measure on the set of all infinite sequences (i.e. on > all programs). For this we have the natural measure, > i.e. 
the product measure of the uniform measure on > the set containing 0 and 1. And as far as my intuition > goes, this measure would lead to the empirically correct > prediction on the direction of the pencil falling. And > if I understood it right, this is not too far from what > Dr. Standish was claiming? And we wouldn't need any > speed priors. > > But maybe the need of speed prior would come to play if I > thought more carefully about the detailed assumptions > involved? E.g. that each program would be run just once, > with the same speed etc? I am not sure. > > Juho > > / > Juho Pennanen > Department of Forest Ecology, P.O.Box 24 > FIN-00014 University of Helsinki > tel. (09)191 58144 (+358-9-191 58144) > GSM 040 5455 845 (+358-40-5455 845) > http://www.helsinki.fi/people/juho.pennanen > */
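Juergen's conjecture that a "rather fast pseudorandom generator" could be responsible for apparently noisy events at least has easy toy illustrations. A linear congruential generator (the constants below are the common glibc-style choice, my assumption, not anything Juergen specified) emits a statistically balanced bit stream from a description only a few lines long:

```python
def lcg_bits(seed, n, a=1103515245, c=12345, m=2 ** 31):
    """Extract one bit per step from a linear congruential generator:
    a 'fast pseudorandom generator' with a very short description."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append((x >> 16) & 1)
    return out

bits = lcg_bits(seed=42, n=10_000)
ones = sum(bits)
# The stream passes a crude frequency test (roughly half ones), so it
# *looks* like noise -- yet the generator plus the seed is a tiny
# description, i.e. the stream is simple in Juergen's sense.
```

Finding such a generator behind real interference data is of course the hard (and so far unrealised) part of the proposal; the sketch only shows why "looks random" and "is simply generated, and fast" are compatible.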
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > > Huh? A PDF? You mean a probability density function? On a continuous set? Probability Distribution Function. And PDF's are defined on all measurable sets, not just continuous ones. > No! I am talking about probability distributions on describable objects. > On things you can program. Sure... > > Anyway, you write "...observer will expect to see regularities even with > the uniform prior" but that clearly cannot be true. > This is where I beg to differ. > > > > Since you've obviously barked up the wrong tree here, it's a little > > hard to know where to start. Once you understand that each observer > > must equivalence an infinite number of descriptions due to the > > boundedness of its resources, it becomes fairly obvious that the > > smaller, simpler descriptions correspond to larger equivalence classes > > (hence higher probability). > > Maybe you should write down formally what you mean? Which resource bounds? > On which machine? What exactly do you mean by "simple"? Are you just > referring to the traditional Solomonoff-Levin measure and the associated > old Occam's razor theorems, or do you mean something else? It is formally written in "Why Occam's Razor", and the extension to non-computational observers is a fairly obvious corollary of the contents of "On Complexity and Emergence". To summarise the results for this list, the basic idea is that an observer acts as a filter, or category machine. On being fed a description, the observer places it into a category, or equivalence class of descriptions. What we can say is that real observers will have a finite number of such equivalence classes they can categorise descriptions into (be they basins of attraction in a neural network, or output states in the form of books, etc).
This is important, because if we consider the case of UTMs being the classification device, and outputs from halting programs as being the categories, then we end up with the S-L distribution over the set of halting programs. Unfortunately, the set of halting programs has measure zero within the set of all programs (descriptions), so we end up with the White Rabbit paradox, namely that purely random descriptions have probability one. One solution is to ban the "Wabbit" universes (which you do by means of the speed prior), and recover the nicely behaved S-L measure, which, essentially, is your solution (ie S-L convolved with the Speed Prior). The other way is to realise that the only example of conscious observer we are aware of is actually quite resilient to noise - only a small number of bits in a random string will be considered significant. We are really very good at detecting patterns in all sorts of random noise. Hence every random string will be equivalenced to some regular description. Then one ends up with an _observer dependent_ S-L-like probability distribution over the set of categories made by that observer. The corresponding Occam's Razor theorem follows in the usual way. I hope this is clear enough for you to follow... > > You are talking falsifiability. I am talking verifiability. Sure, you > cannot prove randomness. But that's not the point of any inductive > science. The point is to find regularities if there are any. Occam's > razor encourages us to search for regularity, even when we do not know > whether there is any. Maybe some PhD student tomorrow will discover a > simple PRG of the kind I mentioned, and get famous. > > It is important to see that Popper's popular and frequently cited and > overrated concept of falsifiability does not really help much to explain > what inductive science such as physics is all about. 
E.g., physicists > accept Everett's ideas although most of his postulated parallel universes > will remain inaccessible forever, and therefore are _not_ falsifiable. > Clearly, what's convincing about the multiverse theory is its simplicity, > not its falsifiability, in line with Solomonoff's theory of inductive > inference and Occam's razor, which is not just a wishy-washy philosophical > framework like Popper's. > > Similarly, today's string physicists accept theories for their simplicity, > not their falsifiability. Just like nobody is able to test whether > gravity is the same on Sirius, but believing it makes things simpler. > > Again: the essential question is: which prior is plausible? Which > represents the correct notion of simplicity? Solomonoff's traditional > prior, which does not care for temporal complexity at all? Even more > general priors computable in the limit, such as those discussed in > the algorithmic TOE paper? Or the Speed Prior, which embodies a more > restricted concept of simplicity that differs from Kolmogorov complexity > because it takes runtime into account, in an optimal fashion? > > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/
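Russell's picture of the observer as a classification device with finitely many output states, and the White Rabbit problem that comes with the uniform measure, can both be seen in a toy enumeration (my construction, not his formalism): classify every 16-bit string as constant, periodic, or "other", and compute each category's measure:

```python
from collections import Counter

def category(s):
    """A toy observer: a classification device with three output states."""
    if len(set(s)) == 1:
        return "constant"
    for p in range(1, len(s) // 2 + 1):
        if s == (s[:p] * (len(s) // p + 1))[:len(s)]:
            return "periodic"
    return "other"

n = 16
counts = Counter(category(format(x, f"0{n}b")) for x in range(2 ** n))
# Under the uniform measure, P(category) = counts[category] / 2**n.
# The regular categories are vanishingly rare and the random-looking one
# dominates: the White Rabbit paradox, unless (as Russell argues) the
# observer's equivalencing of noisy strings to regular descriptions does
# further work.
```

The interesting move in Russell's argument is the second half: real observers do not treat "other" as one undifferentiated bin but equivalence each noisy string to some nearby regular description, which is what reshapes this uniform count into an Occam-like distribution.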
Re: Predictions & duplications
I tried to understand the problem that doctors Schmidhuber and Standish are discussing by describing it in the most concrete terms I could, below. (I admit beforehand I couldn't follow all the details and do not know all the papers and theorems referred to, so this could be irrelevant.) So, say you are going to drop a pencil from your hand and are trying to predict whether it's going to fall down or up this time. Using what I understand with comp TOE, I would take the set of all programs that at some state implement a certain conscious state, namely the state in which you remember starting your experiment of dropping the pencil, and have already recorded the end result (I abbreviate this conscious state as CS. To be exact it is a set of states, but that shouldn't make a difference). The space of all programs would be the set of all programs in some language, coded as infinite numerable sequences of 0's and 1's. (I do not know how much the chosen language + coding has an effect on the whole thing). Now for your prediction you need to divide the implementations of CS into two sets: those in which the pencil fell down and those in which it fell up. Then you compare the measures of those sets. (You would need to assume that each program is run just once or something of the sort. Some programs obviously implement CS several times when they run. So you would maybe just include those programs that implement CS infinitely many times, and weight them with the density of CS occurrences during their running.) One way to derive the measure you need is to assume a measure on the set of all infinite sequences (i.e. on all programs). For this we have the natural measure, i.e. the product measure of the uniform measure on the set containing 0 and 1. And as far as my intuition goes, this measure would lead to the empirically correct prediction on the direction of the pencil falling. And if I understood it right, this is not too far from what Dr. Standish was claiming?
And we wouldn't need any speed priors. But maybe the need for a speed prior would come into play if I thought more carefully about the detailed assumptions involved? E.g. that each program would be run just once, with the same speed, etc.? I am not sure. Juho / Juho Pennanen Department of Forest Ecology, P.O.Box 24 FIN-00014 University of Helsinki tel. (09)191 58144 (+358-9-191 58144) GSM 040 5455 845 (+358-40-5455 845) http://www.helsinki.fi/people/juho.pennanen */
Re: Predictions & duplications
> > > From [EMAIL PROTECTED] : > > > [EMAIL PROTECTED] wrote: > > > > > > > > So you NEED something additional to explain the ongoing regularity. > > > > You need something like the Speed Prior, which greatly favors regular > > > > futures over others. > > > > > > I take issue with this statement. In Occam's Razor I show how any > > > observer will expect to see regularities even with the uniform prior > > > (comes about because all observers have resource problems, > > > incidently). The speed prior is not necessary for Occam's Razor. It is > > > obviously consistent with it though. > > > > First of all: there is _no_ uniform prior on infinitely many things. > > Try to build a uniform prior on the integers. (Tegmark also wrote that > > "... all mathematical structures are a priori given equal statistical > > weight," but of course this does not make much sense because there is > > _no_ way of assigning equal nonvanishing probability to all - infinitely > > many - mathematical structures.) > > I don't know why you insist on the prior being a PDF. It is not > necessary. With the uniform prior, all finite sets have vanishing > probability. However, all finite descriptions correspond to infinite > sets, and these infinite sets have non-zero probability. Huh? A PDF? You mean a probability density function? On a continuous set? No! I am talking about probability distributions on describable objects. On things you can program. Anyway, you write "...observer will expect to see regularities even with the uniform prior" but that clearly cannot be true. > > There is at best a uniform measure on _beginnings_ of strings. Then > > strings of equal size have equal measure. > > > > But then regular futures (represented as strings) are just as likely > > as irregular ones. Therefore I cannot understand the comment: "(comes > > about because all observers have resource problems, incidently)." 
> > Since you've obviously barked up the wrong tree here, it's a little > hard to know where to start. Once you understand that each observer > must equivalence an infinite number of descriptions due to the > boundedness of its resources, it becomes fairly obvious that the > smaller, simpler descriptions correspond to larger equivalence classes > (hence higher probability). Maybe you should write down formally what you mean? Which resource bounds? On which machine? What exactly do you mean by "simple"? Are you just referring to the traditional Solomonoff-Levin measure and the associated old Occam's razor theorems, or do you mean something else? > > Of course, alternative priors lead to alternative variants of Occam's > > razor. That has been known for a long time - formal versions of Occam's > > razor go at least back to Solomonoff, 1964. The big question really > > is: which prior is plausible? The most general priors we can discuss are > > those computable in the limit, in the algorithmic TOE paper. They do not > > allow for computable optimal prediction though. But the more restrictive > > Speed Prior does, and seems plausible from any programmer's point of view. > > > > > The interesting thing is of course whether it is possible to > > > experimentally distinguish between the speed prior and the uniform > > > prior, and it is not at all clear to me that it is possible to > > > distinguish between these cases. > > > > I suggest to look at experimental data that seems to have Gaussian > > randomness in it, such as interference patterns in split experiments. > > The Speed Prior suggests the data cannot be really random, but that a > > fast pseudorandom generator PRG is responsible, e.g., by dividing some > > seed by 7 and taking some of the resulting digits as the new seed, or > > whatever. So it's verifiable - we just have to discover the PRG method. 
> > > > I can't remember which incompleteness result it is, but it is > impossible to prove the randomness of any sequence. In order to > falsify your theory one would need to prove a sequence to be > random. However, of course if all known sequences are provably > pseudo-random (ie compressible), then this would constitute pretty > good evidence. However, this is a tall order, as there is no algorithm > for generating the compression behind an arbitrary sequence. You are talking falsifiability. I am talking verifiability. Sure, you cannot prove randomness. But that's not the point of any inductive science. The point is to find regularities if there are any. Occam's razor encourages us to search for regularity, even when we do not know whether there is any. Maybe some PhD student tomorrow will discover a simple PRG of the kind I mentioned, and get famous. It is important to see that Popper's popular and frequently cited and overrated concept of falsifiability does not really help much to explain what inductive science such as physics is all about. E.g., physicists accept Everett's ideas although most of his postulated parallel universes will remain inaccessible forever, and therefore are _not_ falsifiable.
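The verifiability/falsifiability asymmetry both writers agree on, that compressing a sequence verifies its non-randomness while failing to compress it proves nothing, can be demonstrated with an off-the-shelf compressor (zlib here is just a stand-in for any search for regularity):

```python
import os
import zlib

regular = b"\x00" * 1000          # a highly regular sequence
noise   = os.urandom(1000)        # whatever the OS entropy source gives

# Finding a short description *verifies* non-randomness:
assert len(zlib.compress(regular)) < 50

# Failing to find one proves nothing: the sequence might still be
# pseudo-random, generated by something zlib cannot see (a PRG, say).
assert len(zlib.compress(noise)) > 900
```

This is Russell's point in miniature: a successful compression is a certificate, but there is no algorithm that finds the compression behind an arbitrary sequence, so the absence of one is never a proof of randomness.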
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: > > > > > From [EMAIL PROTECTED] : > > [EMAIL PROTECTED] wrote: > > > > > > So you NEED something additional to explain the ongoing regularity. > > > You need something like the Speed Prior, which greatly favors regular > > > futures over others. > > > > I take issue with this statement. In Occam's Razor I show how any > > observer will expect to see regularities even with the uniform prior > > (comes about because all observers have resource problems, > > incidently). The speed prior is not necessary for Occam's Razor. It is > > obviously consistent with it though. > > First of all: there is _no_ uniform prior on infinitely many things. > Try to build a uniform prior on the integers. (Tegmark also wrote that > "... all mathematical structures are a priori given equal statistical > weight," but of course this does not make much sense because there is > _no_ way of assigning equal nonvanishing probability to all - infinitely > many - mathematical structures.) I don't know why you insist on the prior being a PDF. It is not necessary. With the uniform prior, all finite sets have vanishing probability. However, all finite descriptions correspond to infinite sets, and these infinite sets have non-zero probability. > > There is at best a uniform measure on _beginnings_ of strings. Then > strings of equal size have equal measure. > > But then regular futures (represented as strings) are just as likely > as irregular ones. Therefore I cannot understand the comment: "(comes > about because all observers have resource problems, incidently)." > Since you've obviously barked up the wrong tree here, it's a little hard to know where to start. Once you understand that each observer must equivalence an infinite number of descriptions due to the boundedness of its resources, it becomes fairly obvious that the smaller, simpler descriptions correspond to larger equivalence classes (hence higher probability). 
> Of course, alternative priors lead to alternative variants of Occam's > razor. That has been known for a long time - formal versions of Occam's > razor go at least back to Solomonoff, 1964. The big question really > is: which prior is plausible? The most general priors we can discuss are > those computable in the limit, in the algorithmic TOE paper. They do not > allow for computable optimal prediction though. But the more restrictive > Speed Prior does, and seems plausible from any programmer's point of view. > > > The interesting thing is of course whether it is possible to > > experimentally distinguish between the speed prior and the uniform > > prior, and it is not at all clear to me that it is possible to > > distinguish between these cases. > > I suggest to look at experimental data that seems to have Gaussian > randomness in it, such as interference patterns in split experiments. > The Speed Prior suggests the data cannot be really random, but that a > fast pseudorandom generator PRG is responsible, e.g., by dividing some > seed by 7 and taking some of the resulting digits as the new seed, or > whatever. So it's verifiable - we just have to discover the PRG method. > I can't remember which incompleteness result it is, but it is impossible to prove the randomness of any sequence. In order to falsify your theory one would need to prove a sequence to be random. However, of course if all known sequences are provably pseudo-random (ie compressible), then this would constitute pretty good evidence. However, this is a tall order, as there is no algorithm for generating the compression behind an arbitrary sequence. Unless someone else has some brilliant ideas, it's all looking a bit grim. > Juergen Schmidhuber > > http://www.idsia.ch/~juergen/ > http://www.idsia.ch/~juergen/everything/html.html > http://www.idsia.ch/~juergen/toesv2/ > > > Dr.
Russell Standish Director High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile) UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (") Australia[EMAIL PROTECTED] Room 2075, Red Centrehttp://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02
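The point about the uniform measure on beginnings of strings can be made concrete with a small sketch (my own illustration, not part of the original exchange): under the uniform measure on infinite binary strings, a finite description of length L picks out the cylinder set of all its infinite continuations, with measure 2^-L, so shorter descriptions correspond to strictly larger, hence more probable, equivalence classes.

```python
# Illustrative sketch (not from the original posts): the uniform measure
# on infinite binary strings assigns each finite prefix p the measure of
# its cylinder set {infinite strings beginning with p}, namely 2**-len(p).
# Shorter descriptions therefore pick out larger equivalence classes.
from itertools import product

def cylinder_measure(prefix: str) -> float:
    """Uniform measure of all infinite binary strings extending `prefix`."""
    assert set(prefix) <= {"0", "1"}
    return 2.0 ** -len(prefix)

# A short (simple) description outweighs any longer refinement of it:
assert cylinder_measure("01") == 0.25
assert cylinder_measure("01") > cylinder_measure("010110")

# The measures of all prefixes of a fixed length sum to 1, as a
# probability measure should:
total = sum(cylinder_measure("".join(bits)) for bits in product("01", repeat=4))
assert abs(total - 1.0) < 1e-12
```

Nothing here depends on the prior being a density over individual infinite strings, which is the point of the "not a PDF" remark above: only sets of continuations carry probability.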
Re: Predictions & duplications
> From [EMAIL PROTECTED]:
> [EMAIL PROTECTED] wrote:
> >
> > So you NEED something additional to explain the ongoing regularity.
> > You need something like the Speed Prior, which greatly favors regular
> > futures over others.
>
> I take issue with this statement. In Occam's Razor I show how any
> observer will expect to see regularities even with the uniform prior
> (comes about because all observers have resource problems,
> incidentally). The speed prior is not necessary for Occam's Razor. It is
> obviously consistent with it though.

First of all: there is _no_ uniform prior on infinitely many things. Try to build a uniform prior on the integers. (Tegmark also wrote that "... all mathematical structures are a priori given equal statistical weight," but of course this does not make much sense because there is _no_ way of assigning equal nonvanishing probability to all - infinitely many - mathematical structures.)

There is at best a uniform measure on _beginnings_ of strings. Then strings of equal size have equal measure.

But then regular futures (represented as strings) are just as likely as irregular ones. Therefore I cannot understand the comment: "(comes about because all observers have resource problems, incidentally)."

Of course, alternative priors lead to alternative variants of Occam's razor. That has been known for a long time - formal versions of Occam's razor go at least back to Solomonoff, 1964. The big question really is: which prior is plausible? The most general priors we can discuss are those computable in the limit, in the algorithmic TOE paper. They do not allow for computable optimal prediction though. But the more restrictive Speed Prior does, and seems plausible from any programmer's point of view.

> The interesting thing is of course whether it is possible to
> experimentally distinguish between the speed prior and the uniform
> prior, and it is not at all clear to me that it is possible to
> distinguish between these cases.

I suggest to look at experimental data that seems to have Gaussian randomness in it, such as interference patterns in split experiments. The Speed Prior suggests the data cannot be really random, but that a fast pseudorandom generator PRG is responsible, e.g., by dividing some seed by 7 and taking some of the resulting digits as the new seed, or whatever. So it's verifiable - we just have to discover the PRG method.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/
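The hypothetical generator sketched above ("dividing some seed by 7 and taking some of the resulting digits as the new seed") can be written out as a toy Python generator. The choice of which digits to keep, and how to extract an output bit, is my own illustrative assumption; the post specifies no such details.

```python
# Toy pseudorandom generator in the spirit of the scheme quoted above:
# divide the seed by 7 and keep some digits of the expansion as the next
# seed. Keeping six fractional decimal digits, and emitting the parity of
# the seed as the output bit, are illustrative assumptions only.

def prg_step(seed: int) -> int:
    # First six decimal digits of the fractional part of seed/7,
    # computed with integer arithmetic to stay exact.
    return (seed * 10**6 // 7) % 10**6 or 1  # avoid getting stuck at zero

def prg_bits(seed: int, n: int) -> list[int]:
    bits = []
    for _ in range(n):
        seed = prg_step(seed)
        bits.append(seed % 2)  # parity of the new seed as the output bit
    return bits

# Deterministic, hence "explainable" in the Speed Prior sense: the same
# seed always reproduces the same sequence.
assert prg_bits(123456, 16) == prg_bits(123456, 16)
assert set(prg_bits(123456, 16)) <= {0, 1}
```

The verifiability claim amounts to this: if observed "random" data were the output of some such cheap deterministic rule, discovering the rule and the seed would compress the data and refute its true randomness.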
Re: Predictions & duplications
[EMAIL PROTECTED] wrote:
> So you NEED something additional to explain the ongoing regularity.
> You need something like the Speed Prior, which greatly favors regular
> futures over others.

I take issue with this statement. In Occam's Razor I show how any observer will expect to see regularities even with the uniform prior (comes about because all observers have resource problems, incidentally). The speed prior is not necessary for Occam's Razor. It is obviously consistent with it though.

The interesting thing is of course whether it is possible to experimentally distinguish between the speed prior and the uniform prior, and it is not at all clear to me that it is possible to distinguish between these cases.

Cheers

Dr. Russell Standish
Re: Predictions & duplications
Bruno, there are so many misleading or unclear statements in your post - I do not even know where to start. I'll insert a few comments below.

> Subject: Re: Predictions & duplications
> From: Marchal <[EMAIL PROTECTED]>
>
> Juergen Schmidhuber wrote:
>
> > We need a prior probability distribution on possible histories.
>
> OK. I agree with that. But of course we differ on the meaning of
> "possible histories". And we tackle also the "prior probability"
> in quite different ways.
>
> > Then, once we have observed a past history, we can use Bayes' rule to
> > compute the probability of various futures, given the observations. Then
> > we predict the most probable future.
>
> It seems to me that Bayes' rule is what we *need* to build a plausible
> *past*; concerning the future, and even the present, I prefer
> a theory, or a philosophy (or religion), hoping it can put light on
> the unknown.
>
> > The algorithmic TOE paper discusses several formally describable priors
> > computable in the limit. But for the moment let me focus on my favorite,
> > the Speed Prior.
>
> Mmh... You know that there exist classes of infinitely speedable
> programs (and theories), and even that those classes correspond to a
> superclass of universal machines. This entails that the UD,
> the Universal Dovetailer, which I like to call sometimes
> the 'splashed Universal Turing Machine'
> (which btw makes UD* a sort of denotational semantics of the UTM), could
> generate and execute quicker and quicker versions of itself.
> This means you need a non-universal ontology.
> For helping others to understand the point I
> propose to await discussing this point after I give the solution of the
> problem in "diagonalisation 1":
> http://www.escribe.com/science/theory/m3079.html
>
> > Assume the Great Programmer has a resource problem, just like any
> > other programmer.
>
> You told Wei Dai not to take the Great Programmer too seriously.
> I am not sure I can imagine the Great Programmer having a resource
> problem. Unless you are a finitist! Knowing that the UTM emulating
> the UD will use an unboundable part of its tape ...
> How could the GP have a resource problem when he is the one
> building universes in which resource problems can happen (apparently)?
> Are you not presupposing some physical structure in advance somewhere?

You can be a GP. Write a program that computes all universes computable in the limit. Add storage in case of storage overflow. Still, you as a GP have a resource problem. At some point it will get harder and harder for you to keep computing all the things you'd like to compute. The beings evolving in your universes won't be able to deduce much about the physics of your world in which your computer is built. Just like I cannot say much about the environment of the particular GP who programmed us. That's where religion starts, and where algorithmic TOEs stop making statements.

> > According to the Speed Prior, the harder it is to compute
> > something, the less likely it gets. More precisely, the cumulative prior
> > probability of all data whose computation costs at least O(n) time is
> > at most inversely proportional to n.
>
> Very interesting idea. I would say: the harder it is to compute
> something FROM here and now, the less likely it gets from here and now.

Same thing. The Speed Prior leads to conditional probabilities of futures, given the past.

> And I would
> add that this is only true if your neighborhood is less complex than
> yourself. I really believe your idea is interesting for some low-level
> computational physics. But when agents, including yourself, enter the
> play, it's another matter, because it is intrinsically harder to compute
> things then.

I do not understand. But please do _not_ elaborate.

> > To evaluate the plausibility of this, consider your own computer: most
> > processes just take a few microseconds, some take a few seconds, few
> > take hours, very few take days, etc...
>
> ?
> In the work of the GP, that is in UD*, most processes take billions
> of googolplex kalpa ...

Not if the GP has a resource problem, just like any programmer of virtual realities has a resource problem. Again: forget all the esoteric undecidability stuff, which does not really matter here, and be pragmatic: try to look at things from the perspective of some observer being computed as part of a virtual reality that you programmed. The observer may get smart and correctly suspect you have a resource problem, and can make nontrivial predictions from there, using the Speed Prior.

> A lot of the processes engaged by the UD simply do not stop.
> Some are interesting and perhaps Turing universal, like the process
> raising better and better approximations of the Mandelbrot set.
Re: Predictions & duplications
Juergen Schmidhuber wrote:

> We need a prior probability distribution on possible histories.

OK. I agree with that. But of course we differ on the meaning of "possible histories". And we tackle also the "prior probability" in quite different ways.

> Then, once we have observed a past history, we can use Bayes' rule to
> compute the probability of various futures, given the observations. Then
> we predict the most probable future.

It seems to me that Bayes' rule is what we *need* to build a plausible *past*; concerning the future, and even the present, I prefer a theory, or a philosophy (or religion), hoping it can put light on the unknown.

> The algorithmic TOE paper discusses several formally describable priors
> computable in the limit. But for the moment let me focus on my favorite,
> the Speed Prior.

Mmh... You know that there exist classes of infinitely speedable programs (and theories), and even that those classes correspond to a superclass of universal machines. This entails that the UD, the Universal Dovetailer, which I like to call sometimes the 'splashed Universal Turing Machine' (which btw makes UD* a sort of denotational semantics of the UTM), could generate and execute quicker and quicker versions of itself. This means you need a non-universal ontology. For helping others to understand the point I propose to await discussing this point after I give the solution of the problem in "diagonalisation 1":
http://www.escribe.com/science/theory/m3079.html

> Assume the Great Programmer has a resource problem, just like any
> other programmer.

You told Wei Dai not to take the Great Programmer too seriously. I am not sure I can imagine the Great Programmer having a resource problem. Unless you are a finitist! Knowing that the UTM emulating the UD will use an unboundable part of its tape ... How could the GP have a resource problem when he is the one building universes in which resource problems can happen (apparently)? Are you not presupposing some physical structure in advance somewhere?

> According to the Speed Prior, the harder it is to compute
> something, the less likely it gets. More precisely, the cumulative prior
> probability of all data whose computation costs at least O(n) time is
> at most inversely proportional to n.

Very interesting idea. I would say: the harder it is to compute something FROM here and now, the less likely it gets from here and now. And I would add that this is only true if your neighborhood is less complex than yourself. I really believe your idea is interesting for some low-level computational physics. But when agents, including yourself, enter the play, it's another matter, because it is intrinsically harder to compute things then.

> To evaluate the plausibility of this, consider your own computer: most
> processes just take a few microseconds, some take a few seconds, few
> take hours, very few take days, etc...

? In the work of the GP, that is in UD*, most processes take billions of googolplex kalpa ... A lot of the processes engaged by the UD simply do not stop. Some are interesting and perhaps Turing universal, like the process raising better and better approximations of the Mandelbrot set.

> We do not know the Great Programmer's computer and algorithm for computing
> universe histories and other things. Still, we know that he cannot do
> it more efficiently than the asymptotically fastest algorithm.

Are you really sure about that? If the GP is really *the* GP, existing by Church's Thesis, I think it follows from Blum's Speed-up Theorem that it has no asymptotically fastest algorithm. But perhaps you are talking of "our little program" generated and executed by the UD. If this one has an asymptotically fastest algorithm, are you sure it can still emulate a UTM? I bet the "universe" is universal. Also: how could we know and test whether we are in a quick or slow version of that little program? Is that not (absolutely) undecidable? Should that program really be run? If yes, are you not presupposing again a physical universe?

> The Speed Prior reflects properties of precisely this optimal algorithm.
> It turns out that the best way of predicting future y, given past x,
> is to minimize the (computable) Kt-complexity of (x,y). As observation
> size grows, the predictions converge to the optimal predictions.

As observation grows, knowledge grows, and we grow, making everything more complex than before. I guess you mean "optimal predictions" at some level. With comp, the more knowledge grows, the more ignorance grows.

> To address Bruno Marchal's questions: Many of my duplications in parallel
> universes with identical past x will experience different futures y.
> None of my copies knows a priori which one it is. But each does know:
> Those futures y with high Kt(x,y) will be highly unlikely; those with
> low Kt(x,y) won't.

It is indeed hard to write a little program which generates a long Kolmogorov-Chaitin incompressible 01 string. But, as I told you earlier, it is quite ea
Re: Predictions & duplications
In summary, it would appear that Juergen is not disputing the use of the UDA after all, but simply the use of the uniform measure as an initial prior over the everything of all descriptions. He argues that we should use the speed prior instead. Another way of putting it is that he doesn't dispute the role of indeterminism in generating observable complexity, only that it is necessarily limited in the amount of complexity that can be generated.

None of this is in conflict with Occam's razor, or observable evidence to date. In fact it is debatable whether it is experimentally falsifiable - no sequence can be proven to be truly random - there is always the possibility of some pseudo-random generator being found to be the source.

Cheers

[EMAIL PROTECTED] wrote:
> > Predictions & duplications
> >
> > From: Marchal <[EMAIL PROTECTED]> Thu Oct 4 11:58:13 2001
> > [...]
> > You have still not explained to me how you predict your reasonably next
> > experience in the simple WM duplication. [...]
> > So, how is it that you talk as if you have an algorithm
> > capable of telling you in advance what will be your personal
> > experience in a self-duplication experiment. [...]
> > Where am I wrong?
> > Bruno
>
> I will try again: how can we predict what's going to happen?
>
> We need a prior probability distribution on possible histories.
> Then, once we have observed a past history, we can use Bayes' rule to
> compute the probability of various futures, given the observations. Then
> we predict the most probable future.
>
> The algorithmic TOE paper discusses several formally describable priors
> computable in the limit. But for the moment let me focus on my favorite,
> the Speed Prior.
>
> Assume the Great Programmer has a resource problem, just like any other
> programmer. According to the Speed Prior, the harder it is to compute
> something, the less likely it gets. More precisely, the cumulative prior
> probability of all data whose computation costs at least O(n) time is
> at most inversely proportional to n.
>
> To evaluate the plausibility of this, consider your own computer: most
> processes just take a few microseconds, some take a few seconds, few
> take hours, very few take days, etc...
>
> We do not know the Great Programmer's computer and algorithm for computing
> universe histories and other things. Still, we know that he cannot do
> it more efficiently than the asymptotically fastest algorithm.
>
> The Speed Prior reflects properties of precisely this optimal algorithm.
> It turns out that the best way of predicting future y, given past x,
> is to minimize the (computable) Kt-complexity of (x,y). As observation
> size grows, the predictions converge to the optimal predictions.
>
> To address Bruno Marchal's questions: Many of my duplications in parallel
> universes with identical past x will experience different futures y.
> None of my copies knows a priori which one it is. But each does know:
> Those futures y with high Kt(x,y) will be highly unlikely; those with
> low Kt(x,y) won't.
>
> Under the Speed Prior it is therefore extremely unlikely that I will
> suddenly observe truly random, inexplicable, surprising events, simply
> because true randomness is very hard to compute, and thus has very high
> Kt complexity.
>
> Juergen Schmidhuber
>
> http://www.idsia.ch/~juergen/
> http://www.idsia.ch/~juergen/everything/html.html
> http://www.idsia.ch/~juergen/toesv2/
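Levin's Kt complexity, which the quoted message invokes, charges a program both for its length and for the logarithm of its runtime: Kt(x) = min over programs p printing x of (len(p) + log time(p)). The sketch below is my own toy, restricted to a drastically simplified "language" of repeat-a-pattern programs, so it only mimics that trade-off; but it shows the qualitative claim: a regular string gets far lower Kt, hence far higher probability under a 2^-Kt-style prior, than an irregular one of the same length.

```python
import math

def toy_kt(x: str) -> float:
    # Toy stand-in for Levin's Kt-complexity. The only "programs" searched
    # are (pattern, repeat) pairs that print x by cycling a pattern;
    # runtime is taken as the output length. A string with no short cyclic
    # structure falls back to a verbatim "program" of length len(x).
    # Real Kt minimizes over all programs; this is purely illustrative.
    best = len(x) + math.log2(len(x) + 1)          # verbatim fallback
    for plen in range(1, len(x) // 2 + 1):
        pattern = x[:plen]
        reps = -(-len(x) // plen)                  # ceiling division
        if (pattern * reps)[:len(x)] == x:
            best = min(best, plen + math.log2(len(x) + 1))
            break                                  # shortest pattern wins
    return best

regular = "01" * 32                                # highly compressible
# Thue-Morse bits: aperiodic, so no cyclic program exists in our toy language.
irregular = "".join(str(bin(i).count("1") & 1) for i in range(64))

assert toy_kt(regular) < toy_kt(irregular)
```

Under such a prior, the regular future is exponentially favored over the irregular one, which is exactly the sense in which the Speed Prior "greatly favors regular futures" in the thread above.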