Re: [agi] Comments On My Skepticism of Solomonoff Induction

Jim Bromer Fri, 06 Aug 2010 03:51:51 -0700

>
> Jim: So, did Solomonoff's original idea involve randomizing whether the
> next bit would be a 1 or a 0 in the program?


Abram: Yep.
I meant, did Solomonoff's original idea involve randomizing whether the next
bit in the program's that are originally used to produce the *prior
probabilities* involve the use of randomizing whether the next bit would be
a 1 or a 0?  I have not been able to find any evidence that it was.
I thought that my question was clear but on second thought I guess it
wasn't. I think that the part about the coin flips was only a method to
express that he was interested in the probability that a particular string
would be produced from all possible programs, so that when actually testing
the prior probability of a particular string the program that was to be run
would have to be randomly generated.
Jim Bromer




On Wed, Aug 4, 2010 at 10:27 PM, Abram Demski <[email protected]> wrote:

> Jim,
>
>  Your function may be convergent but it is not a probability.
>>
>
> True! All the possibilities sum to less than 1. There are ways of
> addressing this (ie, multiply by a normalizing constant which must also be
> approximated in a convergent manner), but for the most part adherents of
> Solomonoff induction don't worry about this too much. What we care about,
> mostly, is comparing different hyotheses to decide which to favor. The
> normalizing constant doesn't help us here, so it usually isn't mentioned.
>
>
> You said that Solomonoff's original construction involved flipping a coin
>> for the next bit.  What good does that do?
>
>
> Your intuition is that running totally random programs to get predictions
> will just produce garbage, and that is fine. The idea of Solomonoff
> induction, though, is that it will produce systematically less garbage than
> just flipping coins to get predictions. Most of the garbage programs will be
> knocked out of the running by the data itself. This is supposed to be the
> least garbage we can manage without domain-specific knowledge
>
> This is backed up with the proof of dominance, which we haven't talked
> about yet, but which is really the key argument for the optimality of
> Solomonoff induction.
>
>
> And how does that prove that his original idea was convergent?
>
>
> The proofs of equivalence between all the different formulations of
> Solomonoff induction are something I haven't cared to look into too deeply.
>
> Since his idea is incomputable, there are no algorithms that can be run to
>> demonstrate what he was talking about so the basic idea is papered with all
>> sorts of unverifiable approximations.
>
>
> I gave you a proof of convergence for one such approximation, and if you
> wish I can modify it to include a normalizing constant to ensure that it is
> a probability measure. It would be helpful to me if your criticisms were
> more specific to that proof.
>
> So, did Solomonoff's original idea involve randomizing whether the next bit
>> would be a 1 or a 0 in the program?
>>
>
> Yep.
>
> Even ignoring the halting problem what kind of result would that give?
>>
>
> Well, the general idea is this. An even distribution intuitively represents
> lack of knowledge. An even distribution over possible data fails horribly,
> however, predicting white noise. We want to represent the idea that we are
> very ignorant of what the data might be, but not *that* ignorant. To capture
> the idea of regularity, ie, similarity between past and future, we instead
> take an even distribution over *descriptions* of the data. (The distribution
> in the 2nd version of solomonoff induction that I gave, the one in which the
> space of possible programs is represented as a continuum, is an even
> distribution.) This appears to provide a good amount of regularity without
> too much.
>
> --Abram
>
> On Wed, Aug 4, 2010 at 8:10 PM, Jim Bromer <[email protected]> wrote:
>
>> Abram,
>> Thanks for the explanation.  I still don't get it.  Your function may be
>> convergent but it is not a probability.  You said that Solomonoff's original
>> construction involved flipping a coin for the next bit.  What good does that
>> do?  And how does that prove that his original idea was convergent?  The
>> thing that is wrong with these explanations is that they are not coherent.
>> Since his idea is incomputable, there are no algorithms that can be run to
>> demonstrate what he was talking about so the basic idea is papered with all
>> sorts of unverifiable approximations.
>>
>> So, did Solomonoff's original idea involve randomizing whether the next
>> bit would be a 1 or a 0 in the program?  Even ignoring the halting
>> problem what kind of result would that give?  Have you ever solved the
>> problem for some strings and have you ever tested the solutions using a
>> simulation?
>>
>> Jim Bromer
>>
>> On Mon, Aug 2, 2010 at 5:12 PM, Abram Demski <[email protected]>wrote:
>>
>>> Jim,
>>>
>>> Interestingly, the formalization of Solomonoff induction I'm most
>>> familiar with uses a construction that relates the space of programs with
>>> the real numbers just as you say. This formulation may be due to Solomonoff,
>>> or perhaps Hutter... not sure. I re-formulated it to "gloss over" that in
>>> order to make it simpler; I'm pretty sure the version I gave is equivalent
>>> in the relevant aspects. However, some notes on the original construction.
>>>
>>> Programs are created by flipping coins to come up with the 1s and 0s. We
>>> are to think of it like this: whenever the computer reaches the end of the
>>> program and tries to continue on, we flip a coin to decide what the next bit
>>> of the program will be. We keep doing this for as long as the computer wants
>>> more bits of instruction.
>>>
>>> This framework makes room for infinitely long programs, but makes them
>>> infinitely improbable-- formally, they have probability 0. (We could alter
>>> the setup to allow them an infinitesimal probability.) Intuitively, this
>>> means that if we keep flipping a coin to tell the computer what to do,
>>> eventually we will either create an infinite loop-back (so the computer
>>> keeps executing the already-written parts of the program and never asks for
>>> more) or write out the "HALT" command. Avoiding doing one or the other
>>> forever is just too improbable.
>>>
>>> This also means all real numbers are output by some program! It just may
>>> be one which is infinitely long.
>>>
>>> However, all the programs that "slip past" my time bound as T increases
>>> to infinity will have measure 0, meaning they don't add anything to the sum.
>>> This means the convergence is unaffected.
>>>
>>> Note: in this construction, program space is *still* a well-defined
>>> entity.
>>>
>>> --Abram
>>>
>>> On Sun, Aug 1, 2010 at 9:05 PM, Jim Bromer <[email protected]> wrote:
>>>
>>>> Abram,
>>>>
>>>> This is a very interesting function.  I have spent a lot of time
>>>> thinking about it.  However, I do not believe that does, in any way,
>>>> prove or indicate that Solomonoff Induction is convergent.  I want to
>>>> discuss the function but I need to take more time to study some stuff and 
>>>> to
>>>> work various details out.  (Although I have thought a lot about it, I am
>>>> writing this under a sense of deadline, so it may not be well composed.)
>>>>
>>>>
>>>>
>>>> My argument was that Solomonoff's conjecture, which was based (as far as
>>>> I can tell) on 'all possible programs', was fundamentally flawed because 
>>>> the
>>>> idea of 'all possible programs' is not a programmable definition.  All
>>>> possible programs is a domain, not a class of programs that can be feasibly
>>>> defined in the form of an algorithm that could 'reach' all the programs.
>>>>
>>>>
>>>>
>>>> The domain of all possible programs is trans-infinite just as the domain
>>>> of irrational numbers are.  Why do I believe this?  Because if we
>>>> imagine that infinite algorithms are computable, then we could compute
>>>> irrational numbers.  That is, there are programs that, given infinite
>>>> resources, could compute irrational numbers.  We can use the binomial
>>>> theorem, for example to compute the square root of 2.  And we can use
>>>> trial and error methods to compute the nth root of any number.  So that
>>>> proves that there are infinite irrational numbers that can be computed by
>>>> algorithms that run for infinity.
>>>>
>>>>
>>>>
>>>> So what does this have to do with Solomonoff's conjecture of all
>>>> possible programs?  Well, if I could prove that any individual
>>>> irrational number could be computed (with programs that ran through
>>>> infinity) then I might be able to prove that there are trans-infinite
>>>> programs.  If I could prove that some trans-infinite subset of
>>>> irrational numbers could be computed then I might be able to prove that 
>>>> 'all
>>>> possible programs' is a trans-infinite class.
>>>>
>>>>
>>>> Now Abram said that since his sum, based on runtime and program length,
>>>> is convergent it can prove that Solomonoff Induction is convergent.  Even
>>>> assuming that his convergent sum method could be fixed up a little, I
>>>> suspect that this time-length bound is misleading.  Since a Turing
>>>> Machine allows for erasures this means that a program could last longer 
>>>> than
>>>> his time parameter and still produce an output string that matches the 
>>>> given
>>>> string.  And if 'all possible programs' is a trans-infinite class then
>>>> there are programs that you are going to miss.  Your encoding method will
>>>> miss trans-infinite programs (unless you have trans-cended the
>>>> trans-infinite.)
>>>>
>>>> However, I want to study the function and some other ideas related to
>>>> this kind of thing a little more.  It is an interesting function.
>>>> Unfortunately I also have to get back to other-worldly things.
>>>>
>>>> Jim Bromer
>>>>
>>>>
>>>> On Mon, Jul 26, 2010 at 2:54 PM, Abram Demski <[email protected]>wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> I'll argue that solomonoff probabilities are in fact like Pi, that is,
>>>>> computable in the limit.
>>>>>
>>>>> I still do not understand why you think these combinations are
>>>>> necessary. It is not necessary to make some sort of ordering of the sum to
>>>>> get it to converge: ordering only matters for infinite sums which include
>>>>> negative numbers. (Perhaps that's where you're getting the idea?)
>>>>>
>>>>> Here's my proof, rewritten from an earlier post, using the properties
>>>>> of infinite sums of non-negative numbers.
>>>>>
>>>>> (preliminaries)
>>>>>
>>>>> Define the computation as follows: we start with a string S which we
>>>>> want to know the Solomonoff probability of. We are given a time-limit T. 
>>>>> We
>>>>> start with P=0, where P is a real number with precision 2*log_4(T) or 
>>>>> more.
>>>>> We use some binary encoding for programs which (unlike normal programming
>>>>> languages) does not contain syntactically invalid programs, but still will
>>>>> (of course) contain infinite loops and so on. We run program "0" and "1" 
>>>>> for
>>>>> T/4 each, "00", "01", "10" and "11" for T/16 each, and in general run each
>>>>> program of length N for floor[T/(4^N)] until T/(4^N) is less than 1. Each
>>>>> time we run a program and the result is S, we add 1/(4^N) to P.
>>>>>
>>>>> (assertion)
>>>>>
>>>>> P converges to some value as T is increased.
>>>>>
>>>>> (proof)
>>>>>
>>>>> If every single program were to output S, then T would converge to 1/4
>>>>> + 1/4 + 1/16 + 1/16 + 1/16 + 1/16 + ... that is, 2*(1/(4^1)) + 
>>>>> 4*(1/(4^2)) +
>>>>> 8*(1/(4^3)) + ... which comes to 1/2 + 1/4 + 1/8 + 1/16 + .. i.e. 1/(2^1) 
>>>>> +
>>>>> 1/(2^2) + 1/(2^3) + ... ; it is well-known that this sequence converges to
>>>>> 1. Thus 1 is an upper bound for P: it could only get that high if every
>>>>> single program were to output S.
>>>>>
>>>>> 0 is a lower bound, since we start there and never subtract anything.
>>>>> In fact, every time we add more to P, we have a new lower bound: P will
>>>>> never go below a number once it reaches it. The sum can only increase.
>>>>> Infinite sums with this property must either converge to a finite number, 
>>>>> or
>>>>> go to infinity. However, we already know that 1 is an upper bound for P; 
>>>>> so,
>>>>> it cannot go to infinity. Hence, it must converge.
>>>>>
>>>>> --Abram
>>>>>
>>>>>   On Mon, Jul 26, 2010 at 9:14 AM, Jim Bromer <[email protected]>wrote:
>>>>>
>>>>>>   As far as I can tell right now, my theories that Solomonoff
>>>>>> Induction is trans-infinite were wrong.  Now that I realize that the
>>>>>> mathematics do not support these conjectures, I have to acknowledge that 
>>>>>> I
>>>>>> would not be able to prove or even offer a sketch of a proof of my
>>>>>> theories.  Although I did not use rigourous mathematics while I have 
>>>>>> tried
>>>>>> to make an assessment of the Solomonoff method, the first principle of
>>>>>> rigourous mathematics is to acknowledge that the mathematics does not
>>>>>> support your supposition when they don't.
>>>>>>
>>>>>> Solomonoff Induction isn't a mathematical theory because the desired
>>>>>> results are not computable.  As I mentioned before, there are a great 
>>>>>> many
>>>>>> functions that are not computable but which are useful and important 
>>>>>> because
>>>>>> they tend toward a limit which can be seen in with a reasonable number of
>>>>>> calculations using the methods available.  Pi is one such function.  (I 
>>>>>> am
>>>>>> presuming that pi would require an infinite expansion which seems right.)
>>>>>>
>>>>>> I have explained, and I think it is a correct explanation, that there
>>>>>> is no way that you could make an apriori computation of all possible
>>>>>> combinations taken from infinite values.  So you could not even come up 
>>>>>> with
>>>>>> a theoretical construct that could take account of that level of
>>>>>> complexity.  It is true that you could come up with a theoretical
>>>>>> computational method that could take account of any finite number of 
>>>>>> values,
>>>>>> and that is what we are talking about when we talk about the infinite, 
>>>>>> but
>>>>>> in this context it only points to a theoretical paradox.  Your 
>>>>>> theoretical
>>>>>> solution could not take the final step of computing a probability for a
>>>>>> string until it had run through the infinite combinations and this is
>>>>>> impossible.  The same problem does not occur for pi because the function
>>>>>> that produces it tends toward a limit.
>>>>>>
>>>>>> The reason I thought Solomonoff Induction was trans-infinite was
>>>>>> because I thought it used the Bayesian notion to compute the
>>>>>> probability using all possible programs that produced a particular 
>>>>>> substring
>>>>>> following a given prefix.  Now I understand that the desired function is 
>>>>>> the
>>>>>> computation of only the probability of a particular string (not all 
>>>>>> possible
>>>>>> substrings that are identical to the string) following a given prefix.  I
>>>>>> want to study the method a little further during the next few weeks, but 
>>>>>> I
>>>>>> just wanted to make it clear that, as far as now understand the program,
>>>>>> that I do not think that it is trans-infinite.
>>>>>>
>>>>>> Jim Bromer
>>>>>>
>>>>>



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=8660244-6e7fb59c
Powered by Listbox: http://www.listbox.com

Re: [agi] Comments On My Skepticism of Solomonoff Induction

Reply via email to