Nice Occam's Razor argument. I understood it easily because I knew there are always infinitely many possible explanations for any observation that are more complicated than the simplest one. So, without a reason to choose one of those other interpretations, why choose it? You could look for reasons in complex environments, but it would likely be more efficient to wait until you need a better explanation: waiting for an inconsistency is cheaper than searching an infinite set without a reason to do so.
Dave

On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney <matmaho...@yahoo.com> wrote:
> Jim, to address all of your points,
>
> Solomonoff induction claims that the probability of a string is
> proportional to the number of programs that output the string, where
> each program M is weighted by 2^-|M|. The probability is dominated by
> the shortest program (Kolmogorov complexity), but it is not exactly
> the same. The difference is small enough that we may neglect it, just
> as we neglect differences that depend on choice of language.
>
> Here is the proof that Kolmogorov complexity is not computable.
> Suppose it were. Then I could test the Kolmogorov complexity of
> strings in increasing order of length (breaking ties
> lexicographically) and describe "the first string that cannot be
> described in less than a million bits", contradicting the fact that I
> just did. (Formally, I could write a program that outputs the first
> string whose Kolmogorov complexity is at least n bits, choosing n to
> be larger than my program.)
>
> Here is the argument that Occam's Razor and the Solomonoff
> distribution must be true. Consider all possible probability
> distributions p(x) over any infinite set X of possible finite strings
> x, i.e. any X = {x: p(x) > 0} that is infinite. All such distributions
> must favor shorter strings over longer ones. Consider any x in X. Then
> p(x) > 0. There can be at most a finite number (less than 1/p(x)) of
> strings that are more likely than x, and therefore an infinite number
> of strings that are less likely than x. Of this infinite set, only a
> finite number (less than 2^|x|) can be shorter than x, and therefore
> there must be an infinite number that are longer than x. So for each x
> we can partition X into 4 subsets as follows:
>
> - shorter and more likely than x: finite
> - shorter and less likely than x: finite
> - longer and more likely than x: finite
> - longer and less likely than x: infinite.
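[The four-way partition above can be checked numerically for a concrete case. A minimal sketch, using an assumed toy distribution p(x) = 2^(-2|x|-1) over binary strings (this particular distribution is illustrative, not from the post); counting strings up to a length cutoff shows only one bucket keeps growing:]

```python
from itertools import product

# Toy distribution over all finite binary strings:
# p(x) = 2^(-2*len(x) - 1). It sums to 1, since each length n
# contributes 2^n * 2^(-2n-1) = (1/2)^(n+1).
def p(x):
    return 2.0 ** (-2 * len(x) - 1)

def classify(x, max_len):
    """Count strings up to max_len in each of the four subsets
    (ties in length are left out, as in the argument)."""
    counts = {"shorter_more": 0, "shorter_less": 0,
              "longer_more": 0, "longer_less": 0}
    for n in range(max_len + 1):
        if n == len(x):
            continue
        for bits in product("01", repeat=n):
            s = "".join(bits)
            side = "shorter" if n < len(x) else "longer"
            likely = "_more" if p(s) > p(x) else "_less"
            counts[side + likely] += 1
    return counts

c8 = classify("101", 8)
c10 = classify("101", 10)
print(c8)
print(c10)
# Raising the cutoff grows only the "longer and less likely" bucket;
# the other three stay finite, as the argument requires.
```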
>
> So in this sense, any distribution over the set of strings must favor
> shorter strings over longer ones.
>
> -- Matt Mahoney, matmaho...@yahoo.com
>
> ------------------------------
> *From:* Jim Bromer <jimbro...@gmail.com>
> *To:* agi <agi@v2.listbox.com>
> *Sent:* Fri, July 2, 2010 4:09:38 PM
> *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI
>
> On Fri, Jul 2, 2010 at 2:25 PM, Jim Bromer <jimbro...@gmail.com> wrote:
>>
>>> There cannot be a one to one correspondence between the
>>> representation of the shortest program to produce a string and the
>>> strings that they produce. This means that if the consideration of
>>> the hypotheses were to be put into general mathematical form it must
>>> include the potential of many to one relations between candidate
>>> programs (or subprograms) and output strings.
>>
>> But there is also no way to determine what the "shortest" program is,
>> since there may be different programs that are the same length. That
>> means that there is a many to one relation between programs and
>> program "length". So the claim that you could just iterate through
>> programs *by length* is false. This is the goal of algorithmic
>> information theory, not a premise of a methodology that can be used.
>> So you have the diagonalization problem.
>
> A counter argument is that there are only a finite number of Turing
> Machine programs of a given length. However, since you guys have
> specifically designated that this theorem applies to any construction
> of a Turing Machine, it is not clear that this counter argument can be
> used. And there is still the specific problem that you might want to
> try a program that writes a longer program to output a string (or many
> strings). Or you might want to write a program that can be called to
> write longer programs on a dynamic basis.
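[On the iterate-by-length point: ties in length do not block the enumeration itself, because "increasing length, ties broken lexicographically" (shortlex order) is a total order that visits every finite string exactly once. A minimal sketch over a binary program alphabet (the alphabet choice is an illustrative assumption, not the thread's formal machinery):]

```python
from itertools import islice, product

def shortlex(alphabet="01"):
    """Yield every finite string over `alphabet` exactly once:
    increasing length, ties broken lexicographically."""
    n = 0
    while True:
        for bits in product(alphabet, repeat=n):
            yield "".join(bits)
        n += 1

first = list(islice(shortlex(), 7))
print(first)  # the empty string, then both length-1 strings, then length 2
```

[Since each length contributes only finitely many strings, the order is well defined despite ties; what remains uncomputable is deciding which enumerated program is the shortest one for a given output.]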
> I think these cases, where you might consider a program that outputs a
> longer program (or another instruction string for another Turing
> Machine), constitute a serious problem that, at the least, deserves to
> be answered with sound analysis.
>
> Part of my original intuitive argument, which I formed some years ago,
> was that without a heavy constraint on the instructions for the
> program, it will be practically impossible to test or declare that
> some program is indeed the shortest program. However, I can't quite
> get to the point now that I can say that there is definitely a
> diagonalization problem.
>
> Jim Bromer
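[The "program that writes a longer program" case can be illustrated directly. A hedged two-stage sketch in Python (the construction and the particular target string are illustrative, not anything from the thread): a short generator emits the source of a longer direct program, and both ultimately describe the same string.]

```python
import io
from contextlib import redirect_stdout

def run(src):
    """Execute a Python source string and return what it prints."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(src)
    return buf.getvalue()

target = "01" * 200  # a 400-character string

# Stage 1: a short generator whose *output* is the source of a longer,
# direct program for `target`.
stage1 = "print('print(' + repr('01' * 200) + ')')"
stage2 = run(stage1).rstrip("\n")   # the program that stage1 writes

assert run(stage2).rstrip("\n") == target
print(len(stage1), len(stage2), len(target))
# The 40-character generator is far shorter than the 409-character
# direct program it emits, so ranking only direct programs by length
# would miss the shorter description.
```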