You can always find languages that favor either hypothesis. Suppose that you want to predict the sequence 10, 21, 32, ? and we write our hypothesis as a function that takes the trial number (0, 1, 2, 3...) and returns the outcome. The sequence 10, 21, 32, 43, 54... would be coded:
int hypothesis_1(int trial) { return trial*11+10; }

The sequence 10, 21, 32, 10, 21, 32... would be coded

int hypothesis_2(int trial) { return trial%3*11+10; }

which is longer and therefore less likely. Here is another example: predict the sequence 0, 1, 4, 9, 16, 25, 36, 49, ? Can you find a program shorter than this that doesn't predict 64?

int hypothesis_1(int trial) { return trial*trial; }

-- Matt Mahoney, matmaho...@yahoo.com

________________________________
From: David Jones <davidher...@gmail.com>
To: agi <agi@v2.listbox.com>
Sent: Tue, June 29, 2010 3:48:01 PM
Subject: Re: [agi] Re: Huge Progress on the Core of AGI

Such an example is nowhere near sufficient to accept the assertion that program size is the right way to define the simplicity of a hypothesis. Here is a counterexample. It requires a slightly more complex example because all zeros doesn't leave any room for alternative hypotheses. Here is the sequence: 10, 21, 32

void hypothesis_1() {
  int ten = 10;
  int counter = 0;
  while (1) {
    printf("%d ", ten + counter);
    ten = ten + 10;
    counter = counter + 1;
  }
}

void hypothesis_2() {
  while (1) printf("10 21 32");
}

Hypothesis 2 is simpler, yet clearly wrong. These examples don't really show anything.

Dave

On Tue, Jun 29, 2010 at 3:15 PM, Matt Mahoney <matmaho...@yahoo.com> wrote:

> David Jones wrote:
>> I really don't think this is the right way to calculate simplicity.
>
> I will give you an example, because examples are more convincing than proofs.
>
> Suppose you perform a sequence of experiments whose outcome can either be 0 or 1. In the first 10 trials you observe 0000000000. What do you expect to observe in the next trial?
>
> Hypothesis 1: the outcome is always 0.
> Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter.
>
> Hypothesis 1 is shorter than 2, so it is more likely to be correct.
>
> If I describe the two hypotheses in French or Chinese, then 1 is still shorter than 2.
>
> If I describe the two hypotheses in C, then 1 is shorter than 2.
>
> void hypothesis_1() {
>   while (1) printf("0");
> }
>
> void hypothesis_2() {
>   int i;
>   for (i=0; i<10; ++i) printf("0");
>   while (1) printf("1");
> }
>
> If I translate these programs into Perl or Lisp or x86 assembler, then 1 will still be shorter than 2.
>
> I realize there might be smaller equivalent programs. But I think the smallest program equivalent to hypothesis_1 will still be smaller than the smallest program equivalent to hypothesis_2.
>
> I realize there are other hypotheses than 1 or 2. But I think that the smallest one you can find that outputs eleven bits of which the first ten are zeros will be a program that outputs another zero.
>
> I realize that you could rewrite 1 so that it is longer than 2. But it is the shortest version that counts. More specifically, consider all programs in which the first 10 outputs are 0, then weight each program by 2^-length, so the shortest programs dominate.
>
> I realize you could make up a language where the shortest encoding of hypothesis 2 is shorter than 1. You could do this for any pair of hypotheses. However, I think if you stick to "simple" languages (and I realize this is a circular definition), then 1 will usually be shorter than 2.
>
> -- Matt Mahoney, matmaho...@yahoo.com
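To make the 2^-length weighting concrete, here is a minimal C sketch (not from the thread; measuring length as 8 bits per source character is a simplifying stand-in for a real code length) that compares the two hypothesis programs above and reports the resulting odds:

#include <stdio.h>
#include <string.h>

int main(void) {
    /* The two hypotheses from the example above, represented as source text. */
    const char *h1 = "void hypothesis_1() { while (1) printf(\"0\"); }";
    const char *h2 = "void hypothesis_2() { int i; for (i=0; i<10; ++i) printf(\"0\"); while (1) printf(\"1\"); }";

    /* Crude description length: 8 bits per character (an assumption). */
    double bits1 = 8.0 * strlen(h1);
    double bits2 = 8.0 * strlen(h2);

    /* Weight each program by 2^-length; only the ratio matters when comparing:
       P(h1) / P(h2) = 2^-bits1 / 2^-bits2 = 2^(bits2 - bits1). */
    printf("h1: %.0f bits, h2: %.0f bits\n", bits1, bits2);
    printf("odds in favor of h1: 2^%.0f\n", bits2 - bits1);
    return 0;
}

Under any length measure in this spirit the gap is large, so the weight of hypothesis_1 dominates and the predicted next bit is another 0.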
> ________________________________
> From: David Jones <davidher...@gmail.com>
> To: agi <agi@v2.listbox.com>
> Sent: Tue, June 29, 2010 1:31:01 PM
> Subject: Re: [agi] Re: Huge Progress on the Core of AGI
>
>> On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney <matmaho...@yahoo.com> wrote:
>
>>> Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler?
>>
>> A hypothesis is a program that outputs the observed data. It "explains" the data if its output matches what is observed. The "simpler" hypothesis is the shorter program, measured in bits.
>
> I can't be confident that bits is the right way to do it. I suspect bits is an approximation of a more accurate method. I also suspect that you can write a more complex explanation "program" with the same number of bits. So, there are some flaws with this approach. It is an interesting idea to consider though.
>
>> The language used to describe the data can be any Turing complete programming language (C, Lisp, etc.) or any natural language such as English. It does not matter much which language you use, because for any two languages there is a fixed length procedure, described in either of the languages, independent of the data, that translates descriptions in one language to the other.
>
> Hypotheses don't have to be written in actual computer code and probably shouldn't be, because hypotheses are not really meant to be "run" per se. And outputs are not necessarily the right way to put it either. Outputs imply prediction. And as Mike has often pointed out, things cannot be precisely predicted. We can, however, determine whether a particular observation fits expectations, rather than equals some prediction. There may be multiple possible outcomes that we expect and which would be consistent with a hypothesis, which is why actual prediction should not be used.
>
>> For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right?
>>
>> The hypothesis is not the simplest. The program that outputs the two frames as if independent cannot be smaller than the two frames compressed independently. The program could be made smaller if it only described how the second frame is different than the first. It would be more likely to correctly predict the third frame if it continued to run and described how it would be different than the second frame.
>
> I really don't think this is the right way to calculate simplicity.
>
>>> I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had.
>>
>> Kolmogorov proved that the solution is not computable. Given a hypothesis (a description of the observed data, or a program that outputs the observed data), there is no general procedure or test to determine whether a shorter (simpler, better) hypothesis exists. Proof: suppose there were. Then I could describe "the first data set that cannot be described in less than a million bits" even though I just did. (By "first" I mean the first data set encoded by a string from shortest to longest, breaking ties lexicographically.)
>
> You are making a different point than the question I asked. It may not be computable to determine whether a better hypothesis exists, but that is not what I want to know. What I want to know is which is the better of any two hypotheses I can come up with. Creating the hypotheses requires a different algorithm, which is not as hard to me as the other parts of the problem.
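The language-independence claim quoted above is the standard invariance theorem. As a sketch in LaTeX notation, writing K_L(x) for the length of the shortest program in language L that outputs x:

\[
|K_{L_1}(x) - K_{L_2}(x)| \;\le\; c_{L_1,L_2} \qquad \text{for all } x,
\]

where the constant c_{L_1,L_2} is roughly the length of a translator between the two languages and does not depend on x. So switching between C, Lisp, or English shifts the measured simplicity of every hypothesis by at most a fixed amount, which is why the comparison between two hypotheses is mostly language independent.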
>> That said, I believe the state of the art in both language and vision is based on hierarchical neural models, i.e. pattern recognition using learned weighted combinations of simpler patterns. I am more familiar with language. The top ranked programs can be found at http://mattmahoney.net/dc/text.html
>
> The state of the art in explanatory reasoning is what I'm looking for.
>
>> -- Matt Mahoney, matmaho...@yahoo.com
>>
>> ________________________________
>> From: David Jones <davidher...@gmail.com>
>> To: agi <agi@v2.listbox.com>
>> Sent: Tue, June 29, 2010 10:44:41 AM
>> Subject: Re: [agi] Re: Huge Progress on the Core of AGI
>>
>> Thanks Matt,
>>
>> Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler?
>>
>> So, maybe what I want to know from the state of the art in research is:
>>
>> 1) how precisely do other people define "simpler"?
>> and
>> 2) More importantly, how do you compare competing explanations/hypotheses that have more or less explanatory power? Simpler does not apply unless you are comparing equally explanatory hypotheses.
>>
>> For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? Well, clearly a more complicated explanation is warranted because a more complicated explanation is more *explanatory* and a better explanation. So, why is it better? Can it be defined as better in a precise way so that you can compare arbitrary hypotheses or explanations? That is what I'm trying to learn about. I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had.
>>
>> Dave
>>
>> On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney <matmaho...@yahoo.com> wrote:
>>
>> David Jones wrote:
>>>> If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links?
>>>
>>> The simplest explanation of the past is the best predictor of the future.
>>> http://en.wikipedia.org/wiki/Occam's_razor
>>> http://www.scholarpedia.org/article/Algorithmic_probability
>>>
>>> -- Matt Mahoney, matmaho...@yahoo.com
>>>
>>> ________________________________
>>> From: David Jones <davidher...@gmail.com>
>>> To: agi <agi@v2.listbox.com>
>>> Sent: Tue, June 29, 2010 9:05:45 AM
>>> Subject: [agi] Re: Huge Progress on the Core of AGI
>>>
>>> If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? I've read some through Google, but I'm not really satisfied with anything I've found.
>>>
>>> Thanks,
>>>
>>> Dave
>>>
>>> On Sun, Jun 27, 2010 at 1:31 AM, David Jones <davidher...@gmail.com> wrote:
>>>
>>>> A method for comparing hypotheses in explanatory-based reasoning:
>>>>
>>>> We prefer the hypothesis or explanation that *expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power).
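As a minimal sketch of this comparison rule (the struct, its fields, and the example numbers are illustrative assumptions, not anything from the post), in C:

#include <stdio.h>

/* Illustrative summary of a scored hypothesis: how many of the current
   observations it expects, plus a rough complexity measure used only to
   break ties. */
struct hypothesis {
    const char *name;
    int expected;    /* observations this hypothesis expects */
    int complexity;  /* smaller = simpler */
};

/* Prefer the hypothesis that expects more observations; on a tie, prefer the simpler one. */
static const struct hypothesis *prefer(const struct hypothesis *a, const struct hypothesis *b) {
    if (a->expected != b->expected)
        return a->expected > b->expected ? a : b;
    return a->complexity <= b->complexity ? a : b;
}

int main(void) {
    struct hypothesis a = { "hypothesis A", 2, 3 };
    struct hypothesis b = { "hypothesis B", 3, 5 };
    /* B wins: it expects more observations, even though it is less simple. */
    printf("preferred: %s\n", prefer(&a, &b)->name);
    return 0;
}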
>>>> Why are expected events so important? They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the hypothesis is when compared to a competing hypothesis.
>>>>
>>>> Here are two case studies I've been analyzing from sensory perception of simplified visual input.
>>>>
>>>> The goal of the case studies is to answer the following: How do you generate the most likely motion hypothesis in a way that is general and applicable to AGI?
>>>>
>>>> Case Study 1) Here is a link to an example: an animated gif of two black squares moving from left to right. Description: Two black squares are moving in unison from left to right across a white screen. In each frame the black squares shift to the right so that square 1 steals square 2's original position and square 2 moves an equal distance to the right.
>>>>
>>>> Case Study 2) Here is a link to an example: the interrupted square. Description: A single square is moving from left to right. Suddenly, in the third frame, a single black square is added in the middle of the expected path of the original black square. This second square just stays there. So, what happened? Did the square moving from left to right keep moving? Or did it stop and then another square suddenly appeared and moved from left to right?
>>>>
>>>> Here is a simplified version of how we solve case study 1. The important hypotheses to consider are:
>>>>
>>>> 1) The square from frame 1 of the video that has a very close position to a square from frame 2 should be matched (we hypothesize that they are the same square and that any difference in position is motion). So, what happens is that in each pair of frames of the video we only match one square. The other square goes unmatched.
>>>>
>>>> 2) We do the same thing as in hypothesis #1, but this time we also match the remaining squares and hypothesize motion as follows: the first square jumps over the second square from left to right. We hypothesize that this happens over and over in each frame of the video. Square 2 stops and square 1 jumps over it... over and over again.
>>>>
>>>> 3) We hypothesize that both squares move to the right in unison. This is the correct hypothesis.
>>>>
>>>> So, why should we prefer the correct hypothesis, #3, over the other two?
>>>>
>>>> Well, first of all, #3 is correct because it has the most explanatory power of the three and is the simplest of the three. Simpler is better because, with the given evidence and information, there is no reason to desire a more complicated hypothesis such as #2.
>>>>
>>>> So, the answer to the question is that explanation #3 expects the most observations, such as:
>>>>
>>>> 1) The consistent relative positions of the squares in each frame are expected.
>>>> 2) It also expects their new positions in each frame based on velocity calculations.
>>>> 3) It expects both squares to occur in each frame.
>>>>
>>>> Explanation 1 ignores one square from each frame of the video, because it can't match it. Hypothesis #1 doesn't have a reason for why a new square appears in each frame and why one disappears. It doesn't expect these observations. In fact, explanation 1 doesn't expect anything that happens, because something new happens in each frame, which doesn't give it a chance to confirm its hypotheses in subsequent frames.
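A minimal C sketch of this scoring for case study 1 (the positions, the matching representation, and the point scheme below are illustrative assumptions): each hypothesis is a matching of frame-1 squares to frame-2 squares, and it earns one point per square it accounts for plus one point if it keeps the squares' relative positions consistent.

#include <stdio.h>

/* Squares reduced to x positions: frame 1 has squares A and B, frame 2 shows
   both shifted right by 10. The numbers are illustrative. */
#define N 2
static const int frame1[N] = { 0, 10 };
static const int frame2[N] = { 10, 20 };

/* A hypothesis is a matching: match[i] is the frame-2 index that frame-1
   square i is claimed to have moved to, or -1 for "left unmatched". */
struct hyp { const char *name; int match[N]; };

/* Count expected observations, in the spirit of the list above. */
static int expected_observations(const struct hyp *h) {
    int score = 0;
    int pos[N];
    for (int i = 0; i < N; ++i) {
        if (h->match[i] >= 0) { score++; pos[i] = frame2[h->match[i]]; }
        else pos[i] = frame1[i];   /* unmatched: no claimed motion */
    }
    /* Consistent relative positions are an expected observation too. */
    if (h->match[0] >= 0 && h->match[1] >= 0 &&
        pos[1] - pos[0] == frame1[1] - frame1[0])
        score++;
    return score;
}

int main(void) {
    struct hyp hs[3] = {
        { "1: match only the closest pair",  { -1, 0 } },
        { "2: square A jumps over square B", {  1, 0 } },
        { "3: both squares move in unison",  {  0, 1 } },
    };
    for (int i = 0; i < 3; ++i)
        printf("%s -> expects %d observations\n", hs[i].name, expected_observations(&hs[i]));
    return 0;
}

With these numbers the unison hypothesis scores highest, matching the argument above: it accounts for both squares and for their consistent relative positions.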
>>>> The power of this method is immediately clear. It is general and it solves the problem very cleanly.
>>>>
>>>> Here is a simplified version of how we solve case study 2: We expect the original square to move at a similar velocity from left to right, because we hypothesized that it did move from left to right and we calculated its velocity. If this expectation is confirmed, then this hypothesis is more likely than saying that the square suddenly stopped and another started moving. Such a change would be unexpected and such a conclusion would be unjustifiable.
>>>>
>>>> I also believe that explanations which generate fewer incorrect expectations should be preferred over those that generate more incorrect expectations.
>>>>
>>>> The idea I came up with earlier this month regarding high frame rates to reduce uncertainty is still applicable. It is important that all generated hypotheses have as low uncertainty as possible given our constraints and resources available.
>>>>
>>>> I thought I'd share my progress with you all. I'll be testing the ideas on test cases such as the ones I mentioned in the coming days and weeks.
>>>>
>>>> Dave
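A minimal C sketch of the case study 2 reasoning (the positions, the tolerance, and the frame data are illustrative assumptions): estimate the square's velocity from the first two frames, predict where it should be in the third frame, and check whether each observed square fits that expectation rather than requiring an exact match.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* x positions of the original square in frames 1 and 2 (illustrative). */
    int x1 = 0, x2 = 10;
    /* Squares observed in frame 3: one near the predicted path, one newly added. */
    int frame3[2] = { 21, 15 };
    int tolerance = 3;                 /* how far off an observation may be and still fit */

    int velocity = x2 - x1;            /* from the accepted motion hypothesis */
    int predicted = x2 + velocity;     /* expected position in frame 3 */

    /* An observation fits the expectation if it lies within the tolerance of the
       prediction; it does not have to equal the prediction exactly. */
    for (int i = 0; i < 2; ++i) {
        int fits = abs(frame3[i] - predicted) <= tolerance;
        printf("square at x=%d %s the expectation (predicted x=%d)\n",
               frame3[i], fits ? "fits" : "does not fit", predicted);
    }
    return 0;
}

Here the square near x=21 confirms the expectation that the original square kept moving, so that hypothesis is preferred, and the square at x=15 is treated as a newly added object rather than evidence that the original square stopped.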