“The assumption that all data sets are equally likely is wrong.” I am not sure that this is really an assumption necessary for the no-free-lunch theorem to apply. I think it is rather one consequence, and a mathematically clear way to state the theorem. In any case, the no-free-lunch theorem applies to today’s machine learning problems on a daily basis. For example, for some problems a random forest does a better job; for others, logistic regression does a better job, and so on. That is no free lunch in everyday application, and yet not all data sets are equally likely. Similarly, deep learning does better across a broader set of problems, but there is no free lunch there either, because more data are needed for that broader applicability. And so on.
Occam’s Razor is one consequence of the no-free-lunch theorem. If you have two models that account for the data equally well, you take the simpler one. Consider the example above of logistic regression (simplest), random forest (more complex), and deep learning (most complex). If you do well on data set A with logistic regression, you do not want to use any of the more complex models. Likewise, if data set B can be accounted for accurately by a random forest, you do not want to go any more complex. Why? Because you will have to pay a price. For example, a naive modeler may be tempted to use deep learning because deep learning is “so powerful” and can do more things. So one might try squeezing data sets A and B together into a single deep learning model and have one model account for both. Naively, this may look like an improvement. But of course we know it is not a good idea; that is what Occam’s Razor tells us. No-free-lunch tells us the same thing, and in addition explains why: we will have to pay a price in something else. In this case we may need several orders of magnitude more data to build a model that accounts for both A and B.

Deep learning performs across the broader data set A+B, but poorly on each part. Logistic regression performs better on the narrower data set A but is helpless on B. Random forest performs better on the other narrow data set, B, but is horrible on A. On average, all three perform about equally well across both data sets. A dummy table:

Data set  | A score | B score | Total score
----------|---------|---------|------------
LogReg    |   10    |    0    |     10
RandFor   |    0    |   10    |     10
DeepLearn |    5    |    5    |     10

So, if you work with a single data set, either A or B, Occam’s Razor tells you to pick the model that scores 10 on that data set. In that way, Occam’s Razor prescribes how to deal with no free lunch when choosing models.
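The trade-off in the dummy table can be sketched in a few lines of Python. The scores and model names are the illustrative dummy values from the table above, not real benchmark results; the point is only that the per-data-set winner differs while the averages tie:

```python
# Illustrative scores copied from the dummy table (hypothetical values).
scores = {
    "LogReg":    {"A": 10, "B": 0},
    "RandFor":   {"A": 0,  "B": 10},
    "DeepLearn": {"A": 5,  "B": 5},
}

def best_model(dataset):
    """Occam-style choice: pick the model that scores highest on ONE data set."""
    return max(scores, key=lambda m: scores[m][dataset])

# No free lunch: averaged over both data sets, every model ties at 5.0.
averages = {m: (s["A"] + s["B"]) / 2 for m, s in scores.items()}

print(best_model("A"))   # the simplest adequate model wins on A
print(best_model("B"))   # a different model wins on B
print(averages)          # all averages equal
```

Choosing per data set (Occam’s Razor) recovers a score of 10; averaging any single model over both data sets never beats 5.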
Best regards,
Danko

From: Matt Mahoney
Sent: Monday, 3 February 2020 22:51
To: AGI
Subject: Re: [agi] Re: General Intelligence vs. no-free-lunch theorem

On Mon, Feb 3, 2020, 2:48 PM Danko Nikolic <danko.niko...@gmail.com> wrote:

Thanks everyone for the opinions. So, I see a lot of disagreement. But let me try to be accurate about what this disagreement is about. It seems that nobody is challenging the idea that the no-free-lunch theorem contradicts the notion of general intelligence (and vice versa). Instead, everyone seems to challenge the validity of the no-free-lunch theorem itself. Would that be a correct conclusion so far?

The proof is correct. The assumption that all data sets are equally likely is wrong. We know a priori that Occam's Razor applies.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T353f2000d499d93b-M9ebb629966328f7434af4c40
Delivery options: https://agi.topicbox.com/groups/agi/subscription