On Tue, Jul 7, 2020 at 2:31 PM Matt Mahoney <mattmahone...@gmail.com> wrote:
> Why bother with a CIC training and test set? Compression evaluates every
> bit as a test given the previous bits as training. Even if the compression
> algorithm doesn't explicitly predict bits, it is equivalent to one that
> does by the chain rule. The probability of a string is equal to the
> product of the conditional probabilities of its symbols.
>
> You can see this effect at work in http://mattmahoney.net/dc/text.html
> The ranking of enwik8 (first 100 MB) closely tracks the ranking of enwik9.
> Most of the variation is due to memory constraints. In small-memory
> models, compression is worse overall and closer to the result you would
> get from compressing the parts independently.
>
> Occam's Razor doesn't necessarily hold under constrained resources. All
> probability distributions over an infinite set of strings must favor
> shorter ones, but that isn't necessarily true over the finite set of
> programs that can run on a computer with finite memory.

Yes, and that is the most vocal of Ben's critiques of what we're now
calling (I guess) *The COIN Hypothesis* (however much of a stretch it is to
get to the memetic advantage of that acronym). To reiterate that hypothesis
with a little more nuance and refinement, including an emphasis on resource
constraints:

*The COIN (COmpression Information criterioN) Hypothesis is about the
empirical world* and is *both* that among:

1. existing models of a process, the one producing the smallest executable
archive of the training data, *within the same computation constraints*,
will *generally* also produce the smallest executable archive of the test
data, AND
2. *all* model selection criteria, it will do so *more generally* than any
other.

More to the point, whenever I discuss COIN as *the* model selection
criterion, and the obvious fact that it isn't...

- a mathematically provable aspect of "the unreasonable effectiveness of
mathematics in the natural sciences",
- empirically tested (although it seems measurable), nor
- in widespread, or even minority, use

...people react in one of four ways, in order of frequency:

1. Huh? Wha? Fuhgeddaboudit.
2. Where's the empirical evidence?
3. The Minimum Description Length Principle is just the Bayesian
Information Criterion.
4. You're just plain wrong because _insert some invalid critique_.

Indeed, the research program I set forth should be pursued if for no other
reason than to rank-order the general practicality of various model
selection criteria.

On Tue, Jul 7, 2020 at 2:31 PM Matt Mahoney <mattmahone...@gmail.com> wrote:
> Why bother with a CIC training and test set? Compression evaluates every
> bit as a test given the previous bits as training. Even if the compression
> algorithm doesn't explicitly predict bits, it is equivalent to one that
> does by the chain rule. The probability of a string is equal to the
> product of the conditional probabilities of its symbols.

The practice of dividing training and test data is standard industry
practice. Why should it not be pursued in this instance, since the point is
to convince people of the truth or falsity of COIN?

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta901988932dbca83-M9076d4fa7ac88c4402052595
Delivery options: https://agi.topicbox.com/groups/agi/subscription
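P.S. Matt's chain-rule point (a compressor is equivalent to a sequential
predictor, with total code length equal to -log2 of the probability the
model assigns the whole string) and the train/test question can both be
sketched in a few lines. This is only a toy illustration, not anything from
either post: two adaptive context models (order-0 and order-1, Laplace-
smoothed over a 256-symbol alphabet) stand in for "models under the same
computation constraints", and ideal code length stands in for
executable-archive size.

```python
import math

def code_length(text, order):
    """Ideal code length in bits: -sum of log2 p(symbol | context),
    from an adaptive context model with Laplace smoothing over a
    256-symbol alphabet. By the chain rule, this equals -log2 of the
    probability the model assigns to the whole string."""
    counts = {}   # context -> {symbol: count}
    totals = {}   # context -> total observations in that context
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]          # previous `order` symbols
        seen = counts.setdefault(ctx, {})
        total = totals.get(ctx, 0)
        p = (seen.get(ch, 0) + 1) / (total + 256)  # Laplace smoothing
        bits -= math.log2(p)                       # cost of this symbol
        seen[ch] = seen.get(ch, 0) + 1             # then adapt
        totals[ctx] = total + 1
    return bits

# Disjoint "training" and "test" strings from the same toy process.
train = "the cat sat on the mat. " * 500
test = "the mat sat on the cat. " * 200

for order in (0, 1):
    print(f"order-{order}: train {code_length(train, order):.0f} bits, "
          f"test {code_length(test, order):.0f} bits")
```

On this toy data the order-1 model yields the smaller code length on both
strings: the model that "compresses" the training data best also compresses
the test data best, which is the pattern COIN predicts. A real test of the
hypothesis would of course use actual executable archives, realistic data,
and explicit resource limits.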