We establish causation by controlled experiments. If you want to test whether X causes Y, then you vary X and observe Y while keeping everything else the same. Analyzing data sets by compression has two problems: the other conditions are not all the same, and there may be conditions that affect Y that are not in the data set.
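A minimal sketch of that difference, as a toy simulation. Everything in it is an assumption made for illustration: the hidden condition Z, the noise levels, and the names `simulate` and `pearson` are hypothetical. Z drives both X and Y, and X has no effect on Y at all. The observational data still show a strong correlation; when the experimenter sets X independently, it disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n, randomize_x, seed=0):
    """Toy world (assumed): hidden Z drives Y, and X has NO effect on Y.

    randomize_x=False: observational data, where X also follows Z.
    randomize_x=True:  controlled experiment, X is drawn independently of Z.
    Returns the correlation between X and Y.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # hidden condition, not in the data set
        if randomize_x:
            x = rng.gauss(0, 1)          # experimenter varies X on their own
        else:
            x = z + rng.gauss(0, 0.5)    # in the wild, X tracks Z
        y = z + rng.gauss(0, 0.5)        # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)

observed = simulate(10_000, randomize_x=False)
experiment = simulate(10_000, randomize_x=True)
print(f"observational correlation: {observed:.2f}")
print(f"randomized correlation:    {experiment:.2f}")
```

With these assumed noise levels the observational correlation should come out near 0.8 and the randomized one near 0, even though X never influences Y in either run.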
We do not know why the US has had an epidemic of obesity and diabetes since the 1980s. First we were told to avoid fats. Then we were told to avoid carbs. Neither worked. Could it be because fewer people smoke? In China and Eastern Europe, everyone smokes and nobody is overweight. Doesn't nicotine suppress appetite? Or maybe it's something else. What does your data set say?

We do not know why skin cancer rates have been rising since the 1980s, about the time that sunscreens were introduced. Could sunscreens cause cancer (by increasing exposure to UVA and total UV by blocking the tanning effect of UVB)? I don't think that dermatologists would deliberately lie to us. All the research is public. What does your data set say?

Ray Kurzweil was at one point taking 100 life extension supplements at a cost of $1 million per year so he could live to see the singularity at 100 and become immortal. But there are exactly zero supplements shown to extend life. How would you test them? Randomly assign babies to take either an experimental drug or a placebo every day of their lives and wait 75 years? It's now illegal to do these tests even on chimpanzees, and other primates are next.

And why are we still debating adding fluoride to drinking water after 70 years? Why are we still debating vaccine safety? I suppose there is no help for people who prefer to get their data from right-wing conspiracy videos on YouTube rather than from algorithmic information theory. But that's an AI problem too. We train AI to tell us what we want to hear, and it obliges.

So yeah, I agree it can be done, but there are a lot of practical obstacles.

-- Matt Mahoney, [email protected]

On Sun, Dec 21, 2025, 5:54 PM James Bowery <[email protected]> wrote:

> On Sun, Dec 21, 2025 at 3:59 PM Matt Mahoney <[email protected]> wrote:
>
>> On Sun, Dec 21, 2025, 3:05 PM James Bowery <[email protected]> wrote:
>>
>>> We're almost there, again, Matt. Ask not what I would do with this
>>> information, ask why we don't have this information in the first place.
>>
>> Because the information we want is causation, and compression only tells
>> you about correlation.
>
> Every high school physics student knows that even systems as simple as 3-body
> gravitational interaction cannot be described by correlation. It requires
> going beyond Shannon or Rissanen or any other noise from the statistics
> world. It requires feedback. Although some might claim that all it
> requires in a discrete and finite universe is a finite state machine, not a
> UTM, it does at least require that much.
>
> There's a lot of work going on in the area of dynamical systems
> identification from measurement data.
>
> But I hear you about "you can't know what causes what". This is *always*
> the argument trotted out when people in power stop losing their ability to
> impose their theories of causality on others and start being challenged by
> scientists.
>
> Back in the days of the 30 Years War it was all about which theocracy's
> "miracles" were permitted to vitiate causal laws. Nowadays, it may not be
> so much about "miracles" as simple truth claims about the futility of
> resistance to impersonal forces that are completely impervious to agency.
> People in power and those who identify with them like to trot that one out
> whenever there is an argument about policy interventions.
>
> Like I said, we're there again only on a global scale with powers that
> dwarf those available at the dawn of artillery. I'd really like to avoid
> having to go through that again.
>
>> We can easily compress a table of global statistics to find a negative
>> correlation between economic development and fertility. But that doesn't
>> say which causes the other.
>>
>> The problem with using AI is that people upvote answers they agree with,
>> rather than the correct answers. I'm not ready to outsource my brain yet.
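
The quoted point that compressing a table finds a correlation without saying which variable causes the other can be sketched roughly as follows. This is an illustration under stated assumptions, not a real compressor: `code_length_bits` uses the empirical entropy as a stand-in for an ideal coder, the two binary columns are toy data, and all names are hypothetical. The number of bits saved by coding two columns jointly is the same whether X was generated from Y or Y from X.

```python
import math
import random
from collections import Counter

def code_length_bits(symbols):
    """Ideal code length in bits: empirical entropy times number of symbols."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

def compression_saving(xs, ys):
    """Bits saved by coding the two columns jointly instead of separately."""
    joint = list(zip(xs, ys))
    return code_length_bits(xs) + code_length_bits(ys) - code_length_bits(joint)

def noisy_copy(values, rng, flip=0.1):
    """Copy a binary column, flipping each entry with probability `flip`."""
    return [1 - v if rng.random() < flip else v for v in values]

rng = random.Random(0)

# World A (assumed): X is set first and Y is a noisy copy of X.
xa = [rng.randint(0, 1) for _ in range(10_000)]
ya = noisy_copy(xa, rng)

# World B (assumed): Y is set first and X is a noisy copy of Y.
yb = [rng.randint(0, 1) for _ in range(10_000)]
xb = noisy_copy(yb, rng)

saving_a = compression_saving(xa, ya)
saving_b = compression_saving(xb, yb)
print(f"bits saved when X causes Y: {saving_a:.0f}")
print(f"bits saved when Y causes X: {saving_b:.0f}")
```

The two savings should agree up to sampling noise, and swapping the arguments of `compression_saving` leaves the result unchanged, so nothing in the table's compressed size points to a causal direction.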

> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/T6cf3be509c7cd2f2-M21862e03ac1b394666ee1761>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T6cf3be509c7cd2f2-M73835b4eb85bc92c0aa4603f
Delivery options: https://agi.topicbox.com/groups/agi/subscription
