The reason that we can use text compression to test language models but we
can't use video compression to test vision models is signal to noise ratio.
In both cases, human short term memory write speed is about 10 bits per
second. The Hutter corpus is 70% human generated text and 30% computer
generated XML, HTML, and Wiki markup and boilerplate articles about US
places. Both are mostly signal, with maybe 2% noise mostly from
inconsistent use of spaces in tables or after a period. That is +17 dB SNR.
Uncompressed 4K by 2K video at 60 FPS and 24 bits per pixel is over 10^10
bits per second, which against 10 bits per second of signal is -90 dB SNR.
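
As a rough check of the arithmetic above, here is a minimal Python sketch.
It assumes SNR in dB is 10*log10(signal/noise), that "4K by 2K" means
4096 x 2048 pixels, and that everything in the video stream beyond 10 bits
per second counts as noise. The function and variable names are only
illustrative.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(signal / noise)."""
    return 10 * math.log10(signal / noise)

# Text: assume 98% of the corpus is signal and 2% is noise.
text_snr = snr_db(0.98, 0.02)

# Video: assumed 4096 x 2048 pixels, 60 frames/s, 24 bits/pixel.
video_bps = 4096 * 2048 * 60 * 24

# Assumption: 10 bits/s is signal, the rest of the stream is noise.
signal_bps = 10
video_snr = snr_db(signal_bps, video_bps - signal_bps)

print(f"text SNR:   {text_snr:+.1f} dB")       # about +16.9 dB
print(f"video rate: {video_bps:.2e} bits/s")   # about 1.21e+10 bits/s
print(f"video SNR:  {video_snr:+.1f} dB")      # about -90.8 dB
```

Under these assumptions the figures come out near +17 dB for text and near
-91 dB for video, consistent with the rounded numbers above.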

-- Matt Mahoney, [email protected]

On Wed, Feb 25, 2026, 3:25 PM James Bowery <[email protected]> wrote:

>
>
> On Wed, Feb 25, 2026 at 1:15 PM Matt Mahoney <[email protected]>
> wrote:
> ... various assertions that in effect assume the conclusion regarding "the
> way things are" ...
>
>> It would be nice if we could use AIT to resolve political questions if
>> the signal to compress wasn't overwhelmed by noise.
>>
>
> Remember when the author of Kayak came out with that same argument
> against The Hutter Prize back in 2006?
>
> It was wrong then and it is wrong now even with datasets with vastly more
> "noise" -- as though anyone can argue with me when I say "one man's noise
> is another man's ciphertext" as I've been saying for the last 2 decades.
> It's really ironic that I had to, over the course of several years, drag
> Charles Sinclare Smith kicking and screaming to the understanding that what
> he calls "data cleaning" (which is where 90%+ of the work went when dealing
> with real world data while Charlie was given control of the DoE's
> Information Administration by Carter) is covered by what I call "forensic
> epistemology" which lossless compression inevitably provides.  Think of the
> noisy data at Hume's Guillotine for example.  One of the first stages of
> any algorithm is to run through conversions to forms that retain the
> salient characteristics, and retain the "irrelevant noise" for
> reconstruction.  The published algorithm is the only principled way to even
> begin talking about "data quality".  It's even more ironic that he financed
> the second neural net summer from the SDF because he realized that
> modeling the energy economy required not applying Moore's Law to
> macrosocial modeling, but that dynamics rather than mere statistics require
> funding guys like Werbos -- not just guys like Hinton.  Yet it wasn't until
> I showed him the original 2017 Nature paper on SINDy that he finally "got
> it".
>
> This has been a difference in our motivations since day one of the Hutter
> Prize, Matt.  I was motivated to discover the identities latent in the data
> that were generating not just "noise" but disinformation.  And, yes, there
> is a difference.  It's even more difficult to discover the arithmetic sign
> of a data source than it is to separate noise (0) from (+) signal.  Yet,
> this is precisely what those who think of themselves as our betters would
> have us believe they are equipped to do for us and on our behalf, despite
> our objections.
>
> These people are hiding behind your objections, Matt.
>
>
>> What data would you compress to solve the immigration issue?
>>
>
> How about "resolve" rather than "solve"?  This is the whole point of
> separating "IS" from "OUGHT" and recognizing that we can't even begin to
> identify where political "extremism", such as open borders, might be
> "unnecessary." (It is by definition extremist for a polity to hold in
> contempt more than a supermajority on any salient issue for multiple
> generations -- let alone one that defines the polis aka "We The People"!)
>
>
>> What would you compress to answer the question of at what point after
>> conception does life begin?
>>
>
> Again, conflating IS and OUGHT.  And that's not even addressing the
> importance of operational definition of terms such as "life".
>
>> We have political division because it is profitable to media companies.
>
> That's your theory of macrosocial dynamics showing.
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/Tb9c1aaff01c2b823-M6097c7d9b0af8f74d2c6a058>
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb9c1aaff01c2b823-Mab1c370f7615a56221de9c8e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
