On Tue, Jul 1, 2025, 7:41 AM John Rose <[email protected]> wrote:

> On Wednesday, June 18, 2025, at 11:36 AM, Matt Mahoney wrote:
>
> In my 2013 paper on the cost of AI, I estimated that human DNA is
> equivalent to 300M lines of code using the best compressors at 1.7 bits per
> base pair and 16 bits per line of code.
>
>
> Is that estimate based on molecular structure?
>

I compressed the human genome hg19 using the best known techniques. It has
some gaps in the highly repetitive parts, which don't affect the
compressed size. But the number would be significantly lower if we removed
the 92% of DNA that is nonfunctional, which tends to be less compressible
because it accumulates random mutations. The functional part has lots of
duplicate genes, so it might be only 1% of that estimate, or 3M lines of
code.
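
As a rough check of the arithmetic, here is a minimal Python sketch. The
genome length of 3e9 base pairs is my round-number assumption, not a figure
taken from the paper; the 1.7 bits per base pair, 16 bits per line, and 1%
fraction are the numbers quoted above.

```python
# Back-of-envelope check of the lines-of-code estimate.
# ASSUMPTION: 3e9 base pairs is a round figure for the human genome,
# not a value taken from the paper.
GENOME_BASE_PAIRS = 3.0e9
BITS_PER_BASE_PAIR = 1.7   # compressed size per base pair, as quoted above
BITS_PER_LINE = 16         # compressed size of one line of code, as quoted above


def equivalent_lines(base_pairs, bits_per_bp=BITS_PER_BASE_PAIR,
                     bits_per_line=BITS_PER_LINE):
    """Convert a DNA length into an equivalent number of lines of code."""
    return base_pairs * bits_per_bp / bits_per_line


total_lines = equivalent_lines(GENOME_BASE_PAIRS)
# The "only 1%" guess for the functional, de-duplicated part.
functional_lines = 0.01 * total_lines

print(f"whole genome:    {total_lines / 1e6:.0f}M lines")
print(f"functional (1%): {functional_lines / 1e6:.1f}M lines")
```

With these inputs the whole genome comes out a little over 300M lines and
the 1% case a little over 3M, which matches the rounded figures above.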

OTOH humans have about the same number of genes as the nematode C. elegans
but roughly 30 times more DNA. Removing nonfunctional DNA would probably
constrain the evolutionary search space because most mutations are
harmful. Most mammals have 2-4 billion base pairs like humans, so the
nonfunctional parts must have some purpose because they consume resources
and otherwise would have evolved out.

Another way to upper bound the Kolmogorov complexity of the genome is by
the information rate of evolution, which is one bit per population-doubling
generation. This gives roughly 10^9 bits for mammals. Humans share 98-99%
of their DNA with chimpanzees, from which they split 6M years ago. This
means that fewer than 10^6 generations resulted in 10^8 bits of
differences, so 99% must be noise that doesn't affect fitness.
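
The 99% figure can be reproduced with a short Python sketch. The genome
length of 3e9 base pairs and the 2 bits per uncompressed base pair are my
assumptions for illustration; the 1-2% divergence, the 10^6 generation
bound, and the one bit per generation rate come from the paragraph above.

```python
# Rough reproduction of the human/chimpanzee noise estimate.
# ASSUMPTIONS: 3e9 base pairs, and 2 bits per base pair (4 possible bases).
GENOME_BASE_PAIRS = 3.0e9
RAW_BITS_PER_BASE_PAIR = 2.0

MAX_GENERATIONS = 1e6          # upper bound on generations since the split
BITS_PER_GENERATION = 1.0      # information rate of evolution, as stated above


def difference_bits(divergence):
    """Bits of raw sequence difference for a given fractional divergence."""
    return GENOME_BASE_PAIRS * RAW_BITS_PER_BASE_PAIR * divergence


def noise_fraction(diff_bits):
    """Fraction of the differences that selection cannot account for."""
    selected_bits = MAX_GENERATIONS * BITS_PER_GENERATION
    return 1.0 - selected_bits / diff_bits


low = difference_bits(0.01)    # 99% shared DNA
high = difference_bits(0.02)   # 98% shared DNA
print(f"difference: {low:.1e} to {high:.1e} bits")

# Using the rounded 10^8 bits from the text:
print(f"noise fraction: {noise_fraction(1e8):.0%}")
```

Under these assumptions the differences land near 10^8 bits, while selection
can account for at most 10^6 of them, which leaves about 99% as noise.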

In my paper I just compressed the raw data. The details are in the appendix.
https://mattmahoney.net/costofai.pdf


-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tcf5991bdfcdced3b-Mc163e388621e3a2eb24c22b8