On Tue, Jul 1, 2025, 7:41 AM John Rose <[email protected]> wrote:
> On Wednesday, June 18, 2025, at 11:36 AM, Matt Mahoney wrote:
> > In my 2013 paper on the cost of AI, I estimated that human DNA is
> > equivalent to 300M lines of code using the best compressors at 1.7 bits per
> > base pair and 16 bits per line of code.
>
> Is that estimate based on molecular structure?

I compressed the human genome hg19 using the best known techniques. It has some gaps in the highly repetitive parts, which don't affect the compressed size. But the number would be significantly lower if we removed the 92% of DNA that is nonfunctional, which tends to be less compressible because it accumulates random mutations. The functional part has lots of duplicate genes, so it might be only 1%, or 3M lines of code.

OTOH humans have about the same number of genes as the nematode C. elegans but 100 times more DNA. Removing nonfunctional DNA probably constrains the evolutionary search space, because most mutations are harmful. Most mammals have 2-4 billion base pairs like humans, so the nonfunctional parts must have some purpose, because they consume resources and otherwise would have evolved out.

Another way to upper bound the Kolmogorov complexity of the genome is by the information rate of evolution, which is one bit per population doubling generation. This gives roughly 10^9 bits for mammals. Humans share 98-99% of their DNA with chimpanzees, which split 6M years ago. This means that fewer than 10^6 generations resulted in 10^8 bits of differences. At one bit per generation, selection accounts for at most 10^6 of those bits, so 99% must be noise that doesn't affect fitness.

In my paper I just compressed the raw data. The details are in the appendix.
https://mattmahoney.net/costofai.pdf

-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tcf5991bdfcdced3b-Mc163e388621e3a2eb24c22b8
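
For anyone who wants to check the arithmetic in the message above, here is a minimal sketch. The 1.7 bits per base pair, 16 bits per line, 1% functional fraction, 10^6 generations, and 10^8 bits of difference are taken from the message. The genome length of about 3.1 billion base pairs is an assumption added here for illustration, and the function and variable names are hypothetical. It only reproduces the back-of-envelope numbers, not the compression experiment in the paper.

```python
# Back-of-envelope check of the figures quoted in the message.
# ASSUMPTION (not stated in the message): hg19 is roughly 3.1e9 base pairs.
BASE_PAIRS = 3.1e9
BITS_PER_BASE_PAIR = 1.7   # compressed size per base pair, from the message
BITS_PER_LINE = 16         # information per line of code, from the message


def equivalent_lines_of_code(base_pairs: float,
                             bits_per_bp: float = BITS_PER_BASE_PAIR,
                             bits_per_line: float = BITS_PER_LINE) -> float:
    """Convert a compressed genome size into equivalent lines of code."""
    return base_pairs * bits_per_bp / bits_per_line


def noise_fraction(difference_bits: float,
                   generations: float,
                   bits_per_generation: float = 1.0) -> float:
    """Fraction of differing bits that selection cannot account for,
    given an upper bound on bits of information gained per generation."""
    selected_bits = generations * bits_per_generation
    return 1.0 - selected_bits / difference_bits


total_lines = equivalent_lines_of_code(BASE_PAIRS)
functional_lines = 0.01 * total_lines      # the "might be only 1%" case

# Human/chimpanzee split: at most 1e6 generations, about 1e8 bits of difference.
noise = noise_fraction(difference_bits=1e8, generations=1e6)

print(f"whole genome:   {total_lines:.3g} lines")
print(f"1% functional:  {functional_lines:.3g} lines")
print(f"noise fraction: {noise:.2f}")
```

With these inputs the whole-genome figure lands near the 300M lines quoted, the 1% case near 3M, and the noise fraction at 0.99.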
