Magic is always the explanation of those who can't understand.

Brent


On 12/12/2024 1:39 PM, 'Cosmin Visan' via Everything List wrote:
Magic!

On Thursday, 12 December 2024 at 20:00:58 UTC+2 John Clark wrote:

    The number of "tokens" (words or parts of words) used to train
    LLMs is 100 times larger than it was in 2020; the largest models now
    use tens of trillions. If you only consider text, the entire
    Internet contains only about 3,100 trillion tokens. The amount of
    text LLMs train on is doubling every year, but the amount of
    human-generated text on the Internet is growing at only about 10% a
    year; if that trend continues, AIs will run out of text somewhere
    around 2028. Does that mean AI progress is about to hit a wall? I
    don't think so, for the following reasons (a quick sanity check on
    the 2028 date is sketched below):

    For one thing, because of improvements in algorithms, the computing
    power needed for a Large Language Model to achieve the same
    performance has halved about every 8 months.

    Algorithmic progress in language models
    <https://arxiv.org/pdf/2403.05812>
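
    To put that 8-month halving time in perspective (assuming the trend
    is a clean exponential):

        # Efficiency gain from algorithmic progress alone, given a
        # halving time of 8 months for compute at equal performance.
        for years in (1, 2, 4):
            gain = 2 ** (12 * years / 8)
            print(f"{years} year(s): ~{gain:.0f}x less compute needed")
        # -> ~3x per year, 8x over 2 years, 64x over 4 years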


    And computer chips specialized for AI rather than general computing,
    like those made by Nvidia and other companies, are getting faster
    even more rapidly than Moore's Law. Also, specialized data sets,
    such as astronomical and biological data, are growing much more
    quickly than text is; that's how AIs got so good at predicting how
    proteins fold up.

    And there is vastly more information available if AIs are trained on
    other types of data besides text, and some AIs are already being
    trained on unlabeled images and videos. Yann LeCun, chief AI
    scientist at Meta, said that "although the 10^13 tokens used to
    train an LLM sounds like a lot (it would take a human 170,000 years
    to read that much), a 4-year-old child has absorbed a volume of data
    50 times greater than that just by looking at objects during his
    waking hours. We're never going to get to human-level AI by just
    training on language, that's just not happening".

    And then there's synthetic data. AlphaGeometry was trained to solve
    geometry problems using 100 million computer-generated synthetic
    examples with no human demonstrations, and it ended up solving
    difficult olympiad geometry problems about as well as an average
    International Mathematical Olympiad gold medallist.

    Solving olympiad geometry without human demonstrations
    <https://www.nature.com/articles/s41586-023-06747-5>

    AI researchers are also starting to change their strategy by having
    their AIs reread the training set many times; because training is
    statistical, repeated passes over the same data still improve
    performance.


    Scaling Data-Constrained Language Models
    <https://arxiv.org/pdf/2305.16264>
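
    The intuition can be shown with a toy diminishing-returns model; the
    saturation constant below is illustrative, not the fitted value from
    the paper:

        import math

        def effective_tokens(unique, epochs, r_star=5.0):
            # Each extra pass over the same data adds less "effective"
            # fresh data than the pass before it.
            return unique * r_star * (1 - math.exp(-epochs / r_star))

        for e in (1, 2, 4, 8, 16):
            ratio = effective_tokens(1.0, e) / effective_tokens(1.0, 1)
            print(f"{e:>2} epochs ~ {ratio:.2f}x the value of one pass")
        # Rereading helps a lot at first (4 epochs ~ 3x one pass) but
        # saturates quickly after that.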


    Andy Zou at Carnegie Mellon University says "once an AI has got a
    foundational knowledge base that's probably greater than any single
    person could have, it no longer needs more data to get smarter. It
    just needs to sit and think. I think we're probably pretty close to
    that point."

    John K Clark    See what's on my new list at Extropolis
    <https://groups.google.com/g/extropolis>