On Saturday, June 22, 2024 at 1:51:56 AM UTC+2 John Clark wrote:
Yesterday some people were saying that improvement in large language models had hit a wall, but they can't say that today, because Claude 3.5 Sonnet came out today and it beats GPT-4o on most benchmarks. But the really amazing thing is that it's MUCH smaller than GPT-4o, and thus much faster and much cheaper to operate. The company that makes it, Anthropic, says it will release its far larger version, Claude 3.5 Opus, sometime later this year. I think it's going to be amazing.

Anthropic's SHOCKING New Model BREAKS the Software Industry! Claude 3.5 Sonnet Insane Coding Ability <https://www.youtube.com/watch?v=_mkyL0Ww_08>

With all the headlines proclaiming that AI achieved this or surpassed that milestone, does the absence of the basic distinction between narrow and general AI not strike us as conspicuous? LLMs are narrow AI, designed to perform specific tasks within the limits of their training data. They excel at tasks like text prediction, pattern recognition, and generating creative content, but lack the broad understanding and cognitive abilities needed to handle diverse and unpredictable situations, which is the defining property of general AI. Marketing often blurs this distinction, leading the public and investors to overestimate the "intelligence" of narrow AI.

For years, narrow AI has been making significant strides in natural language processing, image recognition, guessing your shopping preferences on Amazon, navigation, game playing, and so on. Companies capitalize on these accomplishments, and on the ambiguous use of terms like "intelligence" and "learning" in public discourse, to suggest that today's AI systems are somehow far more advanced than their progress over the years would indicate. This confusion allows companies to imply that their AI technologies are as competent as a student passing a test, more effective than doctors at finding patterns in datasets for cancer research or drug development, and capable of generating mathematical proofs.
They effectively cash in on this ambiguity, leading people and investors to think their algorithms are getting generally smarter, or that scientists have built AIs so powerful that general intelligence is within reach. While the advances in narrow AI are impressive, and have been for years, they come with limitations. For example, Elon Musk promised us perfect autonomous driving years ago, yet reality continually presents situations outside the training set, leading to safety concerns and... accidents. Real ones. This illustrates the difficulty of achieving the "general" part of AI, which must handle unexpected cases.

The coupling of ever more effective narrow AI, marketing opportunities and profits, the mystique of general AI that said marketing relies on, and the public belief in superintelligence and technological progress all work together to hype the public into believing that their phones, apps, and browsers are smarter than they are. Benchmarks are mainly memory-reliant, unlike the ARC test or similar types of problems, and the public is bombarded with questionable "breakthroughs" in the headlines stating that some model achieved a high score on this or that test. These headlines conveniently omit that expanding the training data toward some narrow, domain-specific task will yield such abilities trivially.

No number of benchmarks aced from memory can prepare narrow AI for the real world and the unexpected cases absent from its training data. This is why it matters how benchmark results are achieved. Anticipating a domain-specific problem and handing the AI a cheat sheet through training-data adaptation and fine-tuning on the fly is not the same as the technology meeting an unexpected situation in reality, without software engineers tweaking the thing live. The distinction, and our understanding of it, affects the technologies we develop that rely on these AIs.
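The memorization point can be made concrete with a deliberately crude sketch. This is purely illustrative — no real benchmark, model, or training procedure is being depicted — but it shows why a perfect score on in-distribution questions says nothing about out-of-distribution ones:

```python
# Toy illustration: a "model" that merely memorizes question-answer
# pairs aces any test drawn from its training data, yet fails the
# moment a question falls outside it. (Hypothetical data throughout.)

training_data = {
    "2+2": "4",
    "capital of France": "Paris",
    "3*3": "9",
}

def memorizing_model(question):
    """Pure lookup -- recall, not understanding."""
    return training_data.get(question, "no idea")

def score(questions):
    """Fraction of questions the model can answer at all."""
    answered = sum(memorizing_model(q) != "no idea" for q in questions)
    return answered / len(questions)

# "Benchmark" drawn from the training distribution: perfect score.
score_seen = score(["2+2", "capital of France", "3*3"])

# Novel questions a generalizing system would handle trivially: total failure.
score_unseen = score(["2+3", "capital of Japan"])

print(score_seen, score_unseen)  # 1.0 vs 0.0
```

A headline reporting only `score_seen` would look like a breakthrough; the gap between the two numbers is the whole point of ARC-style tests.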
Our understanding of the terms we use, their biases, and honest appraisals of the genuine possibilities and limitations are crucial. For now, commercial interests profit from this ambiguity, but clarity and transparency will benefit technological progress and public trust in the long run.

Do note, I am not claiming there is nothing to LLMs and their recent boost in applications, capacity, conveniences offered, and similar developments. But it's narrow AI by definition, and occasional advances and spurts of growth should be expected. It's fascinating, but it's also... what do we call it... marketing and money.

--
You received this message because you are subscribed to the Google Groups "Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to everything-list+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/everything-list/76096d30-3443-4f3f-906e-e0f517fb88a1n%40googlegroups.com.