Send Link mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.anu.edu.au/mailman/listinfo/link
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Link digest..."
Today's Topics:
1. OpenAI's latest model solved 5 out of 6 problems on the
International Math Olympiad exam (Stephen Loosley)
----------------------------------------------------------------------
Message: 1
Date: Mon, 21 Jul 2025 20:22:07 +0930
From: Stephen Loosley <[email protected]>
To: "link" <[email protected]>
Subject: [LINK] OpenAI's latest model solved 5 out of 6 problems on
the International Math Olympiad exam
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"
OpenAI just won gold at the world's most prestigious math competition. Here's
why that's a big deal.
By Lakshmi Varanasi Jul 20, 2025, 8:25 AM GMT+10
https://www.businessinsider.com/openai-gold-iom-math-competition-2025-7
[Photo Caption: An experimental LLM from OpenAI won gold at the International
Math Olympiad this week. sasirin pamai/Getty Images]
OpenAI's latest model solved five out of six problems on the International Math
Olympiad exam.
OpenAI CEO Sam Altman called it "a significant marker of how far AI has come
over the past decade."
AI skeptic Gary Marcus said he was "impressed" but that the model's utility is
yet to be seen.
OpenAI's latest experimental model is a math whiz, performing so well on an
insanely difficult math exam that everyone's now talking about it.
"I'm excited to share that our latest @OpenAI experimental reasoning LLM has
achieved a longstanding grand challenge in AI: gold medal-level performance on
the world's most prestigious math competition - the International Math Olympiad
(IMO)," Alexander Wei, a member of OpenAI's technical staff, said on X.
The International Math Olympiad is a global competition that began in 1959 in
Romania and is now considered one of the hardest in the world. It runs over
two days, and on each day participants sit a four-and-a-half-hour exam with
three questions. Some famous winners include Grigori Perelman, who
helped advance geometry, and Terence Tao, recipient of the Fields Medal, the
highest honor in mathematics.
In June, Tao predicted on Lex Fridman's podcast that AI would not score high on
the IMO. He suggested researchers shoot a bit lower. "There are smaller
competitions. There are competitions where the answer is a number rather than a
long-form proof," he said.
Yet OpenAI's latest model solved five out of six of the problems correctly,
working under the same testing conditions as humans, Wei said.
Wei's colleague, Noam Brown, said the model displayed a new level of endurance
during the exam.
"IMO problems demand a new level of sustained creative thinking compared to
past benchmarks," he said. "This model thinks for a long time."
Wei said the model is an upgrade in general intelligence. The model's
performance is "breaking new ground in general-purpose reinforcement learning,"
he said. DeepMind's AlphaGeometry, by contrast, is designed specifically for
math.
"This is an LLM doing math and not a specific formal math system; it is part of
our main push towards general intelligence," Altman said on X.
"When we first started openai, this was a dream but not one that felt very
realistic to us; it is a significant marker of how far AI has come over the
past decade," Altman wrote, referring to the model's performance at the IMO.
Altman added that a model with a "gold level of capability" will not be
available to the public for "many months."
The achievement is an example of how fast the technology is developing. Just
last year, "AI labs were using grade school math" to evaluate models, Brown
said. And tech billionaire Peter Thiel said last year it would take at least
another three years before AI could solve US Math Olympiad problems.
Still, there are always skeptics.
Gary Marcus, a well-known critic of AI hype, called the model's performance
"genuinely impressive" on X. But he also posed several questions about how the
model was trained, the scope of its "general intelligence," the utility for the
general population, and the cost per problem. Marcus also said that the IMO has
not independently verified these results.
------------------------------
End of Link Digest, Vol 392, Issue 9
************************************