Re: [Metamath] Codex wrote an MM verifier in Go pretty easily [AI Progress Report]

Zarathustra Goertzel Thu, 18 Sep 2025 14:31:52 -0700


Whoops, this is embarrasing that I didn’t glance very closely at the 
code.., yes, the $v processing block hardly checks anything.


Implementing a MM verifier as a human, the approach I took was to carefully 
follow a reference implementation to try to ensure not missing some detail 
— clearly the LLM doesn’t do that. I am sorry 🙏.

I think it’d be great to unify the test suites. I liked how David Wheeler’s 
metamath test suite made it easy to test multiple verifiers.

I ran into a few ‘gaps’ in mmverify.py's adherence to the specs (and added 
a few tests to trigger them, some LLM-generated). 

Incidentally, I gave Matthew House’s list to GPT-5, and it found 4 issues 
that mmverify.py trips up on (and 1 false negative). I’ve uploaded the four 
test.mm files: 
https://github.com/zariuq/mmverify.py/tree/pure/adversarial_tests.

   1. It allows $d with a constant symbol. 
   2. $d with duplicates (e.g., $d x x $). 
   3. Redeclaring a variable as a constant. 
   4. And doesn’t work with incomplete proofs via ?. 

While the tests seem functional in pinpointing differences in performance 
between mmverify.py and metamath-exe, I think they're pretty crude (and 
could be more aesthetically formulated by someone who knows MM better than 
I do).

This does get me thinking about how to verify the verifiers (aside from 
actually formally verifying them).  While the tests are helpful, to an 
extent, I guess one falls back on "humans reading the code to judge whether 
it's really doing what we want"....

— Zar Goertzel

On Wednesday, September 17, 2025 at 11:47:53 PM UTC+2 [email protected] 
wrote:

> It isn't either/or. Testsuites specific to one tool can have more 
> assertions on error messages and can also test features (e.g. the proof 
> minimizer or web page generation) other than proof verification.
>
> I do appreciate David's encouragement of contributions to metamath-test 
> though, and am thinking of taking a closer look when I get some time.
>
>
> On September 17, 2025 2:26:54 PM PDT, Matthew House <[email protected]> 
> wrote:
>
>> Coincidentally, by running it on one of my test files, I just noticed 
>> that mmverify.py suffers from the exact same issue 
>> <https://github.com/david-a-wheeler/mmverify.py/issues/30> of allowing 
>> proofs to reference inactive hypotheses belonging to other theorems. 
>> Centralizing all the test suites probably would be a good idea.
>>
>> On Wed, Sep 17, 2025 at 10:32 AM David A. Wheeler <[email protected]> 
>> wrote:
>>
>>>
>>>
>>> > On Sep 16, 2025, at 6:42 PM, Jim Kingdon <[email protected]> wrote:
>>> > 
>>> > Those all sound like good things to add to the testsuite.
>>>
>>> Patches welcome :-) to:
>>> https://github.com/david-a-wheeler/metamath-test
>>>
>>>
>>> > (The tests in the metamath-exe and metamath-knife repos are I think 
>>> better developed, although in principle some of their tests could be fed 
>>> back to a testsuite which could be run on any verifier).
>>>
>>> That was my goal for metamath-test. I also think it's useful to have a 
>>> "general" test suite.
>>>
>>> I'd be happy to contribute that repo to the  https://github.com/metamath 
>>> organization. They're all MIT licensed, allowing anyone do practically 
>>> anything except sue the authors. Anyone want me to do that?
>>>
>>> > On September 16, 2025 12:44:47 PM PDT, Matthew House <
>>> [email protected]> wrote:
>>> > A few issues I spotted, many of which are nitpicks and some of which 
>>> are more serious:...
>>> ...
>>> >  Writing a correct Metamath verifier requires very careful attention 
>>> to detail, which it seems that LLMs still struggle with, short of directed 
>>> hand-holding or extraordinarily thorough test suites. The original 
>>> mmverify.py code is quite compact for how many details it captures.
>>>
>>>
>>> LLMs can be a *great* productivity aid, but only an *aid*. The emerging 
>>> term is "AI Code Assistant" which I think gives the right flavor. They are 
>>> helpful *assistants* to humans, but at least so far, they are not 
>>> *replacements* for humans.
>>>
>>> --- David A. Wheeler
>>>
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/metamath/673178d9-6b7f-4975-9256-8d90fee3839en%40googlegroups.com.

Re: [Metamath] Codex wrote an MM verifier in Go pretty easily [AI Progress Report]

Reply via email to