Whoops, this is embarrasing that I didn’t glance very closely at the code.., yes, the $v processing block hardly checks anything.
Implementing a MM verifier as a human, the approach I took was to carefully follow a reference implementation to try to ensure not missing some detail — clearly the LLM doesn’t do that. I am sorry 🙏. I think it’d be great to unify the test suites. I liked how David Wheeler’s metamath test suite made it easy to test multiple verifiers. I ran into a few ‘gaps’ in mmverify.py's adherence to the specs (and added a few tests to trigger them, some LLM-generated). Incidentally, I gave Matthew House’s list to GPT-5, and it found 4 issues that mmverify.py trips up on (and 1 false negative). I’ve uploaded the four test.mm files: https://github.com/zariuq/mmverify.py/tree/pure/adversarial_tests. 1. It allows $d with a constant symbol. 2. $d with duplicates (e.g., $d x x $). 3. Redeclaring a variable as a constant. 4. And doesn’t work with incomplete proofs via ?. While the tests seem functional in pinpointing differences in performance between mmverify.py and metamath-exe, I think they're pretty crude (and could be more aesthetically formulated by someone who knows MM better than I do). This does get me thinking about how to verify the verifiers (aside from actually formally verifying them). While the tests are helpful, to an extent, I guess one falls back on "humans reading the code to judge whether it's really doing what we want".... — Zar Goertzel On Wednesday, September 17, 2025 at 11:47:53 PM UTC+2 [email protected] wrote: > It isn't either/or. Testsuites specific to one tool can have more > assertions on error messages and can also test features (e.g. the proof > minimizer or web page generation) other than proof verification. > > I do appreciate David's encouragement of contributions to metamath-test > though, and am thinking of taking a closer look when I get some time. > > > On September 17, 2025 2:26:54 PM PDT, Matthew House <[email protected]> > wrote: > >> Coincidentally, by running it on one of my test files, I just noticed >> that mmverify.py suffers from the exact same issue >> <https://github.com/david-a-wheeler/mmverify.py/issues/30> of allowing >> proofs to reference inactive hypotheses belonging to other theorems. >> Centralizing all the test suites probably would be a good idea. >> >> On Wed, Sep 17, 2025 at 10:32 AM David A. Wheeler <[email protected]> >> wrote: >> >>> >>> >>> > On Sep 16, 2025, at 6:42 PM, Jim Kingdon <[email protected]> wrote: >>> > >>> > Those all sound like good things to add to the testsuite. >>> >>> Patches welcome :-) to: >>> https://github.com/david-a-wheeler/metamath-test >>> >>> >>> > (The tests in the metamath-exe and metamath-knife repos are I think >>> better developed, although in principle some of their tests could be fed >>> back to a testsuite which could be run on any verifier). >>> >>> That was my goal for metamath-test. I also think it's useful to have a >>> "general" test suite. >>> >>> I'd be happy to contribute that repo to the https://github.com/metamath >>> organization. They're all MIT licensed, allowing anyone do practically >>> anything except sue the authors. Anyone want me to do that? >>> >>> > On September 16, 2025 12:44:47 PM PDT, Matthew House < >>> [email protected]> wrote: >>> > A few issues I spotted, many of which are nitpicks and some of which >>> are more serious:... >>> ... >>> > Writing a correct Metamath verifier requires very careful attention >>> to detail, which it seems that LLMs still struggle with, short of directed >>> hand-holding or extraordinarily thorough test suites. The original >>> mmverify.py code is quite compact for how many details it captures. >>> >>> >>> LLMs can be a *great* productivity aid, but only an *aid*. The emerging >>> term is "AI Code Assistant" which I think gives the right flavor. They are >>> helpful *assistants* to humans, but at least so far, they are not >>> *replacements* for humans. >>> >>> --- David A. Wheeler >>> >>> >>> -- You received this message because you are subscribed to the Google Groups "Metamath" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/metamath/673178d9-6b7f-4975-9256-8d90fee3839en%40googlegroups.com.
