Also, *the verifier does not check that a proof only uses its own essential
hypotheses.* This allows incorrect proofs such as $[ set.mm $] evil $p |-
ph $= ( idi.1 ) B $. which turn hypotheses into conclusions.

On Tue, Sep 16, 2025 at 3:44 PM Matthew House <[email protected]>
wrote:

> A few issues I spotted, many of which are nitpicks and some of which are
> more serious:
>
>    - The verifier does not check that each file only uses printable
>    ASCII + "\t\n\f\r". This can be problematic if not checked, since
>    differences in how characters outside this set are handled (e.g., what
>    counts as white space) can lead to discrepancies between verifiers.
>    - The verifier does not check that keyword tokens are separated by
>    white space, e.g., it accepts $($) as a valid comment, or foo$a|- bar$.
>    as a valid $a statement.
>    - The verifier does not check that comments do not contain $( or $).
>    - The verifier does not check that ${ $} delimiters for blocks are
>    balanced within each file.
>    - The verifier does not check that math symbols do not contain $.
>    - The verifier accepts a meaningless dangling $ character at the end
>    of a file.
>    - The verifier does not check that active constants and variables
>    cannot be redeclared, nor that variables cannot be declared as constants or
>    vice versa. It has no support for variables becoming inactive at the end of
>    their containing block.
>    - The verifier does not check that math symbols in $d statements are
>    distinct active variables.
>    - The verifier does not check that all labels are unique, nor that all
>    labels are distinct from all declared math symbols.
>    - The verifier does not check that typecodes are active constants.
>    - The verifier does not check that $f statements use an active
>    variable, nor that all variables in an $e, $a, or $p statement have an
>    active $f statement.
>    - The verifier does not check that a variable has at most one active $f
>    statement at a time, nor that every $f statement for a variable has
>    the same typecode.
>    - The verifier does not ignore repeated inclusions of the same file.
>    For instance, a file test.mm containing $[ test.mm $] will lead to a
>    runaway memory leak.
>    - The verifier does not ignore comments in certain contexts in
>    statements. For instance, it will interpret foo $a $( baz $) |- bar $.
>    as a statement with typecode $( and expression $( baz $) |- bar, which
>    is nonsense and will lead to discrepancies between verifiers.
>    - The verifier does not check for integer overflow in compressed-label
>    steps. This allows multiple ways to write step numbers, as well as ways to
>    write the special code Z as a number, which will lead to discrepancies
>    between verifiers.
>    - The verifier has no support for unknown steps ?. After reading a
>    proof with a ? step, a typical verifier will continue verifying the
>    rest of the database as usual, but warn the user that it
>    contains incomplete proofs.
>    - *The verifier does not check that a proof cannot use its own label
>    in a step.* This allows incorrect proofs such as $[ set.mm $] evil $p
>    |- ph $= ( evil ) AB $. which insist upon themselves.
>    - *The verifier does not check that typecodes match when substituting
>    variables.* This allows incorrect proofs such as $[ set.mm $] evil $p
>    |- ph $= ( vx wn weq wal equid ax-gen ax6 pm2.24ii ) BCZJDZBEAKBJFGBJHI $.
>    which exploit syntax ambiguities.
>
> Writing a correct Metamath verifier requires very careful attention to
> detail, which it seems that LLMs still struggle with, short of directed
> hand-holding or extraordinarily thorough test suites. The original
> mmverify.py code is quite compact for how many details it captures.
>
> Matthew House
>
> On Tue, Sep 16, 2025 at 5:03 AM Zarathustra Goertzel <[email protected]>
> wrote:
>
>> I was curious if Codex could write a Metamath verifier, so asked ChatGPT
>> for a list of verifiers and what languages it thought would be both *easy
>> and interesting*: it chose Go.
>>
>> I started from my mmverify.py repo (where iIhave a too-slow verifier in
>> another weird language, MeTTa <https://metta-lang.dev/>), so it had
>> access to reference implementations.
>>
>> The task was completed pretty easily (perhaps in 40-60min of AI-work)
>> with my feeding in some errors.  It seems to pass David Wheeler's 
>> Metamath-test
>> suite <https://github.com/zariuq/metamath-test>.
>>
>> It's here if anyone's curious:
>> https://github.com/zariuq/mmverify.py/tree/mmverifyGo/go
>>
>> FWIW, I decided to try to see if it could do Metamath Zero, and I must
>> say it struggles (even to just do .mm0 + .mmu verification).
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Metamath" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/metamath/e55ea1ba-b3c2-4018-80a1-a46149ead54fn%40googlegroups.com
>> <https://groups.google.com/d/msgid/metamath/e55ea1ba-b3c2-4018-80a1-a46149ead54fn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/metamath/CADBQv9bxtqLqC2j%3D_7N3%3D95TZ2y5%2BkD_mLgEUhjPXZOTkFNc0Q%40mail.gmail.com.

Reply via email to