On Sun, Feb 23, 2020 at 2:14 AM Olivier Binda <[email protected]> wrote:

> Hi Mario
> Many thanks for the answer
>
>>
>> You probably know this already, but I've been focusing my user interface
>> stuff on the .mm1 file format. The mmu file format currently doesn't even
>> have a syntax highlighter, although it could without too much trouble (it's
>> just an s-expression). But don't make users write mmu files, that's not
>> nice. That's like asking people to write metamath proofs (the contents of
>> $p $= proof... $.) by hand.
>>
>>
> Yes. I'll remove the part about editing/looking at mmu files and
> concentrate on mm0 (I experimented first on mmu because the grammar is
> easier)
>
> Before I remove it, could you please have a look at the  mmu lexer rules
> <https://github.com/Lakedaemon/mm0-kotlin-tooling/blob/master/intellijIdeaPlugin/src/main/kotlin/org/lakedaemon/mm0/plugin/mmuLexer.flex>
>  and
> the mmu primary parser bnf
> <https://github.com/Lakedaemon/mm0-kotlin-tooling/blob/master/intellijIdeaPlugin/src/main/kotlin/org/lakedaemon/mm0/plugin/MMU.bnf>
>
> And use those to update the specs of mmu in the official mm0 repository,
> please.
> That way, it'll be easier for other developers to adopt mm0 and build
> their own tools
>

Yep, there is a reason https://github.com/digama0/mm0/issues/13 is still
open. :)

The mmu lexer is good (the whitespace rule can be simplified to just
[ \n]+).

In the parser, the rule for term-binder is:

term-binder ::= LEFT_PARENTHESIS var-name sort-name LEFT_PARENTHESIS
identifier* RIGHT_PARENTHESIS RIGHT_PARENTHESIS

but the third argument is optional (and is used to distinguish bound and
regular variables). So it should read

term-binder ::= LEFT_PARENTHESIS var-name sort-name (LEFT_PARENTHESIS
identifier* RIGHT_PARENTHESIS)? RIGHT_PARENTHESIS

(Edit: I see you've called this "theorem-binder". These two nonterminals
can be merged.)

For input and output statements, the grammar is

input-stmt ::= 'input' input-kind expr*
input-kind ::= ID
output-stmt ::= 'output' output-kind expr*
output-kind ::= ID

with the constraint on valid input and output kinds in the later processing
rather than in the parser. The supported input and output kinds are
implementation defined, and I would assume you don't need to support any.
Currently mm0-hs supports the 'string' input and output kinds, and mm0-rs
supports no input or output kinds.

Your rule for identifiers:

identifier ::= ID | SORT | TERM

refers to lexer classes SORT and TERM that I can't find defined. I think
this can just be "identifier ::= ID".

For theorem visibility, I think I may have gone back and forth on whether
to use "local theorem" and "theorem", or "theorem" and "pub theorem", but
in the latest version the mmu format only sees "local theorem" and
"theorem", so the "pub" keyword is not used (although it is used in mm1
files). In any case, the semantics is that a (pub) theorem is one that
bumps the mm0 cursor to the next statement and checks that the next
statement is a theorem that matches this one in the mmu file, while a local
theorem is one that does not appear in the mm0 file and so has no effect on
the current position in the file.

expr ::= var-name | term-name | LEFT_PARENTHESIS term-name (expr)*
RIGHT_PARENTHESIS

The "| term-name" part is not necessary here. (It is necessary in the mm0
grammar, but in mmu the parentheses around nullary term constructors are
mandatory.)

proof-expr ::= ... |  LEFT_PARENTHESIS assert-name (identifier)*
RIGHT_PARENTHESIS

This production is not necessary. All theorem applications have the form (T
(es) ps), so even if es and ps are empty it still looks like (T ()).
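As a sanity check, a generic s-expression reader (an illustrative sketch, not actual mm0 tooling) already handles the (T ()) shape with no special production:

```python
import re

def parse_sexpr(text):
    """Parse one s-expression into nested Python lists. Illustrative only."""
    tokens = re.findall(r'[()]|[^()\s]+', text)
    def parse(pos):
        if tokens[pos] == '(':
            out, pos = [], pos + 1
            while tokens[pos] != ')':
                node, pos = parse(pos)
                out.append(node)
            return out, pos + 1
        return tokens[pos], pos + 1
    return parse(0)[0]

# A theorem application with empty es and ps still parses as a nested
# list: the head name followed by an (empty) argument list.
print(parse_sexpr('(T ())'))          # ['T', []]
print(parse_sexpr('(T (ph ps) p1)'))  # ['T', ['ph', 'ps'], 'p1']
```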

convertibilityProof ::= ... | LEFT_PARENTHESIS ':unfold' identifier expr
LEFT_PARENTHESIS (identifier)* RIGHT_PARENTHESIS convertibilityProof
RIGHT_PARENTHESIS

the "expr" in this should be a list of expressions, that is,
"LEFT_PARENTHESIS (expr)* RIGHT_PARENTHESIS". The first one in the list is
not applied to the rest, which TBH makes it look a bit odd; you get things
like

(:conv an (ph ps) () (not (im ph (not ps))))

indicating that (an ph ps) unfolds to (not (im ph (not ps))), and the
parentheses around (ph ps) look like they are in the wrong place. What are
your thoughts on changing the syntax to

convertibilityProof ::= ... | '(' ':unfold' '(' identifier (expr)* ')' '('
(identifier)* ')' convertibilityProof ')'

where the parentheses around the expression are moved to before the target
term? That way the same proof looks like

(:conv (an ph ps) () (not (im ph (not ps))))

which makes a bit more visual sense. We could do the same thing with
proofs, moving the opening paren back so that T is at the head of the list,
but then they would have the form ((T es) ps) with a double paren at the
beginning. This is fine for parsing purposes, and it's still a valid
s-expression, but it does make the form a bit unusual.

>> That's what it was designed for. (If you check the marketplace / plugin
>> database, I will bet you can already find a LSP client implementation for
>> intelliJ,
>>
>
> Good, it'll make things easier. I am just concerned about one bit:
> the plugin will probably have to ship native binaries (for multiple
> platforms). So, it'll be at best more cumbersome than a
> java/kotlin/groovy/scala solution that runs on the jvm already shipped with
> intellij idea.
> But if it is doable, I'll do it.
>
>>
>> Yep, in my mind this is already a solved problem.
>>
>
> I knew it. :)
>
> You are a bit like Fermat ("I have just solved a very important problem
> for mankind") except
> - he wrote short esoteric blog posts in the margin of his proofs while you
> write short proofs in your esoteric blog posts
> - that we can wring answers out of you and implement those solutions in a
> matter of days instead of centuries, which is definitely nice. ;)
>
>> Imports aren't in the mm0 grammar because raw mm0 should not get that
>> large. The set.mm translation is supposed to be passed an -f argument to
>> specify what theorem or theorems in set.mm you care about, and those go
>> into the mm0 file.
>>
>
> Mmmh. This is nice
>
>
>> Even for very advanced theorems the resulting mm0 files are only a few
>> hundred KB. The mmb file is much larger, but it is not meant to be edited
>> by a text editor anyway. The mmu file is even larger because it is a
>> textual version of the mmb file, but it is also not meant to be edited by a
>> text editor, and it's rather overwhelming for non-toy problems.
>>
>
> yes. The only people who will want to have a look at mmu/mmb files are
> probably software developers who want to implement mm0-backed features.
> The rest of humanity doesn't need to look inside them. It just needs to work
> (as expected).
>

Right. The core of the architecture is that rather than looking at the
proofs directly, you have a program to do the looking for you and look at
the program instead to make sure it is correct.

> I'm not sure I'll ever want to go in the mm1 direction mostly because
>
> no one likes to read piles of s-expressions
>
>
> and selection doesn't seem to look favorably on Lisp-backed languages
> (which might just be a corollary of the first quote)
>
> I try to be open-minded but I think it'll take a lot of convincing to get
> me to even open a mm1 file.
>

The use of s-expressions in mm1 is sugared to make it easier to use, and in
particular to avoid the parentheses explosion you see in mmu files (for
which lisp is known). But more complex languages have more complex parsers,
and I don't know if I can continue to deliver the same quality of syntax
highlighting, error reporting and fast processing with more syntax hurdles.
But syntax questions are largely orthogonal to the rest of the language, so
if you have ideas for a better syntax be my guest. One thing that I think
is certainly coming is a way to configure or replace the secondary parser,
the thing that turns $ math expressions $ into data. So you can write your
own domain specific language and put it in $ math quotes $ to get it parsed
via the current default parser, or put it in "string quotes" and explicitly
call a parsing function of your own. In these places you can have whatever
syntax you like, although you may have to do extra work to get the same
amount of support for hovers and such in the middle of a string.

However, MM1 is not core to the architecture the same way MM0/MMB is. If
you want to make your own front end to compile to MM0, it need not have any
syntax in common with MM0, and it may even only vaguely present MM0
primitives to the front end. I think that MM1 is the best tool for writing
"pure MM0" code, in the same way that C is a good language for writing low
level assembly, but you may want to layer a high level language on top of
it and this language may look nothing at all like MM1. (Actually, MM1 has
rather better metaprogramming facilities than C does, macros are a poor
substitute. Maybe D is a better comparison.)


>>> Maybe a solution would be to strip the mm0 files of the import
>>> statement and have a third filetype (.mmerge)
>>> that holds a structured document telling how to build big mm0/mmu files
>>> from smaller parts
>>>
>>
>> It sounds like you are proposing
>>
>
> Actually, I'm not proposing. I mostly want to talk about the issue to know
> what should be best done about it, so that we can reap benefits in the
> future (and not shed tears later)
> - You have pondered a long long time and you did a marvelous job at
> designing mm0, but most of the explanations are still in your mind.
> I need to know the reasoning behind things before I implement stuff, to do
> a good job (and also, because I'm curious and not a robot).
>
> - it gives people (including me) the opportunity to speak their minds,
> suggest ideas, give advice and maybe find an even better solution
>

Sure. I am happy to explain these things as necessary, but I need a bit
more to see the ways in which my attempt to solve the import problem misses
the mark.


>> (The aggregator is careful to preserve things like formatting when
>> building the mm0 files, so that the output is still readable, yet self
>> contained, because it is the source of truth for the target axiomatization.)
>>
>
> mm0/mmu/mmb files excel at what they were designed for.
>
> Imo, aggregating is an external concern and shouldn't touch mm0/mmu/mmb
> files and it would be nice to allow everyone (including non mm1 believers)
> to do the aggregation themselves.
>

If you are a nonbeliever, then you use some high level proof language, like
say lean, that has nothing whatsoever to do with MM0. A backend tool
processes your proof and spits out an MM0/MMB pair, with the MM0 file
readable enough that the target theorem is recognizable. No aggregation is
necessary at the MM0 level, because the output is a single file (well, two
files). Lots of aggregating might have happened with multiple lean files,
but that's its problem.

The point of aggregating mm0 files is to maintain a readable
collection of theorem statements, in particular if they cover multiple
subject matters and don't necessarily form a linear dependency graph. I'm
currently using this for the definition of the MM0 verifier, which requires
two entirely distinct theories: the definition of MM0 as a formal system,
and the definition of the x86 architecture. It would be awkward to have the
definition of mm0 import x86, or for x86 to import mm0. But the final
theorem requires both, so there is a diamond dependency. Using separate
files and an aggregator allows me to concisely specify all the
downward-closed subposets of this lattice without repetition.


> So, it would be nice to design a very simple and open solution that would
> elegantly handle the issue.
> Also, there is the issue of proof authoring (which is a fundamental part
> of getting mm0 to succeed). I'm not saying that you should solve proof
> authoring too.
>

Well, as I've said MM1 is my answer to proof authoring, and I'm using it
"in production" so to speak. Of course I am only one user, and I have
specific tastes, so the language has grown in rather lopsided ways to
support my proof style. More and different opinions are needed to
make it more generally usable.

> But again it would be nice if everything was in place so that other people
> could have the basic core features to be able to write authoring tools on
> top of mm0/mmb.
>
> These are quite lengthy subjects, so I'll just throw them around in
> later posts, but we already spoke of:
> 1) splitting mm0 files into short parts (looks like you are ok with that)
> 2) having custom mm0 files with selected theorems (looks like you already
> did it)
> 3) aggregating mm0 files together (already touched on)
>
> somehow this reminds me of libraries and software development and screams
> package managers. :)
>

Quite so. Package managers have a tendency toward over-engineering, though,
and all the projects I have overseen have tended more toward a "monorepo"
style where the entire ecosystem is globally visible so that sweeping
changes are possible.


> 4) having a standard file format to allow spec/proof conversion between
> theorem provers (You already did that)
>  Yet, you say
> A production quality verifier should probably use a binary file format,
> but the information contained in the file should be about the same
>
> which somewhat implies that you don't intend mmb to become an interchange
> format, as it would be best for the binary formats to get the best of the
> platform/endianness they are targeted at, for performance reasons, I guess
>

Actually, that quote is from early days, before the .mmb format existed. I
am alluding to it. (For context, "about the same" means about the same as
the .mmu file format.) While some minor changes are still needed to the mmb
format to make it more suitable as an interchange format, specifically
regarding more ability to annotate proofs and terms in the index, I *do*
intend it as one. It's true that there are endianness concerns, and for
better or worse I've decided to just write in the spec that integers in mmb
are little endian and let big endian machines deal with it. But verifiers
are allowed to define their own formats if they want in any case, and one
easy and common way to indicate that the endianness is reversed is to see
if the magic number "MM0B" in the first 4 bytes comes out in reverse order.
So if a verifier wanted to declare that it checks MM0B formats in native
endianness, that's fine too, and proof authoring tools would need to keep
that in mind when targeting that verifier. But this sort of thing gets
annoying fast, and MM0B is already basically a bytecode interpreter, so
it's not a huge deal to ask for platform independence.
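The magic-number probe described above can be sketched in a few lines (only the 4-byte "MM0B" magic comes from this email; the header padding and function are illustrative assumptions):

```python
# Sketch: detect whether an .mmb header was written with reversed byte
# order by probing the magic number, as described above. Only the "MM0B"
# magic is taken from the discussion; everything else is illustrative.

MAGIC = b'MM0B'

def probe_endianness(header: bytes):
    """Return 'native', 'reversed', or None (not an mmb file)."""
    magic = header[:4]
    if magic == MAGIC:
        return 'native'
    if magic == MAGIC[::-1]:   # b'B0MM': the bytes came out in reverse order
        return 'reversed'
    return None

print(probe_endianness(b'MM0B' + b'\x00' * 12))  # native
print(probe_endianness(b'B0MM' + b'\x00' * 12))  # reversed
```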

> So, do you intend .mmu files to be what will be used to exchange proofs
> between people, worldwide?
> Say, you exchange .mm0 files (in a contest) and then someone A writes
> proofA on his linux pc, with mephistolus;
> he sends the exported .mmu proof to someone B on his windows/amiga/weird
> endianness box to read the proof and import it in hol (or whatever)?
>

You could do that. MMB is much better than MMU for large proofs though
because it does more thorough deduplication, in particular between terms.
In the worst case you can get an exponential blowup with large terms in mmu
that are totally fine in mmb.
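The blowup is easy to see with a doubling term t_{n+1} = (f t_n t_n): written out flat (mmu-style) the text doubles at each step, while a deduplicated DAG (mmb-style) grows by one shared node per step. A toy illustration, not mm0 code:

```python
# Toy illustration of the blowup mentioned above: t_{n+1} = (f t_n t_n).
# Flat text (mmu-style) doubles at each step; a deduplicated DAG
# (mmb-style) adds only one shared node per step.

def flat_size(n):
    """Length in characters of the fully expanded term (no sharing)."""
    t = 'x'
    for _ in range(n):
        t = '(f ' + t + ' ' + t + ')'
    return len(t)

def dag_size(n):
    """Number of distinct nodes with full sharing: 'x' plus n applications."""
    return 1 + n

for n in (1, 5, 10, 20):
    print(n, flat_size(n), dag_size(n))  # flat_size(n) = 6 * 2**n - 5
```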

Mario

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/CAFXXJSuL2SFfcy-iX5N3e2ers8hYVQaKSwHt6ijHAUpF1KGk8w%40mail.gmail.com.
