Hi Glauco,

Apologies for the delay!! I'll try to answer inline

I read why you didn't include metamath; but to let me understand your paper
in the metamath "environment":

- to keep things simple, assume we had a set.mm with all proofs in normal
mode (no compression)

- assume you use symbols + labels + metamath reserved keywords  (in set.mm)
as your tokens (no BPE applied)

*Ok, I'm with you; this is most similar to word-level encoding.*
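To make the word-level setup concrete, here is a minimal sketch (my own illustration, not the paper's code) of such a tokenizer: since Metamath source is whitespace-delimited, splitting on whitespace already yields the symbols, labels, and $-keywords as tokens, and each distinct token gets an integer id with no BPE merging. The statement string below is an illustrative fragment in set.mm style, not an exact excerpt.

```python
# Word-level tokenization sketch for Metamath source: split on whitespace
# (Metamath tokens are whitespace-separated) and assign integer ids in
# order of first appearance. No subword/BPE merging is applied.
def build_vocab(text):
    tokens = text.split()
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab, [vocab[t] for t in tokens]

# Illustrative set.mm-style fragment (not a verbatim set.mm proof).
stmt = "mp2 $p |- ch $= wph wps mp2.1 mp2.2 mp2.3 ax-mp ax-mp $."
vocab, ids = build_vocab(stmt)
```

Repeated tokens (like the two `ax-mp` label references above) map to the same id, so the vocabulary is exactly the set of distinct symbols/labels/keywords seen.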

- assume you got a transformer T from your genetic optimizer (on
hyperparameters), at generation i

- then, do you fit T's parameters on (let's say) 90% of the proofs in
set.mm?

- then, do you measure "learnability" by measuring how good it is at
predicting the next token in the 10% out of sample?

*Yep, the validation loss is how well it predicts tokens on token sequences
it wasn't trained on.*

- would it mean, either the next token in a wff or the next label in a
proof?

*It depends on how we created the train/validation token sequences (but in
general it can be any token t+1 given tokens {..., t}).*
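In other words, every prefix of a sequence predicts its next token, so the targets naturally mix wff symbols and proof labels whenever both appear in the token stream. A minimal sketch of turning one sequence into (context, target) pairs:

```python
# Each prefix {..., t} of a token sequence becomes a context whose
# target is token t+1 -- whether that token is a wff symbol or a label
# depends only on what happens to come next in the stream.
def context_target_pairs(seq):
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

pairs = context_target_pairs(["|-", "ch", "$=", "wph", "ax-mp"])
```

Here the target after `$=` is the symbol `wph`, while the last target is the label `ax-mp`: same mechanism, different kinds of tokens.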

Is this roughly how it would go in the metamath framework?

I'm trying to look at it from a Proof Assistant standpoint, where maybe
the most intriguing part would be to "generate" the next missing label in a
proof (where "next" could be "one" of the missing labels)
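One hypothetical way that label-generation use case could look (everything below is my own sketch; `score` stands in for a trained model's logit for a candidate continuation): score every known label as the continuation of the partial proof and take the argmax.

```python
# Hypothetical "fill the missing label" sketch: rank candidate labels as
# continuations of a partial proof. `score` is a stand-in for a trained
# model's scoring function, not anything from the paper.
def best_label(partial_proof, label_vocab, score):
    return max(label_vocab, key=lambda lab: score(partial_proof, lab))

# Toy scorer that just prefers labels already frequent in the partial
# proof -- a real model would condition on the whole context instead.
def toy_score(proof, lab):
    return proof.count(lab)

proof = ["wph", "wps", "ax-mp", "ax-mp"]
guess = best_label(proof, ["ax-mp", "mp2", "wi"], toy_score)
```

A proof assistant could then verify the suggested label against the current proof obligation and fall back to the next-best candidate on failure.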

*Would you or a group of folks be interested in collaborating on a
follow-up paper to use this framework to build a mm GPT?*


On Wed, Feb 28, 2024 at 4:52 PM Glauco <[email protected]> wrote:

> Hi Johnathan,
>
> I read why you didn't include metamath; but to let me understand your
> paper in the metamath "environment":
>
> - to keep things simple, assume we had a set.mm with all proofs in normal
> mode (no compression)
>
> - assume you use symbols + labels + metamath reserved keywords  (in set.mm)
> as your tokens (no BPE applied)
>
> - assume you got a transformer T from your genetic optimizer (on
> hyperparameters), at generation i
>
> - then, do you fit T's parameters on (let's say) 90% of the proofs in
> set.mm ?
>
> - then, do you measure "learnability" by measuring how good it is at
> predicting the next token in the 10% out of sample?
>
> - would it mean, either the next token in a wff or the next label in a
> proof?
>
> Is this roughly how it would go in the metamath framework?
>
> I'm trying to look at it from a Proof Assistant standpoint, where may be
> the most intriguing part would be to "generate" the next missing label in a
> proof (where "next" could be "one" of the missing labels)
>
>
> Thanks in advance for your help
> Glauco
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Metamath" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/metamath/9cc05019-dc5b-4242-9bf6-7af5406b6837n%40googlegroups.com
> <https://groups.google.com/d/msgid/metamath/9cc05019-dc5b-4242-9bf6-7af5406b6837n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
