Hi again,

I didn't want to hijack the thread, so I kept the details on my side as 
sparse as possible. Most of it is a moving work in progress, so I'd rather 
not jump into the specifics until I have a couple of repositories with 
demonstrations ready to go public. But I agree the surrounding questions 
are quite interesting. 

On the incrementally improving grammar: that is beyond the scope of Marpa 
and beyond the immediate scope of my current project, but it feels like an 
inevitable piece of language understanding. Papers bring along a lot of 
"local" notations, so a grammatical approach to modeling them would need to 
both uncover and recognize those notations. As this hasn't been my main 
focus, I am only familiar with one recent work on the topic, by Kevin 
Kofler, which departs from the tables of Earley's algorithm and is instead 
based on runtime graphs that are efficient to update on the fly:

https://www.tigen.org/kevin.kofler/fmathl/dyngenpar/

Given it's a recent PhD thesis, take it with a grain of salt; it is 
certainly a problem with lots of open questions left to answer.
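
To make the idea concrete, here is a toy Rust sketch (nothing to do with 
DynGenPar's actual data structures, and all the names are made up) of a 
grammar kept as a mutable rule index, so a "local" notation uncovered 
mid-document can be registered at runtime without regenerating tables:

// Toy sketch: a grammar as a mutable rule index, so new notations
// discovered while reading a paper can be added on the fly.
use std::collections::HashMap;

#[derive(Clone, Debug)]
enum Symbol {
    Terminal(String),
    NonTerminal(String),
}

#[derive(Clone, Debug)]
struct Rule {
    lhs: String,
    rhs: Vec<Symbol>,
}

#[derive(Default)]
struct DynamicGrammar {
    // Rules indexed by left-hand side: prediction stays a map lookup,
    // and the index remains valid after incremental updates.
    by_lhs: HashMap<String, Vec<Rule>>,
}

impl DynamicGrammar {
    /// Add a production on the fly, e.g. once a paper declares
    /// "let ℕ be a continuous local martingale".
    fn add_rule(&mut self, lhs: &str, rhs: Vec<Symbol>) {
        self.by_lhs
            .entry(lhs.to_string())
            .or_default()
            .push(Rule { lhs: lhs.to_string(), rhs });
    }

    fn expansions(&self, lhs: &str) -> &[Rule] {
        self.by_lhs.get(lhs).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut g = DynamicGrammar::default();
    g.add_rule("Expr", vec![Symbol::NonTerminal("Martingale".into())]);
    // Later, mid-document, a new notation is uncovered and registered:
    g.add_rule("Martingale", vec![Symbol::Terminal("ℕ".into())]);
    assert_eq!(g.expansions("Martingale").len(), 1);
}

The only point is that adding a production is cheap compared to rebuilding 
an Earley-style table from scratch; everything else is where the open 
questions live.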

As to the goal of parsing the arXiv formulas, the most direct application 
is "semantic search" over the formulas. I'll try to give an example that is 
easy to think about. 
I recently had an exchange where someone stated that "obviously a blackboard 
N (ℕ) can only mean the set of natural numbers". Since I have the arXiv data 
in machine-readable form, I could quickly grep for "let ℕ" and recover the 
following snippets from actual papers:

1. Linear algebra - a matrix
 - "let ℕ be the matrix given by proposition 2"

2. Category theory
 - "let ℕ be a functor of R modules"
 - "let ℕ be the category of finite sets with inclusions as morphisms"

3. Information theory - coding instance
 - "let ℕ be a secure network coding instance and 𝕀 be the corresponding 
secure index coding instance obtained using the network to index coding 
mapping"

4. Topology - a complex
 - "let ℕ be the complex containing the point x the argument in wet(x,𝕄) 
and let ℕ^' be the other complex"

5. Probability theory - a martingale
 - "let ℕ be a continuous local martingale with values in ..." 

And there are probably others. This is to motivate that, at the arXiv's 
scale, likely ALL symbols in expressions have a multitude of meanings. 
Hence the need for an approach capable of dealing with the explosion of 
lexical ambiguity in math syntax. If one could craft rules for all 
scientific subdomains (as well as one "generic" domain, so that well-formed 
expressions in unknown/new domains still parse successfully), one could 
obtain a forest of all coherent parses while pruning away the senseless 
readings. The parse forest can then be packed and serialized (e.g. in 
Content MathML) for downstream applications to reuse. The most direct of 
these are semantic search engines such as MathWebSearch, which can let 
authors write queries that target a symbol's meaning rather than its 
syntactic sign:
https://pdfs.semanticscholar.org/a8c1/c3aa77e9c1b80b00512ddfbe2c02abe80ed0.pdf
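
In toy form (I stress this is not my actual rule set, just a caricature in 
Rust): an ambiguous token such as ℕ fans out into one candidate reading per 
subdomain, and only the readings that survive that domain's checks make it 
into the forest:

// A caricature of lexical disambiguation: every subdomain proposes a
// reading for ℕ, and only the readings coherent with the surrounding
// phrase survive into the (here: flat) collection of parses.
#[derive(Debug, Clone, PartialEq)]
enum Sense {
    NaturalNumbers,
    Matrix,
    Functor,
    Martingale,
}

// Hypothetical per-domain check standing in for a real grammar:
// a reading is "coherent" if the surrounding phrase supports it.
fn coherent(context: &str, sense: &Sense) -> bool {
    match sense {
        Sense::Matrix => context.contains("matrix"),
        Sense::Functor => context.contains("functor"),
        Sense::Martingale => context.contains("martingale"),
        Sense::NaturalNumbers => true, // generic fallback domain
    }
}

fn readings(context: &str) -> Vec<Sense> {
    [
        Sense::NaturalNumbers,
        Sense::Matrix,
        Sense::Functor,
        Sense::Martingale,
    ]
    .into_iter()
    .filter(|s| coherent(context, s))
    .collect()
}

fn main() {
    let forest = readings("let ℕ be a continuous local martingale");
    println!("{forest:?}"); // [NaturalNumbers, Martingale]
}

In the real thing the "coherence" check is the grammar of each subdomain, 
and the surviving readings get packed into a shared forest rather than a 
flat list, before being serialized for downstream tools.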

As an example, quite neatly, you could ask for expressions involving a 
variable known to be a "martingale" (by setting a query variable's type), 
and obtain surprising results in notations you weren't familiar with (such 
as the blackboard ℕ here).
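
In toy form again (emphatically not MathWebSearch's query syntax, just an 
illustration of the principle), a query over resolved meanings rather than 
glyphs could look like:

// Hypothetical index: each occurrence of a symbol carries the meaning
// resolved during parsing, so search can filter on meaning, not glyph.
struct Occurrence<'a> {
    glyph: &'a str,
    meaning: &'a str, // e.g. "martingale", "natural-numbers"
    snippet: &'a str,
}

fn search<'a>(index: &'a [Occurrence<'a>], meaning: &str) -> Vec<&'a str> {
    index
        .iter()
        .filter(|o| o.meaning == meaning)
        .map(|o| o.snippet)
        .collect()
}

fn main() {
    let index = [
        Occurrence {
            glyph: "ℕ",
            meaning: "natural-numbers",
            snippet: "for all n in ℕ",
        },
        Occurrence {
            glyph: "ℕ",
            meaning: "martingale",
            snippet: "let ℕ be a continuous local martingale",
        },
    ];
    // Query by meaning, independent of the glyph the author picked.
    println!("{:?}", search(&index, "martingale"));
}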

There are other applications beyond search, e.g. making expressions easy to 
transfer into computer algebra systems, eventually heading towards the 
long-term goal of a computer assistant that can walk a paper's equations 
and flag obvious contradictions/errors, akin to the way a spellchecker 
validates words.

I should try to set up the Discord you referred to; it could be interesting 
to co-brainstorm at some point. But I have some work to do on my end before 
I get back to juggling the parse forests.

Greetings,
Deyan


On Saturday, October 26, 2019 at 11:39:41 AM UTC-4, Piotr Czarnecki wrote:
>
> Hi Deyan,
> Could you give an example of a formula and its effect? Is it grammar 
> synthesis from sentences? Could you link papers that implement 
> incrementally improving grammars? This is interesting. I'm sure I've read a 
> paper about grammar synthesis from sentences. I would like to implement 
> this in Rust. What is the use case for this, with arXiv as input, though?
>
> Thanks for encouragement, I highly appreciate it. If you would like to know 
> anything about bocage internals, ping me on Discord or here, I can explain 
> it all. If so, we would need a starting point -- to what extent are you 
> familiar with SPPF and its implementation?
>
> Cheers
> --Piotr
>
