Re: [bitc-dev] best algorithms / libraries for parsing ambiguous grammars

Jonathan S. Shapiro Tue, 14 Oct 2014 14:58:46 -0700

On Tue, Oct 14, 2014 at 2:38 PM, Geoffrey Irving <[email protected]> wrote:


> Perfect, that answers even the more detailed form of my question.  Our
> grammar will be intentionally ambiguous since we'll be trying to
> detect and fix broken code.  E.g., we want to do things like parsing
> f(x,y) as both applying f to two arguments and applying f to a single
> tuple, then disambiguating using types later on.
>
> Thus, GLR seems good, and the IBM Eclipse parser seems like a
> wonderful place to start.


You may be saying incompatible things here, I'm not quite sure. It depends
on what you mean by "broken".

If "broken" means "it'll be one of several defined parses, but I don't know
which one, and I may want to look at it in several ways", then GLR or
something like it would be your first go-to solution. GLR will actually
hand you back all possible parse trees for a given input. The intention is
to run a resolving pass later, preserving the separation between parsing
and semantics. But you're free to use the multiple trees directly.
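
As a rough illustration of what "all possible parse trees" means for the
f(x,y) case, here is a minimal sketch. It is not GLR: a real GLR parser
drives LR tables over a graph-structured stack and usually returns a shared
forest. This is a brute-force, memoized top-down enumerator that produces
the same kind of answer for a toy grammar. The grammar, the symbol names
(E, A, T) and the function names are all assumptions made up for the
example.

```python
from functools import lru_cache

# Hypothetical toy grammar, deliberately ambiguous for calls:
#   E -> id | id ( A ) | id T | T     (call with an argument list, or call on one tuple)
#   A -> E | E , A                    (one or more comma-separated expressions)
#   T -> ( E , A )                    (tuple with at least two elements)
GRAMMAR = {
    "E": [("id",), ("id", "(", "A", ")"), ("id", "T"), ("T",)],
    "A": [("E",), ("E", ",", "A")],
    "T": [("(", "E", ",", "A", ")")],
}


def tokenize(text):
    """Single-character tokens: letters/digits are 'id', everything else is itself."""
    return tuple(
        ("id", ch) if ch.isalnum() else (ch, ch)
        for ch in text
        if not ch.isspace()
    )


def all_parses(text, start="E"):
    """Return every parse tree of `text` that consumes the whole input."""
    tokens = tokenize(text)

    @lru_cache(maxsize=None)
    def derive(symbol, i):
        # Every (tree, next_index) pair for `symbol` starting at token i.
        if symbol not in GRAMMAR:  # terminal symbol
            if i < len(tokens) and tokens[i][0] == symbol:
                return ((tokens[i][1], i + 1),)
            return ()
        results = []
        for production in GRAMMAR[symbol]:
            partials = [((), i)]  # (children so far, position)
            for part in production:
                partials = [
                    (kids + (tree,), k)
                    for kids, j in partials
                    for tree, k in derive(part, j)
                ]
            results.extend(((symbol, kids), j) for kids, j in partials)
        return tuple(results)

    return [tree for tree, end in derive(start, 0) if end == len(tokens)]


trees = all_parses("f(x,y)")
for tree in trees:
    print(tree)
```

With this grammar the input f(x,y) yields two trees, one where f is applied
to an argument list and one where f is applied to a single tuple, which is
the pair a later type-driven pass would choose between.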

GLR in and of itself may not help if you need to deal with input that is
actually malformed. The way to think about that problem is that the parser
hands back either (a) a set of valid parse trees in the style of GLR, or
(b) a set of invalid parses, each of which consists of a sequence of parse
sub-tree *sets* (due to possible ambiguities), some of which may consist
solely of the distinguished "unrecognized token" parse subtree that serves
as a placeholder for text that can't be assembled into any larger unit.
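
One way to picture that result shape is as a small data structure. The
sketch below is only an assumption about how it might be modelled; the type
names (Tree, Unrecognized, ParseOutcome) and the sample input are invented
for illustration and do not come from GLR tooling or from CDT.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Tree:
    """An ordinary parse subtree."""
    label: str
    children: Tuple["Tree", ...] = ()


@dataclass(frozen=True)
class Unrecognized:
    """Placeholder for text that could not be assembled into any larger unit."""
    text: str


# One position in an invalid parse: either a set of alternative subtrees
# (more than one when that stretch of input is ambiguous) or a placeholder.
Fragment = Union[Tuple[Tree, ...], Unrecognized]


@dataclass
class ParseOutcome:
    # Case (a): one or more complete trees covering the whole input.
    valid_trees: List[Tree] = field(default_factory=list)
    # Case (b): each invalid parse is a sequence of fragments.
    broken_parses: List[List[Fragment]] = field(default_factory=list)

    def is_valid(self) -> bool:
        return bool(self.valid_trees)


def unrecognized_text(parse: List[Fragment]) -> List[str]:
    """Collect the text covered by placeholders in one invalid parse."""
    return [frag.text for frag in parse if isinstance(frag, Unrecognized)]


# Hypothetical outcome for input such as "f(x,y) @@": an ambiguous call
# followed by text the grammar cannot place.
call = Tree("call", (Tree("f"), Tree("x"), Tree("y")))
call_on_tuple = Tree("call", (Tree("f"), Tree("tuple", (Tree("x"), Tree("y")))))
outcome = ParseOutcome(broken_parses=[[(call, call_on_tuple), Unrecognized("@@")]])
print(outcome.is_valid(), unrecognized_text(outcome.broken_parses[0]))
```

Keeping the ambiguous alternatives together at each position means the
later resolving pass can still work on the well-formed fragments even when
the input as a whole does not parse.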

The thing you may want to look at in Eclipse is known as the CDT parser.
Have a look at the talk slides here:

http://wiki.eclipse.org/images/e/ec/McMaster_2012_invited_talk_cdt.pdf

Pay particular attention starting at slide 38, where he starts talking
about incomplete user input (which is to say: malformed input). Then have a
look at the five-part series on building a CDT-based editor:

http://www.ibm.com/developerworks/views/opensource/libraryview.jsp?search_by=CDT+based+editor


These days I wonder if GLR wouldn't be a better framework, mainly because
it tolerates ambiguities and proceeds bottom up. CDT is top-down, so there
are various kinds of errors at the beginnings of files that can make it
difficult to get any partial progress. I haven't played with it enough to
have a sense of how well they do on that sort of thing. The "Completion
Node" idea may give them enough to get past that sort of thing in practice.


Jonathan

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
