On Tue, Oct 14, 2014 at 2:38 PM, Geoffrey Irving <[email protected]> wrote:
> Perfect, that answers even the more detailed form of my question. Our
> grammar will be intentionally ambiguous since we'll be trying to
> detect and fix broken code. E.g., we want to do things like parsing
> f(x,y) as both applying f to two arguments and applying f to a single
> tuple, then disambiguating using types later on.
>
> Thus, GLR seems good, and the IBM Eclipse parser seems like a
> wonderful place to start.

You may be saying incompatible things here, I'm not quite sure. It
depends on what you mean by "broken".

If "broken" means "it'll be one of several defined parses, but I don't
know which one, and I may want to look at it in several ways", then GLR
or something like it would be your first go-to solution. GLR will
actually hand you back all possible parse trees for a given input. The
intention is to run a resolving pass later, preserving the separation
between parsing and semantics. But you're free to use the multiple
trees directly.

GLR in and of itself may not help if you need to deal with input that
is actually malformed. The way to think about that problem is that
you're going to get out a sequence that is either (a) a set of valid
parse trees in the style of GLR, or (b) a set of invalid parses, each
of which consists of a sequence of parse sub-tree *sets* (due to
possible ambiguities), some of which may consist solely of the
distinguished "unrecognized token" parse subtree that serves as a
placeholder for text that can't be assembled into any larger unit.

The thing you may want to look at in Eclipse is known as the CDT
parser. Have a look at the talk slides here:

http://wiki.eclipse.org/images/e/ec/McMaster_2012_invited_talk_cdt.pdf

Pay particular attention starting at slide 38, where he starts talking
about incomplete user input (which is to say: malformed input).

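To make the (a)/(b) shape above concrete, here is a rough Python sketch.
It is not a GLR parser and not anything taken from CDT: the forest for
f(x,y) is built by hand, and every name in it (Node, Unrecognized,
ParseResult, call_alternatives, resolve, the arity table) is
hypothetical, invented only for illustration. The "resolving pass" is a
stand-in that uses a declared arity where a real system would use types.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Node:
    """A parse subtree: a label plus child subtrees (leaves have none)."""
    label: str
    children: Tuple["Node", ...] = ()


@dataclass(frozen=True)
class Unrecognized:
    """Placeholder for text that can't be assembled into any larger unit."""
    text: str


Subtree = Union[Node, Unrecognized]


@dataclass(frozen=True)
class ParseResult:
    # Each segment is a *set* of alternative subtrees covering one stretch
    # of the input. One clean segment is case (a); anything else is case (b).
    segments: Tuple[Tuple[Subtree, ...], ...]

    def is_valid(self) -> bool:
        if len(self.segments) != 1:
            return False
        return not any(isinstance(t, Unrecognized) for t in self.segments[0])


def call_alternatives(fn: str, args: Sequence[str]) -> Tuple[Node, Node]:
    """Both readings of fn(a, b, ...): separate arguments, or one tuple."""
    leaves = tuple(Node(a) for a in args)
    multi = Node("apply", (Node(fn),) + leaves)
    single = Node("apply", (Node(fn), Node("tuple", leaves)))
    return (multi, single)


def resolve(alternatives: Sequence[Node], arity: Dict[str, int]) -> List[Node]:
    """Later pass: keep readings whose argument count matches the arity."""
    keep = []
    for tree in alternatives:
        fn, *args = tree.children
        if arity.get(fn.label) == len(args):
            keep.append(tree)
    return keep


if __name__ == "__main__":
    alts = call_alternatives("f", ["x", "y"])
    # Case (a): the whole input parsed, ambiguously, into two trees.
    print(ParseResult((alts,)).is_valid())
    # Case (b): a parsed stretch followed by text nothing could absorb.
    print(ParseResult((alts, (Unrecognized("@@"),))).is_valid())
    # If f is declared to take one argument, only the tuple reading survives.
    print(resolve(alts, {"f": 1}))
```

The point of keeping segments as sets is that the resolving pass can
stay separate from parsing: it sees every reading of each stretch and
discards the ones the semantics rule out.
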
Then have a look at the five-part series on building a CDT-based
editor:

http://www.ibm.com/developerworks/views/opensource/libraryview.jsp?search_by=CDT+based+editor

These days I wonder if GLR wouldn't be a better framework, mainly
because it tolerates ambiguities and proceeds bottom up. CDT is
top-down, so there are various kinds of errors at the beginnings of
files that can make it difficult to get any partial progress. I haven't
played with it enough to have a sense of how well they do on that sort
of thing. The "Completion Node" idea may give them enough to get past
that sort of thing in practice.


Jonathan
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
