Terence Parr <parrt <at> cs.usfca.edu> writes:
>
>
> On Jun 18, 2009, at 6:20 PM, Michaeljohn Clement wrote:
> > I think generating acceptable error messages from a parser alone is an
> > interesting hard problem. It might be possible to do some statistical
> > analysis on a corpus of valid inputs and then derive heuristics to
> > suggest what the most likely error in the input string might be.
>
> Has anyone thought of parsing backwards from the end towards the
> detected error location? If you parse forwards and backwards you might
> be able to zoom in on a problem area. Of course if there are lots of
> errors following the first one, it won't help you too much. It's sort
> of what a human does though, isn't it? We look down a few tokens and
> work our way back up to see if we can make sense of things.
>
> The other thing I wondered about. Can we launch a whole bunch of
> threads using multiple cores to sniff the input to improve error
> analysis? Maybe we launch parsers at multiple points in the input
> stream and then use the interpretation that yields the fewest errors.
>
> Just random thoughts. Let's use those cores, man! Right now, all they
> do is run Pandora and instant messaging for me. ;)
>
> Ter
>
Parsing backwards is interesting; however, as you've mentioned, it
runs into trouble when the document contains more than one parse
error.

Another idea that might get people thinking is phrase-level,
context-sensitive errors. Context sensitivity is achieved by matching
some number of parser frames (each being the application of a
production rule) on the top of the stack against a given pattern. If a
phrase fails to match, the parser checks some sequence(s) of production
rules against the top frames on the stack and raises an error before it
backtracks to try the next phrase in the production rule or cascades
the failure.
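
To make the frame-matching idea concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration: the stack is
a plain list of rule names (innermost last), and the rule names "Call"
and "ArgumentList", the message text, and the helper names are all
hypothetical. A real parser would call something like diagnose_failure
when a phrase fails, and raise if it gets a message back instead of
backtracking.

```python
def top_frames_match(stack, pattern):
    """True if the innermost frames of `stack` equal `pattern` (innermost last)."""
    n = len(pattern)
    return 0 < n <= len(stack) and tuple(stack[-n:]) == tuple(pattern)


def diagnose_failure(stack, patterns):
    """Return the message of the first pattern matching the top frames, or None.

    Intended to be called when a phrase fails, before the parser backtracks
    to the next phrase. None means: no pattern applies, backtrack as usual.
    """
    for pattern, message in patterns:
        if top_frames_match(stack, pattern):
            return message
    return None


# Hypothetical pattern table: a sequence of rule names and the error to report.
PATTERNS = [
    (("Call", "ArgumentList"), "malformed argument list in call"),
]
```

With this table, a failure while the stack is
["Program", "Call", "ArgumentList"] yields the message, while a failure
under ["Program", "Statement"] yields None and the parser backtracks
normally.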

Another idea, previously mentioned, is a production-rule-level parse
error: if a production rule fails to match any of its phrases, it
simply raises a parse error. This is very appealing, especially for
production rules with only one phrase, where an error is only
detectable on failure of the production rule. The following grammar, in
which each alternative phrase is introduced by ':', is an example of
this:

    Type : 'int' : 'float' : 'char' ;
    Identifier : !Type ... ;
    IdentifierDeclaration
        : Type Identifier IdentifierList <semicolon>
        ;
    IdentifierList
        : <comma> Identifier IdentifierList
        : <>
        ;

For the input "int foo, bar float;" the parser proceeds as follows.
When applying IdentifierList(", bar float;"), it matches the comma and
the identifier and *also* the nested IdentifierList (by failing on
'float' and backtracking to match <> (epsilon)). IdentifierDeclaration
then fails when it cannot match the semicolon.

This is an annoying case where a parse error is somewhat obscured
by a successful application of a production rule. Really, the parse error
occurs in IdentifierList, but it only surfaces when IdentifierDeclaration
fails and the failure subsequently cascades off the parser stack.
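
The trace can be reproduced with a small backtracking parser. This is
only a sketch in Python under stated assumptions: the tokenizer, the
Parser class and parse_declaration are hypothetical names of mine, and
the elided part of Identifier ("...") is assumed to be any word that is
not a type name. The log records each rule application as (rule, start
token index, succeeded?).

```python
import re

TYPES = {"int", "float", "char"}


def tokenize(text):
    # Words, commas and semicolons; whitespace is skipped.
    return re.findall(r"[A-Za-z_]\w*|[,;]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        # (rule name, start position, succeeded?) in order of completion
        self.log = []

    def peek(self, pos):
        return self.tokens[pos] if pos < len(self.tokens) else None

    def literal(self, pos, text):
        return pos + 1 if self.peek(pos) == text else None

    def type_(self, pos):
        return pos + 1 if self.peek(pos) in TYPES else None

    def identifier(self, pos):
        # Identifier : !Type ... ;  (the elided part is assumed to be any word)
        tok = self.peek(pos)
        if tok is None or tok in TYPES or tok in (",", ";"):
            return None
        return pos + 1

    def identifier_list(self, pos):
        # First phrase: <comma> Identifier IdentifierList
        p = self.literal(pos, ",")
        if p is not None:
            p = self.identifier(p)
        if p is not None:
            p = self.identifier_list(p)
        if p is None:
            p = pos  # second phrase: <> (epsilon), which always succeeds
        self.log.append(("IdentifierList", pos, True))
        return p

    def identifier_declaration(self, pos):
        # Single phrase: Type Identifier IdentifierList <semicolon>
        p = self.type_(pos)
        if p is not None:
            p = self.identifier(p)
        if p is not None:
            p = self.identifier_list(p)
        if p is not None:
            p = self.literal(p, ";")
        self.log.append(("IdentifierDeclaration", pos, p is not None))
        return p


def parse_declaration(text):
    parser = Parser(tokenize(text))
    return parser.identifier_declaration(0), parser.log
```

Calling parse_declaration("int foo, bar float;") returns None as the
end position, and the log shows both IdentifierList applications (at
token 4, via epsilon, and at token 2) succeeding before
IdentifierDeclaration fails at token 0. Nothing in the log points at
'float', which is the obscured error described above.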