Re: Let's stop parser Hell

Roman D. Boiko Sat, 07 Jul 2012 23:05:37 -0700

On Sunday, 8 July 2012 at 01:15:31 UTC, David Piepgrass wrote:

I'm not seeing any tremendous error-handling difficulty in my idea. Anyway, I missed the part about information being thrown away...?

(I will use the word "you" to refer to some abstract person or compiler.)

Error handling could mean either reporting the first error and stopping, or reporting several errors and continuing parsing + semantic analysis whenever possible, so that the user would get partial functionality like code completion, information about method overloads, "go to definition" / find all usages, etc.

The first method is the less powerful option, but it still requires constructing a meaningful error message which could help the user.

There are many possible error recovery strategies. For example, you might decide to insert some token according to the parsing information available up to the moment when the error is discovered, if that would fix the problem and allow parsing to continue. Another option is to ignore a series of further tokens (treat them as whitespace, for example) until the parser is able to continue its work. There are also many heuristics for typical error situations.
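As an illustration of the two strategies just mentioned, here is a minimal Python sketch. It is not taken from any real parser: the toy grammar (a sequence of `id = num ;` statements), the token representation, and the names `parse_assignments` and `skip_past_semicolon` are all assumptions made up for this example.

```python
from typing import List, Tuple

# Hypothetical token shape: (kind, text), with kinds "id", "num", "=", ";".
Token = Tuple[str, str]


def skip_past_semicolon(tokens: List[Token], i: int) -> int:
    """Skipping strategy: ignore tokens up to and including the next ';'."""
    while i < len(tokens) and tokens[i][0] != ";":
        i += 1
    return i + 1


def parse_assignments(tokens: List[Token]):
    """Parse `id = num ;` statements, recovering from errors where possible."""
    stmts, errors = [], []
    i = 0
    while i < len(tokens):
        if tokens[i][0] != "id":
            # The statement has no usable start, so skip ahead and resume.
            errors.append(f"token {i}: expected identifier, skipped to ';'")
            i = skip_past_semicolon(tokens, i)
            continue
        name = tokens[i][1]
        i += 1

        if i < len(tokens) and tokens[i][0] == "=":
            i += 1
        else:
            # Insertion strategy: the grammar says '=' must follow an id here,
            # so pretend it was present and keep going.
            errors.append(f"token {i}: inserted missing '='")

        if i < len(tokens) and tokens[i][0] == "num":
            value = tokens[i][1]
            i += 1
        else:
            errors.append(f"token {i}: expected number, skipped to ';'")
            i = skip_past_semicolon(tokens, i)
            continue

        if i < len(tokens) and tokens[i][0] == ";":
            i += 1
        else:
            errors.append(f"token {i}: inserted missing ';'")

        stmts.append((name, value))
    return stmts, errors


# Token stream for:  a = 1 ;  b 2 ;  = 3 ;  c = 4
example = [
    ("id", "a"), ("=", "="), ("num", "1"), (";", ";"),
    ("id", "b"), ("num", "2"), (";", ";"),
    ("=", "="), ("num", "3"), (";", ";"),
    ("id", "c"), ("=", "="), ("num", "4"),
]
stmts, errors = parse_assignments(example)
```

Note that both recovery decisions rely on knowing which token the grammar expects at the current position, which is the kind of information discussed in the next paragraph.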

All of these can only be performed if the parser gathers the syntax information which can be inferred from the source code according to the grammar rules.

In stage 2 you have only performed some basic analysis, e.g., matched braces to define some hierarchy. This means that when you find a missing brace, for example, you cannot tell anything more than that the braces don't match. Or, if the user inserts a brace in an incorrect location, it is only possible to say either that one brace is missing in one place and another is missing somewhere else, or that a brace is missing in one place and some other brace is redundant. In many cases you won't even notice that a brace is incorrectly placed, and will pass the resulting tree to the 3rd stage. You don't take any hint from the grammar about the exact locations where some token should be present.
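A small Python sketch may make this concrete. It assumes a deliberately naive, grammar-agnostic brace matcher (the function name `match_braces` and both input strings are invented for illustration, and braces inside strings or comments are ignored); it is not the stage 2 design under discussion, only a stand-in for any stage that matches braces without consulting the grammar.

```python
def match_braces(source: str):
    """Pair up '{' and '}' with a stack, knowing nothing about the grammar.

    Returns (pairs, unmatched): matched (open, close) positions and the
    sorted positions of braces left without a partner.
    """
    stack, pairs, unmatched = [], [], []
    for pos, ch in enumerate(source):
        if ch == "{":
            stack.append(pos)
        elif ch == "}":
            if stack:
                pairs.append((stack.pop(), pos))
            else:
                unmatched.append(pos)  # a '}' with nothing left to close
    unmatched.extend(stack)            # every '{' that was never closed
    return pairs, sorted(unmatched)


# The closing brace of f() is missing, but the matcher can only blame the
# outermost '{' (the one opening class A), far from the actual mistake.
missing = "class A { void f() { x(); void g() { } }"
_, unmatched_missing = match_braces(missing)

# The user meant `if (x) { a(); b(); }` but closed the block too early.
# The braces still balance, so this stage reports nothing at all.
misplaced = "void f() { if (x) { a(); } b(); }"
_, unmatched_misplaced = match_braces(misplaced)
```

In the first input the only unmatched position is the brace that opens `class A`, and in the second the list of unmatched braces is empty, so the wrongly shaped tree would be handed on unchanged.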

Now, stage 3 heavily depends on the output of stage 2. As I demonstrated in some examples, it could receive output which implies an incorrect structure, even if no error has been found in the previous stage. You would need to analyze so much information when attempting to find the real roots of a problem that effectively it would involve duplicating (implicitly) the logic of the previous stage, but with combinatorial growth in complexity.

The problems you would need to deal with are much more complex than I described here. So you wouldn't be able to deliver error recovery at all, or (if you could) it would be either trivial or would require needlessly complex logic. Breaking the system at the wrong boundary (when the number of dependencies that cross the boundary is large) always introduces artificial complexity.

What I described above is an illustration of what I call information loss. I may have described something not as clearly as needed, but I didn't have the goal of providing a thorough and verifiable analysis. I speculated and simplified a lot. If you decide to ignore this, it is not a problem, and I don't claim that you will fail to reach any of your goals. Just accept that __for me__ this approach is not a viable option. Everything written above is IMHO, and there may be many ways to resolve the problems with various degrees of success.
