On Sunday, 8 July 2012 at 01:15:31 UTC, David Piepgrass wrote:
I'm not seeing any tremendous error-handling difficulty in my idea. Anyway, I missed the part about information being thrown away...?

(I will use the word "you" to refer to some abstract person or compiler.)

Error handling could mean either error reporting and stopping after the first error, or reporting several errors and continuing parsing + semantic analysis whenever possible, so that the user would get partial functionality like code completion, information about method overloads, or "go to definition" / find all usages, etc.

The first method is the less powerful option, but still requires constructing a meaningful error message which could help the user.

There are many possible error recovery strategies. For example, you might decide to insert some token according to the parsing information available up to the moment when error is discovered, if that would fix the problem and allow to continue parsing. Another option is to ignore a series of further tokens (treat them a whitespace, for example), until parser is able to continue its work. Also there are many heuristics for typical error situations.

All of these can only be performed if parser gathers the syntax information which can be inferred from source code according to the grammar rules.

In stage 2 you have only performed some basic analysis, like, e.g., matched braces to define some hierarchy. This means that at the time when you find a missing brace, for example, you cannot tell anything more than that braces don't match. Or, if the user inserts a brace in an incorrect location, it is only possible to say that it is either missing somewhere and somewhere else another brace is missing, or that it is missing in one place, and some other brace is redundant. In many cases you won't even notice that a brace is incorrectly placed, and pass the resulting tree to the 3rd stage. You don't take any hint from grammar about the exact locations where some token should be present.

Now, stage 3 heavily depends on the output of stage 2. As I demonstrated in some examples, it could get the output which implies incorrect structure, even if that has not been found in the previous stage. You would need to analyze so much information attempting to find the real roots of a problem, that effectively it would involve duplicating (implicitly) the logic of previous stage, but with combinatorial complexity growth.

The problems you would need to deal with are much more complex than I described here. So you wouldn't be able to deliver error recovery at all, or (if you could) it would be either trivial or would require needlessly complex logic. Breaking the system at the wrong boundary (when the number of dependencies that cross the boundary is large) always introduces artificial complexity.

Described above is the illustration of what I call information loss. I may have described something not as clearly as needed, but I didn't have the goal to provide a thorough and verifiable analysis. I speculated and simplified a lot. If you decide to ignore this, it is not a problem and I don't state that you will fail any of your goals. Just admit that __for me__ this approach is not a viable option. Everything written above is IMHO and there may be many ways to resolve the problems with various degrees of success.

Reply via email to