Hi John,

I'm just reviewing your latest set of changes on the arboriculture branch of your parslet repository. What you propose is basically a different approach to which errors matter and how they should be displayed. Let me try to explain your own approach back to you and see if I've got the idea straight. Only then will I try to critique it, and then maybe we can find convergence.

You use the concept of the deepest error: the error that occurred at the most advanced parse position in the source file. The changes you propose would completely remove the stack-trace-like error trees and replace them with a single error message associated with that deepest parse. The message would also mostly be tied to the most concrete parse at that position: the error would not say that a high-level rule failed, but that a rule like match() or str() failed.

If the above doesn't capture your idea, consider what follows irrelevant to the discussion. Feel free to set me right.

This approach seems to work well with the grammar you use. Have you thought about how it generalizes? It seems really easy to construct a pathological grammar where the deepest error carries no meaning for the user of your language. How does the grammar writer know how the deepest error relates to the grammar? What should I fiddle with if I know the input is correct but the grammar is not?

It seems that we have two sets of needs here. As a grammar writer I want to know how my grammar failed to parse X; as a writer of X I might indeed just want to know about one position to twiddle. The error tree anchors the errors back into the structure of the grammar, but it leaves the problem of what to display to the user (the writer of X) completely unsolved. I know I've only gone halfway and solved my own problem there. Finally somebody notices.

Another concern I've been having (one you probably didn't think of here) is the time parslet spends managing all those error objects. Even with efficient GC, constructing all those objects takes a lot of time when we probably don't need half of them. Your approach doesn't address that problem; it just filters what to keep differently. I am thinking: could we do a first parse that just gets results and, once that fails, do a second parse that constructs error information using a kind of aggregator?
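
A rough sketch of that two-pass idea, in plain Ruby rather than parslet itself. Everything named here (lit, seq, alt, DeepestReporter, TreeReporter, parse) is invented for illustration and is not parslet's API; it only assumes that matchers can be handed an optional reporter object. The first pass runs with no reporter and so builds no error objects at all; only after a failure does a second pass run with a reporter that decides what to keep.

```ruby
# Toy combinators, NOT parslet: all names below are hypothetical.
# A matcher is a lambda (input, pos, reporter) -> new position, or nil on failure.

# Matches a literal string; leaf errors originate only here.
def lit(text)
  lambda do |input, pos, reporter|
    return pos + text.size if input[pos, text.size] == text
    reporter&.leaf(pos, "expected #{text.inspect}")  # skipped when reporter is nil
    nil
  end
end

# Matches every part in order, stopping at the first failure.
def seq(name, *parts)
  lambda do |input, pos, reporter|
    reporter&.enter(name, pos)
    parts.each do |part|
      pos = part.call(input, pos, reporter)
      break if pos.nil?
    end
    reporter&.leave(!pos.nil?)
    pos
  end
end

# Tries each choice at the same position; the first success wins.
def alt(name, *choices)
  lambda do |input, pos, reporter|
    reporter&.enter(name, pos)
    result = nil
    choices.each do |choice|
      result = choice.call(input, pos, reporter)
      break if result
    end
    reporter&.leave(!result.nil?)
    result
  end
end

# Keeps a single error: the one at the furthest position (first one wins on ties).
class DeepestReporter
  def initialize; @pos = -1; end
  def enter(_name, _pos); end
  def leave(_ok); end
  def leaf(pos, message)
    @pos, @message = pos, message if pos > @pos
  end
  def report; "#{@message} at #{@pos}"; end
end

# Keeps a tree that mirrors the grammar; rules that succeeded are dropped.
class TreeReporter
  Node = Struct.new(:label, :pos, :children)
  def initialize; @stack = [Node.new("root", 0, [])]; end
  def enter(name, pos); @stack.push(Node.new("#{name} failed", pos, [])); end
  def leave(ok)
    node = @stack.pop
    @stack.last.children << node unless ok
  end
  def leaf(pos, message); @stack.last.children << Node.new(message, pos, []); end
  def report; render(@stack.first.children, 0).join("\n"); end

  private

  def render(nodes, depth)
    nodes.flat_map do |n|
      ["#{'  ' * depth}#{n.label} at #{n.pos}", *render(n.children, depth + 1)]
    end
  end
end

# Pass 1 runs without a reporter; pass 2 only happens after a failure.
# To stay short, trailing input after a successful match is not checked.
def parse(grammar, input, reporter_class)
  pos = grammar.call(input, 0, nil)
  return [:ok, pos] if pos
  reporter = reporter_class.new
  grammar.call(input, 0, reporter)
  [:error, reporter.report]
end

greeting = alt("greeting", lit("hello"), lit("hi"))
grammar  = seq("sentence", greeting, lit(" "), lit("world"))

puts parse(grammar, "hi there", DeepestReporter).last  # one position to twiddle
puts parse(grammar, "hi there", TreeReporter).last     # anchored in the grammar
```

The cost of this design is that failing input gets parsed twice; the bet is that successful parses are the common case and should not pay for error bookkeeping.
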
Aggregation could then implement either of our ideas about what errors should look like... We might be winning on more than one front at once. How does that sound? We'd finally be comparing different kinds of apples when benchmarking against Treetop, at least...

I will now try to hack your grammar to produce better error messages, without changing parslet. Just because I think this might be doable ;) I'll report back.

regards,
kaspar
