Hi John,

Just reviewing your latest set of changes on the arboriculture branch of 
your parslet repository. What you basically propose is a different 
approach to what errors matter and how they should be displayed. Let me 
try to explain to you your own approach and see if I get your idea 
straight. Only then will I try to do a critique of it and then maybe we 
can find convergence.

You use the concept of the deepest error, which is the error that 
happened at the parse position that was most advanced in the source 
file. The changes you propose would completely remove the 
stack-trace-like error trees and replace them with just one error 
message that is associated with that deepest parse. It would also be 
mostly associated with the most concrete parse at that position: the 
error would not say that a high-level rule failed, but that a rule like 
match() or str() failed.
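
To make sure we mean the same selection rule, here is a minimal sketch 
in plain Ruby. It does not use parslet's classes: ParseFailure, its 
fields and deepest_failure are hypothetical names made up for this 
illustration, and I assume each failure records its source position and 
the nesting depth of the rule that failed.

```ruby
# Hypothetical failure record for illustration; parslet's own error
# objects are shaped differently.
ParseFailure = Struct.new(:pos, :depth, :rule)

# Returns the failure at the most advanced source position. Ties are
# broken by rule nesting depth, so the most concrete rule wins.
# Returns nil when no failures were recorded.
def deepest_failure(failures)
  failures.max_by { |f| [f.pos, f.depth] }
end

failures = [
  ParseFailure.new(0, 0, "statement"),
  ParseFailure.new(4, 1, "expression"),
  ParseFailure.new(7, 2, "argument_list"),
  ParseFailure.new(7, 3, "str(')')")
]

puts deepest_failure(failures).rule
```

Everything recorded at positions 0 and 4 is dropped, and of the two 
failures at position 7 only the innermost rule is reported.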

If the above doesn't capture your idea, consider what is below 
irrelevant to the discussion. Feel free to set me right.

This approach seems to work well with the grammar you use. Have you 
thought about how this generalizes? It seems really easy to construct a 
pathological grammar where the deepest error carries no meaning to the 
user of your language.

How does the grammar writer know how the deepest error relates to the 
grammar? What should I fiddle with if I know the input is correct but 
the grammar is not? It seems that we have two sets of needs here. As a 
grammar writer I want to know how my grammar failed to parse X; as a 
writer of X I might indeed just want to know about one position to 
twiddle. The error tree anchors the errors back into the structure of 
the grammar; but it leaves the problem of what to display to the user 
(writer of X) completely unsolved. I know I've only gone halfway and 
solved my own problem there. Finally somebody notices.

Another concern I've been having (that you probably didn't think of 
here) is the time parslet is spending in the management of all those 
error objects. Even with efficient GC, constructing all those objects 
takes a lot of time when we probably don't need half of them. Your 
approach doesn't address the problem, it just filters what to keep 
differently.

I am thinking: could we do a first parse for getting just results, and 
once that fails, do a second parse that constructs error information 
using a kind of aggregator? Aggregation could then implement either of 
our ideas about what errors should look like... We might be winning on 
more than one front at once. How does that sound?
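
A rough sketch of the two-pass idea, again in plain Ruby and not 
parslet's API. The combinators str, seq and alt here are tiny stand-ins 
I made up for the example, and the names DeepestSink, CollectAllSink and 
parse are hypothetical. The assumption is that a parser takes an 
optional sink: in the first pass the sink is nil and no error objects 
are built at all; only when that pass fails is the input parsed again 
with an aggregator attached.

```ruby
# A parser is a lambda (input, pos, sink) -> new position, or nil on
# failure. With sink == nil (fast pass) no error data is constructed.
def str(text)
  lambda do |input, pos, sink|
    return pos + text.size if input[pos, text.size] == text
    sink&.report(pos, "expected #{text.inspect}")
    nil
  end
end

# Runs the parsers one after another; fails as soon as one fails.
def seq(*parsers)
  lambda do |input, pos, sink|
    parsers.each do |parser|
      pos = parser.call(input, pos, sink)
      return nil if pos.nil?
    end
    pos
  end
end

# Tries the parsers in order from the same position; first match wins.
def alt(*parsers)
  lambda do |input, pos, sink|
    parsers.each do |parser|
      result = parser.call(input, pos, sink)
      return result if result
    end
    nil
  end
end

# Aggregator 1: keeps only the failures at the most advanced position.
class DeepestSink
  attr_reader :pos, :messages

  def initialize
    @pos = -1
    @messages = []
  end

  def report(pos, message)
    return if pos < @pos
    @messages = [] if pos > @pos
    @pos = pos
    @messages << message
  end
end

# Aggregator 2: keeps every failure in the order it was reported.
class CollectAllSink
  attr_reader :failures

  def initialize
    @failures = []
  end

  def report(pos, message)
    @failures << [pos, message]
  end
end

# Pass 1 parses without a sink. Only if it fails does pass 2 re-parse
# with the chosen aggregator to build error information.
def parse(grammar, input, sink_class)
  end_pos = grammar.call(input, 0, nil)
  return [:ok, end_pos] if end_pos == input.size

  sink = sink_class.new
  end_pos = grammar.call(input, 0, sink)
  sink.report(end_pos, "expected end of input") if end_pos
  [:error, sink]
end

grammar = seq(str("let "), alt(str("x"), str("y")), str(" = 1"))
status, sink = parse(grammar, "let z = 1", DeepestSink)
puts "#{status} at #{sink.pos}: #{sink.messages.join(', ')}"
```

Swapping DeepestSink for CollectAllSink changes what is kept without 
touching the grammar or the fast path, which is the point: the choice 
of error presentation lives in the aggregator only.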

We'd finally be comparing different kinds of apples when benchmarking 
against Treetop, at least...

I will now try to hack your grammar to produce better error messages, 
without changing parslet. Just because I think this might be doable ;) 
I'll report back.

regards, kaspar
