On Sunday, 8 July 2012 at 21:22:39 UTC, Roman D. Boiko wrote:
On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
It's been too long since I was actively working on parsers to give any details, but it is my understanding that because a hand-written parser is optimized for a specific grammar, it's going to be faster.

My aim is to find out any potential bottlenecks and ensure that those are possible to get rid of. So, let's try.

I believe it would not hurt generality or quality of a parser generator if it contained sews for inserting custom (optimized) code where necessary, including those needed to take advantage of some particular aspects of D grammar. Thus I claim that optimization for D grammar is possible.

I'm convinced that the output of a parser generator (PG) can be very nearly as fast as hand-written code. ANTLR's output (last I checked) was not ideal, but the one I planned to make (a few years ago) would have produced faster code.

By default the PG's output will not be the speed of hand-written code, but the user can optimize it. Assuming an ANTLR-like PG, the user can inspect the original output looking for inefficient lookahead, or cases where the parser looks for rare cases before common cases, and then improve the grammar and insert ... I forget all the ANTLR terminology ... syntactic predicates or whatever, to optimize the parser.

So far discussion goes in favor of LL(*) parser like ANTLR, which is top-down recursive-descent. Either Pegged will be optimized with LL(*) algorithms, or a new parser generator created.

Right, for instance I am interested in writing a top-down PG because I understand them better and prefer the top-down approach due to its flexibility (semantic actions, allowing custom code) and understandability (the user can realistically understand the output; in fact readability would be a specific goal of mine)

Roman, regarding what you were saying to me earlier:
In stage 2 you have only performed some basic analysis, like, e.g., matched braces to define some hierarchy. This means that at the time when you find a missing brace, for example, you cannot tell anything more than that braces don't match.

Stage 2 actually can tell more than just "a brace is missing somewhere". Because so many languages are C-like. So given this situation:

   frob (c &% x)
      blip # gom;
   }

It doesn't need to know what language this is to tell where the brace belongs. Even in a more nebulous case like:

   frob (c &% x) bar @ lic
      blip # gom;
   }

probably the brace belongs at the end of the first line.

Perhaps your point is that there are situations where a parser that knows the "entire" grammar could make a better guess about where the missing brace/paren belongs. That's certainly true.

However, just because it can guess better, doesn't mean it can reinterpret the code based on that guess. I mean, I don't see any way to "back up" a parser by an arbitrary amount. A hypothetical stage 2 would probably be hand-written and could realistically back up and insert a brace/paren anywhere that the heuristics dictate, because it is producing a simple data structure and it doesn't need to do any semantic actions as it parses. A "full" parser, on the other hand, has done a lot of work that it can't undo, so the best it can do is report to the user "line 54: error: brace mismatch; did you forget a brace on line 13?" The heuristic is still helpful, but it has already parsed lines 13 to 54 in the wrong context (and, in some cases, has already split out a series of error messages that are unrelated to the user's actual mistake).

As I demonstrated in some examples, it could get the output which implies incorrect structure

I was unable to find the examples you refer to... this thread's getting a little unweildy :)

Reply via email to