Re: Let's stop parser Hell

David Piepgrass Sun, 08 Jul 2012 16:40:37 -0700

On Sunday, 8 July 2012 at 21:22:39 UTC, Roman D. Boiko wrote:

On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
It's been too long since I was actively working on parsers to give any details, but it is my understanding that because a hand-written parser is optimized for a specific grammar, it's going to be faster.
My aim is to find out any potential bottlenecks and ensure that those are possible to get rid of. So, let's try.
I believe it would not hurt generality or quality of a parser generator if it contained seams for inserting custom (optimized) code where necessary, including those needed to take advantage of some particular aspects of the D grammar. Thus I claim that optimization for the D grammar is possible.

I'm convinced that the output of a parser generator (PG) can be very nearly as fast as hand-written code. ANTLR's output (last I checked) was not ideal, but the one I planned to make (a few years ago) would have produced faster code.

By default the PG's output will not match the speed of hand-written code, but the user can optimize it. Assuming an ANTLR-like PG, the user can inspect the original output looking for inefficient lookahead, or cases where the parser looks for rare cases before common cases, and then improve the grammar and insert ... I forget all the ANTLR terminology ... syntactic predicates or whatever, to optimize the parser.
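To make the "rare cases before common cases" point concrete, here is a toy sketch in Python. It is not ANTLR output; the rule names, the `ALTERNATIVES` table and the helper functions are all hypothetical. It just counts how many lookahead tests a rule performs when its alternatives are tried in grammar order versus in order of observed frequency.

```python
from collections import Counter

KEYWORDS = {"asm", "goto", "if"}

# Hypothetical alternatives of a `statement` rule, in grammar order.
# Each entry is (name, predicate on the single lookahead token).
# The predicates are mutually exclusive, so reordering is safe.
ALTERNATIVES = [
    ("asm_block", lambda tok: tok == "asm"),
    ("goto_stmt", lambda tok: tok == "goto"),
    ("if_stmt", lambda tok: tok == "if"),
    ("expr_stmt", lambda tok: tok.isidentifier() and tok not in KEYWORDS),
]


def classify(tok, alternatives):
    """Return (alternative name, number of predicates tested)."""
    for tests, (name, matches) in enumerate(alternatives, start=1):
        if matches(tok):
            return name, tests
    raise ValueError(f"no alternative matches {tok!r}")


def total_tests(tokens, alternatives):
    """Total predicate tests needed to classify every token."""
    return sum(classify(tok, alternatives)[1] for tok in tokens)


def reorder_by_frequency(tokens, alternatives):
    """Put the alternatives seen most often in a sample first."""
    seen = Counter(classify(tok, alternatives)[0] for tok in tokens)
    return sorted(alternatives, key=lambda alt: -seen[alt[0]])


sample = ["x", "y", "if", "z", "goto", "w"]
tuned = reorder_by_frequency(sample, ALTERNATIVES)
print(total_tests(sample, ALTERNATIVES), total_tests(sample, tuned))
```

A real PG would more likely emit a switch on the lookahead token, but the same idea applies to alternatives that need longer lookahead or predicates: the order in which they are tested matters.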

So far discussion goes in favor of an LL(*) parser like ANTLR, which is top-down recursive-descent. Either Pegged will be optimized with LL(*) algorithms, or a new parser generator created.

Right, for instance I am interested in writing a top-down PG because I understand them better and prefer the top-down approach due to its flexibility (semantic actions, allowing custom code) and understandability (the user can realistically understand the output; in fact readability would be a specific goal of mine).


Roman, regarding what you were saying to me earlier:

In stage 2 you have only performed some basic analysis, like, e.g., matched braces to define some hierarchy. This means that at the time when you find a missing brace, for example, you cannot tell anything more than that braces don't match.

Stage 2 actually can tell more than just "a brace is missing somewhere", because so many languages are C-like. So given this situation:


   frob (c &% x)
      blip # gom;
   }

It doesn't need to know what language this is to tell where the brace belongs. Even in a more nebulous case like:


   frob (c &% x) bar @ lic
      blip # gom;
   }

probably the brace belongs at the end of the first line.
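
A minimal sketch of that kind of language-agnostic heuristic, in Python. The function names and the rule "the nearest earlier line with the same indentation as the stray closing brace, not ending in `;`, `{` or `}`" are my own illustrative assumptions, not taken from any existing tool. It assumes space indentation and ignores braces inside strings and comments.

```python
def guess_missing_open_braces(source):
    """Return (closing_line, guessed_opening_line) pairs, 1-based.

    guessed_opening_line is None when no plausible line is found.
    Assumes space indentation; strings and comments are not handled.
    """
    lines = source.splitlines()
    unmatched_open = 0
    guesses = []
    for idx, line in enumerate(lines):
        for ch in line:
            if ch == "{":
                unmatched_open += 1
            elif ch == "}":
                if unmatched_open:
                    unmatched_open -= 1
                else:
                    # Stray '}' with no opener: guess where the '{' belongs.
                    guesses.append((idx + 1, _guess_header(lines, idx)))
    return guesses


def _indent(line):
    """Number of leading spaces."""
    return len(line) - len(line.lstrip(" "))


def _guess_header(lines, close_idx):
    """Scan upward for a line that looks like a block header."""
    close_indent = _indent(lines[close_idx])
    for idx in range(close_idx - 1, -1, -1):
        text = lines[idx].strip()
        if not text:
            continue
        if _indent(lines[idx]) == close_indent and text[-1] not in ";{}":
            return idx + 1
    return None


first = "   frob (c &% x)\n      blip # gom;\n   }\n"
second = "   frob (c &% x) bar @ lic\n      blip # gom;\n   }\n"
print(guess_missing_open_braces(first))
print(guess_missing_open_braces(second))
```

For both snippets above this points at the end of line 1 as the place for the missing brace, without knowing anything about what `frob` or `blip` mean.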

Perhaps your point is that there are situations where a parser that knows the "entire" grammar could make a better guess about where the missing brace/paren belongs. That's certainly true.

However, just because it can guess better, doesn't mean it can reinterpret the code based on that guess. I mean, I don't see any way to "back up" a parser by an arbitrary amount. A hypothetical stage 2 would probably be hand-written and could realistically back up and insert a brace/paren anywhere that the heuristics dictate, because it is producing a simple data structure and it doesn't need to do any semantic actions as it parses. A "full" parser, on the other hand, has done a lot of work that it can't undo, so the best it can do is report to the user "line 54: error: brace mismatch; did you forget a brace on line 13?" The heuristic is still helpful, but it has already parsed lines 13 to 54 in the wrong context (and, in some cases, has already spit out a series of error messages that are unrelated to the user's actual mistake).

As I demonstrated in some examples, it could get the output which implies incorrect structure

I was unable to find the examples you refer to... this thread's getting a little unwieldy :)
