[PEG] Parser Performance Benchmarks

David Mercer Thu, 19 Oct 2006 15:50:16 -0700

Was reading Terence's web-log entry of today
(http://www.antlr.org/blog/antlr3/lookahead.tml), which contained the
following paragraph:


> The extra power of GLR and PEGs comes at the cost of time and space.
> Grimm's experiments show SDF2 (traditional GLR) and Elkhound are roughly
> 7 times slower than a more traditional LL(k) parser generator such as
> JavaCC at parsing Java source code. Rats! is currently about 2 times
> slower than JavaCC. The extra machinery beyond the LR and LL foundations
> slows down parsing for even deterministic languages like Java.

Has anyone come up with a series of benchmark tests for evaluating parsers.
For instance, a set of grammars and language examples which are used in
determining which parsing methods are faster with different kinds of inputs.
Am thinking we might have examples of both programming languages (C, Java)
and data languages (XML, HTML, X12).  Then we would be able to have analyses
like, "for simple, well-behaved languages like A and B, parser X performs
better; but when the language becomes more complex, such as languages C and
D, parser Y performs better."  Then we could compare things like not only
speed, but memory usage, and possibly even compile them all on a single
platform and have a competition.

If not, what examples do you use to test your parser?  How do you measure
speed and memory usage (if you do measure them)?  How do you check
correctness?

Would help in the decision-making in deciding which parsing model to use,
for instance.  (E.g., should I use ANTLR or Rats! as a base for my
application?)  For that matter, a nice theoretical performance comparison
would be helpful, too.  For instance, I seem to recall Earley's algorithm
has time complexity of about O(n^3) where n is the size of the input.  Not
sure what it's space complexity was.  I think LL(k) is good ol' O(n).  Not
sure what others are.  Am thinking a table which shows preprocessing
(constructing parse tables, etc.) and parsing space and time complexities as
functions of grammar and input sizes.  Here's my beginning:

                     Preprocessing       Parsing
Parse Method         Space    Time    Space    Time
------------------  -------  ------  -------  ------  
Earley                 ?        ?       ?     O(n^3)
LL(1)                  ?        ?       ?     O(n)
PEG (backtracking)     ?        ?       ?       ?
Rats!                  ?        ?       ?     O(n)
ANTLR...
. . .

Cheers,

David Mercer


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

[PEG] Parser Performance Benchmarks

Reply via email to