Re: regular expressions and parrot

Henrik Gulbrandsen Mon, 18 Apr 2005 18:53:47 -0700

> On Sun, Apr 17, 2005 at 04:33:57PM +0200, BÁRTHÁZI András wrote:
> > Just a short question I'm interested in: where will be, and how will 
> > work (I just asking for a general description about it) the regular 
> > expression / rules part of Parrot?

On Sun, 2005-04-17 at 09:38 -0500, Patrick R. Michaud wrote:
> The regular expression / rules part of Parrot is called "PGE",
> for "Perl/Parrot Grammar Engine", and it's currently in the compilers/pge
> directory.  The intent is that rules will be another compiler within
> Parrot (i.e., it can standalone somewhat outside of Perl).

At the risk of advertising vaporware, I'd like to grab this opportunity
to provide an update on my own work. Those of you with a good memory may
recall that I announced a preliminary "Parrot Syntax Engine" on this
list back in the beginning of January. In essence, this gave the API and
a basic implementation of a bottom-up GLR parser for dynamic grammars.
As such, it has some bearing on the "rules" part of Parrot, although the
interface is more generally intended for domain-specific languages where
the language syntax is allowed to change at runtime.

To make a short story short, Leo went directly for my throat by asking
me about performance, which I had to admit was less than satisfactory.
Since then, I've spent quite some time modifying the DParser layer to
allow incremental grammar updates. The time needed to add a 340th rule
to the grammar is now about 1/1000 of the original result, or < 1 ms.

Leo also asked the following, which I never really answered:

On Fri, 2005-01-07 at 11:10, Leopold Toetsch wrote:
> - How fast is DParser compared to bison/flex?
> - What about memory usage compared to bison/flex?

I have not tested the memory consumption for a fixed grammar, but the
total memory used in building the 339-rule Python grammar was roughly
3.5 MB according to valgrind/massif. That's about 4 KB per LR state,
which should not cause a big problem in the foreseeable future. Note
that a lot of this memory can be released if the grammar is frozen!

In order to test the parsing speed, I extracted the C grammar of gcc and
constructed a yacc/lex and a PSE version of the parser. The idea was to
run both parsers on files from the Linux kernel to get a real-world test
of correctness and speed. As expected, yacc turned out to be faster :-)

Unfortunately, the difference was almost two orders of magnitude, with
DParser taking more than half a second to parse a 2000-line file. I am
not completely happy with this result, so that is my current focus...
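
For a rough sense of scale, the figures quoted in this message can be
combined in a short Python sketch. It is only back-of-envelope arithmetic:
the constant and function names are hypothetical, "roughly 3.5 MB" is
assumed to mean 3.5 * 1024 * 1024 bytes, "more than half a second" is
taken as exactly 0.5 s, and "almost two orders of magnitude" is rounded
up to a factor of 100. None of the derived values were measured.

```python
# Back-of-envelope estimates from the numbers quoted in the message.
# Every input is a rounded figure, so the outputs are rough at best.

TOTAL_MEMORY_BYTES = 3.5 * 1024 * 1024  # "roughly 3.5 MB" (assumed binary MB)
BYTES_PER_LR_STATE = 4 * 1024           # "about 4 KB per LR state"

FILE_LINES = 2000                       # size of the parsed test file
DPARSER_SECONDS = 0.5                   # "more than half a second" (lower bound)
SLOWDOWN_FACTOR = 100                   # "almost two orders of magnitude" (rounded up)


def implied_lr_states(total_bytes, bytes_per_state):
    """Number of LR states implied by the total and per-state memory."""
    return round(total_bytes / bytes_per_state)


def lines_per_second(lines, seconds):
    """Parsing throughput in source lines per second."""
    return lines / seconds


def implied_yacc_seconds(dparser_seconds, factor):
    """Time the yacc parser would need if it is `factor` times faster."""
    return dparser_seconds / factor


if __name__ == "__main__":
    states = implied_lr_states(TOTAL_MEMORY_BYTES, BYTES_PER_LR_STATE)
    rate = lines_per_second(FILE_LINES, DPARSER_SECONDS)
    yacc = implied_yacc_seconds(DPARSER_SECONDS, SLOWDOWN_FACTOR)
    print(f"implied LR states:      {states}")               # 896
    print(f"DParser lines/second:   {rate:.0f} (at most)")   # 4000
    print(f"implied yacc time:      {yacc * 1000:.1f} ms")   # 5.0 ms
```

Under these assumptions the 339-rule grammar corresponds to roughly 900
LR states, DParser handles at most about 4000 lines per second, and the
yacc parser would need on the order of 5 ms for the same file.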

From time to time, life interferes with progress, and I am not at this
point ready to give an expected release date for version 0.2, mostly
because the mysterious factor 3.14 has been creeping into all estimates
I've attempted so far. I just wanted to give the information I have, in
case someone wondered what happened to this little project :-|

/Henrik
