Re: perlcc

h. w. neff Thu, 05 Jun 2003 11:15:59 -0700


"Ron D. Smith" wrote:
> 
> On Wednesday, Jun 4, 2003 "h. w. neff" said:
> > hi.
> > i'm using P::RD and the fully interpreted version of my
> >    resulting tool is incredibly slow.
> > the half-step of precompiling my grammar into a .pm helps
> >    somewhat in the speed department, albiet with different
> >    behaviours.
> > i was really hoping to feed my P::RD based beastie to perlcc
> >    and get something much faster and self contained.
> 
> Well this is *a* solution, but there are others.  With Perl there is always
> More Than One Way To Do It(tm).


of this i am aware and appreciative.

> I wrote a parser for a hardware description language using a hierarchical
> approach and got it to work.  I observed that it was incredibly slow, like
> you are complaining about.

mine's an "assembler" with c-like syntax for a machine with multiple
   execution modules and addressing units.
some of the performance hit doubtless comes from a requirement to
   have clauses with multiple phrases with from zero to five terms
   in any order....well, any order was my idea.
(think VLIW (very long instruction word) where control of individual
   registers occurs on an instruction by instruction basis.)

> I turned on $::RD_TRACE and took a good hard look at the result.  I
> discovered that the parser was spending most of its time pursuing false
> paths, so it was investing a lot of effort but not doing a lot of work.

now that i've got it working, i shall do this sort of examination.

> There are a number of performance "tricks" that might help.  These are
> general, your mileage may vary.
> 
> 1) flatten descriptions.
> 
> Hierarchical definition is easy to define and debug, but its slow.  Instead
> of:
> 
> DEFINITION:
> foo {} |
> bar {} |
> splat {}
> 
> foo:
> /regx/ 'constant' variable(s)
> 
> etc.
> 
> Try this instead:
> 
> DEFINITION:
> /regx/ 'constant' variable(s) {} |
> in-line "bar" definition {} |
> in-line "splat" definition

i need to do some hierarchical compaction.
for the initial go i've elaborated so that every place a foo can
   happen always goes to same one foo production, as you've shown.

> 2) use "look-ahead"
> 
> Using the lookahead feature can significantly shorten things.  For example:
> 
> DEFINITION:
> tkid 'foo' foo_type {} |
> tkid 'bar' bar_type {} |
> tkid 'splat' splat_type {} |
> tkid {}
> 
> Here a tkid can be followed by a list of keywords, or by nothing.  The parser
> spends an enormous time trying them all.
> 
> This can be:
> 
> DEFINITION:
> tkid ...keywords process_keywords {} |
> tkid
> 
> keywords:
> /(foo|bar|splat)\b/
> 
> process_keywords:
> 'foo' foo_type |
> etc.

i have used some lookahead, as well as <commit> where i can.
i've not yet looked into trying to propogate committedness back
   up the hierarchy (usually when i can commit at some level,
   whole subtrees at the upper level can/should be trimmed so
   they're not tried in case of subsequent failure.

> 3) Avoid singleton definitions unless there can be a clear performance
> advantage.  Why add hierarchy if it is not necessary, it just adds additional
> parser calls.

(guess i wonder what such a case might be where there is a performance
    advantage -- documentation/maintenance advantage, yes, but perf?)

> 4) throw away things.
> 
> Suppose that a tkid can be a /\w+/ string, but cannot be 'foo' or 'bar',
> which are keywords.  You can have two tests, which will rarely trigger or you
> can create a greedy check which then fails.
> 
> For example:
> 
> tkid:
> 
> /\w+/
> {
>     if (grep(($id eq $_),@all_keywords)) {
>         return=undef;
>     } else {
>         return=$item[1];
>     }
> }
> 
> So most of the time it will succeed, but in the rare case it fails, you are
> safe.  This does not affect the definition of tkid, but affects the parent.
> To handle the keywords case you would have had to specify them first like
> this:
> 
> DEFINITION:
> 'foo' |
> 'bar' |
> tkid
> 
> So you make two keyword tests for (typically) no purpose.  But "you have to"
> otherwise it will not parse correctly.
> 
> Using the above definition of tkid, you can modify DEFINITION as follows:
> 
> DEFINITION:
> tkid |
> 'foo' |
> 'bar'
> 
> This works because if tkid is either foo|bar the test fails and the parser
> proceeds.

good thought, thanks.
some judicious reordering can probably be done.

> 5) use regular expressions
> 
> By example, instead of this:
> 
> DEFINITION:
> '(' /\d+/ ')' {}
> 
> Do this instead:
> 
> DEFINITION:
> /\(\s*(\d+)\s*\)/ {my $arg=$1; ...}

ok, i know of a place or two where this applies.


> You can use these to create really interesting examples.  As an extreme
> example of in-lining and regular expression usage I offer this:
> 
> declput:
> # this really hairy thing is a performance optimization.
> /(dcoutput|hinput|houtput|input|internal|ioput|linput|loutput|modifies|output|
> produces|uses)\s+((?:(?:boolean|integer|node|real|string|bus|np|ntri|own|pp|rn
> tri|rom|rtri|state|tri|wand|wor)\s+)+)(\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:
> _(?!_)))*)\s*(\[\d+(?::\d+)?\])?\s*([EMAIL PROTECTED](?:(?:[a-zA-Z\d\.\$\#]+)|(
> ?:_(?!_)))*\s*\(\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*(=\s*(\d
> +|(\"([^\"]|\"\")*\")))?\s*(,\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_))
> )*\s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*)*\)(\s*,\s*\$?[A-Za-z](?:(?:[a-zA-Z\d
> \.\$\#]+)|(?:_(?!_)))*\s*\(\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*
> \s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*(,\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)
> |(?:_(?!_)))*\s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*)*\))*)?\s*/
> {my ($p,$tq,$id,$v,$a)=($1,$2,$3,$4,$5);
>  ....

reminds me a bit of the email address parsing example at the
   back of 'mastering regular expressions'. :-)

> Yes, its hairy, but I built it up one piece at a time after I was sure the
> parsing was correct.  At first blush it appears unmaintainable, until you see
> all of the other definitions that went into this using just cut and paste
> (and which are just commented out).  To maintain this, I disable this
> optimization, enable the old definitions, get it right, and then use cut and
> paste again.  It ends up not being as difficult as it appears at first blush.

i can't be quite so extreme: it's my desire to eventually hand off
   the maintenance of the tool to another company, and i don't expect
   many world class regex hackers there.
(then again, the phrase "support fees" springs to mind. :-)
 
> The bottom line is that you need to turn on tracing and see what the parser
> is ACTUALLY DOING.  Figure out where it is wasting its time and alter the
> grammar to "not do that". This will help tremendously.
> 
> Using my tricks improved the parser speed in my case by a factor of at least
> 20 and in some cases by as much as 100.

well, i shall certainly trace it out.
that kind of enhancement would probably be enough. :-)

> > when i tried that, however, i ran into the "did not compile,
> >    which can't happen!" :-/ problem.
> > i asked at perlmonks, with no particular useful resolution
> >    wrt P::RD and perlcc.
> > having been directed here, i see the attached year old query
> >    (below) and no follow up at all.
> >
> > is this a case of "abandon hope all ye who enter" ?
> > hwn

still the sound of one hand clapping.
i guess i'll assume it's just that the damian has buried
   something magikk in P:RD so that perlcc loses its
   cookies thusly and practitioners clam (i know, wrong bivalve :-)
   up. 

> > >      From: Roman Vasicek <[EMAIL PROTECTED]>
> > >      Date: Mon, 22 Apr 2002 12:33:46 +0200 (CEST)
> > >
> > > Hi,
> > >  I want to use perlcc to create standalone executable from my program but
> > > it allways fails with message like this
> > >
> > > /usr/bin/perlcc: test-prd.pl did not compile, which can't happen:
> > > Starting compile
> > >  Walking tree
> > >  Prescan
> > >  Saving methods
> > >  Use of uninitialized value in length at
> > > /usr/lib/perl5/5.6.1/i586-linux/B/C.pm
> > > line 544.
> > >  Use of uninitialized value in length at
> > > /usr/lib/perl5/5.6.1/i586-linux/B/C.pm
> > > line 438.
> > >  Can't locate object method "save" via package "UWVS" (perhaps you forgot
> > > to load "UWVS"?) at /usr/lib/perl5/5.6.1/i586-linux/B/C.pm line 582.
> > >  CHECK failed--call queue aborted.
> > >
> > > Is anyone here who was successful to compile with perlcc program which
> > > use Parse::RecDescent module? Any hints?
> > >
> > > The shortest code producing error message above is
> > >
> > > #!/usr/bin/perl -w
> > >
> > > use strict;
> > > use Parse::RecDescent;
> > >
> > > exit 0;
> > >
> > >
> > > --
> > >  best regards
> > >   Ing. Roman Vasicek
> > >
> > >  software developer
> > > +----------------------------------------------------------------------------+
> > >  PetaMem s.r.o., Drahobejlova 27/1019, 190 00 Praha 9 - Liben, Czech republic
> > >  http://www.petamem.com/
> >
> 
> --
>  Intel, Corp.
>  5000 W. Chandler Blvd.
>  Chandler, AZ 85226
> 
> --
>  Intel, Corp.
>  5000 W. Chandler Blvd.
>  Chandler, AZ  85226

thanks!
hwn

Re: perlcc

Reply via email to