It seems to me that a Perl 5 program exists as several things:
 - pure source code (ASCII or Unicode)
 - a stream of tokens from the parser
 - a munged stream of tokens from the parser (e.g., use Foo has
   become BEGIN { require Foo; Foo->import })
 - an unthreaded and unoptimized optree
 - a threaded optimized optree
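
To make the "munged" form concrete, here is a small sketch of the
use-to-BEGIN rewrite written out by hand. It assumes List::Util (a core
module) as a stand-in for Foo; the choice of module and of the imported
sum function is mine, purely for illustration.

```perl
use strict;
use warnings;

# Hand-written form of `use List::Util qw(sum);`. List::Util stands in
# for the generic Foo; it is an assumption made for this example.
BEGIN {
    require List::Util;
    List::Util->import(qw(sum));
}

# The import ran at compile time, so the parser already knows `sum`
# here and accepts it as a list operator without parentheses.
my $total = sum 1, 2, 3;
print "total = $total\n";
```

A tool that only ever sees the rewritten form cannot tell whether the
author wrote the use line or the BEGIN block.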

Different utilities need access to different representations of a
program:
 - source filters munge the pure source code
 - cpp-like macros would work with token streams
 - pretty printers need unmunged tokens in an unoptimized tree, which
   may well be infeasible
 - bytecode is a saved optimized optree (plus stab dumps, interpreter
   context, etc.)
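
As a rough illustration of why a pretty printer wants the unmunged
form, the sketch below uses the core B::Deparse module to regenerate
source from a compiled sub. The sub body (2 + 3) is an invented example,
and the exact text printed may differ between perl versions, so treat
the output as indicative only.

```perl
use strict;
use warnings;
use B::Deparse;

# A sub whose body holds an expression the compiler can fold.
# The expression itself is an arbitrary example.
my $code = sub { return 2 + 3 };

# B::Deparse walks the compiled optree and regenerates Perl source.
my $text = B::Deparse->new->coderef2text($code);

# The regenerated text reflects the optree after compilation, so the
# original spelling of the expression is not expected to survive.
print $text, "\n";
```

On the perls I would expect this to run on, the regenerated body shows
the folded constant rather than the addition as it was written.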

Would it make sense for the parsing of a Perl program to be done as:
 - tokenize without rewriting (e.g., use stays as it is)
 - structure without rewriting (e.g., constant subs are left unfolded)
 - rewrite for optimizations and actual ops

Then Perl could provide hooks into each stage:
 - source filters take and emit text
 - cpp-like filters take and emit tokens
 - a pretty-printer takes a compiled optree from a file
 - a bytecode dumper gets the optimized actual-op tree
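
A toy sketch of what per-stage hooks might look like. Nothing here is
an existing Perl API: add_hook, run_hooks, tokenize and compile are all
hypothetical names, the "tokenizer" just splits on whitespace, and only
the text and token stages are modelled.

```perl
use strict;
use warnings;

# Hypothetical hook registry: one list of callbacks per stage.
my %hooks = (source => [], tokens => []);

sub add_hook {
    my ($stage, $cb) = @_;
    push @{ $hooks{$stage} }, $cb;
}

# Pass the data through every callback registered for a stage, in order.
sub run_hooks {
    my ($stage, $data) = @_;
    $data = $_->($data) for @{ $hooks{$stage} };
    return $data;
}

# Toy tokenizer: whitespace split only, no rewriting of any kind.
sub tokenize {
    my ($text) = @_;
    return [ split ' ', $text ];
}

sub compile {
    my ($text) = @_;
    $text = run_hooks(source => $text);                 # text in, text out
    my $tokens = run_hooks(tokens => tokenize($text));  # tokens in, tokens out
    return $tokens;
}

# A source filter: rewrites raw text before tokenizing.
add_hook(source => sub { my ($t) = @_; $t =~ s/\bPRINT\b/print/g; return $t });

# A cpp-like macro: expands one token into several.
add_hook(tokens => sub {
    my ($toks) = @_;
    return [ map { $_ eq 'TWO' ? ('1', '+', '1') : $_ } @$toks ];
});

my $out = compile('PRINT TWO');
print join(' ', @$out), "\n";   # print 1 + 1
```

The point of keeping the stages separate is that each hook sees only
the representation it was written for, and later stages could be added
to the registry in the same way.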

Nat

