It seems to me that a Perl 5 program exists as several things:
- pure source code (ASCII or Unicode)
- a stream of tokens from the parser
- a munged stream of tokens from the parser (e.g., "use Foo;" has
become "BEGIN { require Foo; Foo->import }")
- an unthreaded and unoptimized optree
- a threaded optimized optree
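
To make the munging concrete, here is a small sketch of the use-to-BEGIN rewrite written out by hand. perlfunc documents "use Module LIST" as equivalent to the BEGIN block below; List::Util and sum are only a convenient core module and function picked for illustration, not anything specific to this proposal.

```perl
use strict;
use warnings;

# Hand-written form of what "use List::Util qw(sum);" turns into.
# List::Util is an arbitrary core module chosen for the example.
BEGIN { require List::Util; List::Util->import(qw(sum)); }

# Because the import ran at compile time, sum is already known here.
my $total = sum(1, 2, 3);
print "$total\n";
```

A tool that only ever sees the second form cannot tell whether the author originally wrote "use" or the BEGIN block.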
Different utilities need access to different representations of a
program:
- source filters munge the pure source code
- cpp-like macros would work with token streams
- pretty printers need unmunged tokens in an unoptimized tree, which
may well be unfeasible
- bytecode is a saved optimized optree (plus stab dumps, interpreter
context, etc.)
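
The pretty-printer problem can be seen with a rough sketch like the following, assuming a perl that ships B::Deparse. It regenerates source text from a compiled sub; since constant folding has already happened by then, I would expect the regenerated text to show the folded value rather than the expression as written. The exact layout of the output varies between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# The source says 2 + 3, but the compiler folds the constant expression
# before anything gets a chance to inspect the sub.
my $code = sub { return 2 + 3 };

# coderef2text regenerates Perl source from the compiled optree.
my $text = B::Deparse->new->coderef2text($code);
print $text, "\n";
```

The original "2 + 3" is not recoverable from the optree, which is why a faithful pretty printer wants a representation from before the rewriting.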
Would it make sense for the parsing of a Perl program to be done as:
- tokenize without rewriting (e.g., use stays as it is)
- structure without rewriting (e.g., constant subs are left unfolded)
- rewrite for optimizations and actual ops
Then Perl could provide hooks into each stage:
- source filters take and emit text
- cpp-like filter takes and emits tokens
- pretty-printer takes compiled optree from a file
- bytecode dumper gets optimized actual-op optree
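
A toy sketch of what per-stage hooks might look like, only to illustrate the shape of the idea. Everything here is hypothetical: add_hook, run_hooks, tokenize and compile_toy are names I made up, the "tokenizer" just splits on whitespace, and none of it corresponds to an existing perl API.

```perl
use strict;
use warnings;

# Hypothetical registry: one list of callbacks per stage.
my %hooks = (text => [], tokens => []);

sub add_hook {
    my ($stage, $cb) = @_;
    push @{ $hooks{$stage} }, $cb;
}

# Run each hook for a stage in order; every hook takes and emits
# the same representation (text for text, token list for tokens).
sub run_hooks {
    my ($stage, $data) = @_;
    $data = $_->($data) for @{ $hooks{$stage} };
    return $data;
}

# Toy tokenizer: whitespace-separated words, nothing like real Perl lexing.
sub tokenize {
    my ($text) = @_;
    return [ split ' ', $text ];
}

sub compile_toy {
    my ($text) = @_;
    $text = run_hooks(text => $text);                    # source filters
    my $tokens = run_hooks(tokens => tokenize($text));   # cpp-like filters
    return $tokens;   # a real compiler would go on to build the optree
}

# A source filter that rewrites text, and a macro that expands one token.
add_hook(text   => sub { my ($t) = @_; $t =~ s/\bcolour\b/color/g; return $t });
add_hook(tokens => sub {
    my ($toks) = @_;
    return [ map { $_ eq 'TWICE' ? ('x', 'x') : $_ } @$toks ];
});

my $out = compile_toy("colour TWICE end");
print join(' ', @$out), "\n";
```

The point of the design is that each hook only ever sees the representation it cares about, so a token-level macro never has to re-parse text and a source filter never has to know about tokens.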
Nat