Re: Syntax explainer, phase 2: planning

Larry Wall Wed, 30 Jan 2008 09:40:13 -0800

On Wed, Jan 30, 2008 at 04:08:04PM +0100, Moritz Lenz wrote:
: About half a year ago I posted my idea of a program that explains Perl 6
: syntax:
: 
: http://www.nntp.perl.org/group/perl.perl6.users/2007/07/msg621.html
: 
: Differing from my first post I now think that the best idea is to
: really parse a Perl 6 program with a fully fledged parser, and emit some
: kind of markup language that contains annotations that explain the
: semantics of each token.
: 
: Now you all know the story: "nothing but perl can parse Perl", and of
: course I'm lazy, so I'd like to reuse an existing parser.
: 
: The most appealing idea so far is to use rakudo's grammar for
: experimenting, and later on STD.pm for the "real thing".
: 
: The simplest option is to use a grammar, and write a different action
: class for it (the one whose methods are executed when {*} action stubs
: are found in the grammar), and instead of returning a syntax tree, I
: just return a data structure that contains the position, a description
: of the token, and the actual text.
: 
: That works fine - until the grammar is changed. So I need to execute
: BEGIN blocks, which implies that I need the "normal" parse tree as well.
: D'oh.


Let me correct an oversimplification here.  Most grammar changes
will *not* be done by BEGIN blocks.  BEGIN blocks (like eval) are a
tool of last resort; they're only there for when it's impossible to
achieve what you want by ordinary means.  Perl 6 is very much about
providing more ordinary means for things that used to have to be done
by BEGIN or eval.

Instead, grammar changes will be done by using a module that derives
a grammar from STD.  The derived grammar will be defined the same way
the original grammar is, so there is no change of the basic underlying
rules here.  If you find a sane way of dealing with STD you should be
able to deal with its derivatives just as easily.  Unlike BEGIN blocks,
grammar warping modules come with names and versions and authorities,
so when you warp your language by calling "use", you are doing so in
a controlled fashion, and your new language can still be deterministic,
and produce a well-behaved AST.
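
A minimal sketch of that point, in Python rather than Perl 6 and under
heavy simplification: a grammar here is just an ordered list of regex
rules, which is not how STD is actually built. The names StdGrammar,
WarpedGrammar and explain are hypothetical and exist only for this
illustration. The thing to notice is that the derived grammar is defined
the same way as the base one, so one explainer handles both.

```python
import re

class StdGrammar:
    """Toy stand-in for a standard grammar: ordered (name, regex, description) rules."""
    rules = [
        ("number", r"\d+", "integer literal"),
        ("ident", r"[A-Za-z_]\w*", "identifier"),
        ("op", r"[+\-*/=]", "operator"),
        ("ws", r"\s+", "whitespace"),
    ]

class WarpedGrammar(StdGrammar):
    """Hypothetical derived grammar: one extra rule, defined like the base rules."""
    name = "Warped"      # a warping module carries a name ...
    version = "0.1"      # ... and a version, unlike an anonymous BEGIN block
    rules = [("arrow", r"->", "arrow token")] + StdGrammar.rules

def explain(grammar, source):
    """Tokenize source with any grammar of this shape.

    Returns a list of (position, rule name, description, text) tuples,
    skipping whitespace.
    """
    out = []
    pos = 0
    while pos < len(source):
        for name, pattern, desc in grammar.rules:
            m = re.compile(pattern).match(source, pos)
            if m:
                if name != "ws":
                    out.append((pos, name, desc, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"no rule matches at offset {pos}")
    return out
```

Calling explain(WarpedGrammar, "x -> y") works with no change to
explain itself, while explain(StdGrammar, "x -> y") fails on the ">"
because the base language has no such token.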

: Do you have any idea how I may circumvent the problem?
: 
: I had some thoughts, but none appear to be a good solution:
:  * build two trees, one normal AST for the BEGIN block evaluation, and
: one parse tree for the markup output.
:  * subclass the normal action class, and annotate the AST with enough
: information, and as a second step, after all BEGIN blocks were executed,
: filter out the interesting information.
:  * parse the BEGIN blocks with the normal grammar and action class, and
: the rest with the modified action class that emits the markup.
: 
: Actually I have no idea if any of these could work. Any thoughts?

From my MAD experiences, I'd say the only guaranteed correct way is to
annotate the existing AST, and to make sure that the standard grammar
mechanism has all the hooks you need to do that.  The big evil in the
Perl 5 parser is that it was continually forgetting things.  It does
this by lying to itself about what it saw.  Or in more moderate terms
"replace this AST with that AST".  So when you talk about trying to
maintain a separate AST, I shudder with horror.  It's impossible.
So never replace.  Always augment and annotate.  It will save your
sanity, and stop the flame wars about forcing people to program in
the One True Language.  Perl 6 is not about that.  It's about being a
metalanguage in which you can express many languages, and doing so in
a sufficiently controlled fashion that we always know what language
any given lexical scope is expressed in.  And if we truly know what
language we're parsing at any moment, we can do everything PPI does
without much extra work, and without enforcing arbitrary linguistic
restrictions.
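
To make "augment and annotate" concrete, here is a rough Python sketch.
The Node class, the DESCRIPTIONS table and the annotate and markup
functions are all invented for this example and do not correspond to any
real Perl 6 or rakudo data structure. The annotation pass only adds
notes to the existing nodes; it never builds a second tree and never
swaps a node for another one.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical AST node: what was seen, where, plus room for annotations."""
    kind: str
    text: str
    pos: int
    children: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)

# Hypothetical description table keyed by node kind.
DESCRIPTIONS = {
    "declarator": "declares a lexical variable",
    "variable": "scalar variable",
    "infix": "infix operator",
    "number": "integer literal",
}

def annotate(node, descriptions):
    """Attach an explanation to every node in place; the tree shape is untouched."""
    node.notes["explain"] = descriptions.get(node.kind, "(undocumented)")
    for child in node.children:
        annotate(child, descriptions)
    return node

def markup(node):
    """Yield (position, text, explanation) for each leaf, in source order."""
    if not node.children:
        yield (node.pos, node.text, node.notes.get("explain"))
    for child in node.children:
        yield from markup(child)
```

Because the original nodes are still there after annotate has run,
anything else that needs the normal tree (such as code executed at
compile time) keeps working against the same structure the explainer
reads from.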

If the current {*} hack is insufficiently powerful for you to
annotate the AST correctly, then we need to negotiate a better hack.  :)

: A second problem is that the information should be accessible for
: perldoc. Since the documentation synopsis is indefinitely pending, I
: don't really want to rely on perldoc syntax, especially because the data
: has to be accessible from the action class.
: This could be circumvented by another abstraction layer (for example a
: text based DB that contains unique token names and the description, and
: that DB could be used both by the action class and to emit some perldoc).
: Are there better ideas, perhaps even some that don't introduce more
: layers? ;-)
: 
: Any comments are welcome.

This seems to me to primarily be a naming problem, and the AST gives
you the naming path to get to any particular node.  The main thing
you want is some way of naming the top of the AST from within a CHECK
block (or from anywhere else you need to access the structure of the
program from).  Possibly this is a part of the %=FOO set of variables,
and we have $=AST or some such to go along with the %=POD variables.
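
As an illustration of the naming idea only, here is a small Python
sketch. It assumes the AST is available as nested dicts with "kind" and
"children" keys, which is an invented representation, and the function
name named_paths is hypothetical. Each node gets a path built from the
kinds of its ancestors, and that path could serve as the key shared by
the action class and the documentation.

```python
def named_paths(node, prefix=""):
    """Yield (path, node) pairs, where path looks like 'statement/infix'.

    The path is the chain of node kinds from the top of the tree down
    to the node itself, joined with '/'.
    """
    kind = node["kind"]
    path = f"{prefix}/{kind}" if prefix else kind
    yield path, node
    for child in node.get("children", []):
        yield from named_paths(child, path)
```

With a table keyed by such paths, looking up the description for a node
is a plain dictionary access, and no extra layer beyond the tree itself
is needed to name things.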

Anyway, IDEs, syntax highlighters, and refactoring engines are all
going to want to access the same information, and we intend to make it
possible for them to do that.  That is at the very heart of Perl 6, and
the main reason it's so important for Perl 6 to be parsed in Perl 6.

Larry
