I have been using terms like "well-formed program" and "syntactic
program", but I haven't defined either term. In the interests of helping
the macro system discussion along, let me try to define them.

Let me also state that my views here should be treated with skepticism,
because I may have this entirely wrong. Also, let me say that I am
focused in this discussion *entirely* on concerns of syntactic design. I
am intentionally setting aside the question of macro execution
environments.

Since everybody seems to think they know how things work in LISP and
Scheme, let me start there: LISP and Scheme are NOT S-expression
languages. More precisely: what we *mean* when we say that they are is
not what most people assume. What LISP and Scheme really are is
languages that are simultaneously defined at two grammatical levels:

  S-expressions:  All valid LISP and Scheme programs consist of some
                  combination of parentheses and atoms, subject to the
                  constraint that the parentheses must balance.

                  Note that this is already not a general S-expression,
                  because the atoms are required to be valid LISP/Scheme
                  atoms. That is: the tokenizer has already run.

  LISP/Scheme:    All valid LISP and Scheme programs consist of some
                  valid arrangement of syntactic forms and their
                  syntactically required "arguments".

That is: there is a representation pun here. That is what a macro system
exploits. In some sense, the "main" parser is the one that builds
S-expressions, while the LISP/Scheme parser is merely a tree parser that
checks a tree for higher-level validity.

Note that macro-expansion relies in part on the syntactic level of
understanding. A macro label is not applied unless it appears as the CAR
of some S-expression. Finally, note that the behavior of a macro
(splicing/replacing) relies explicitly on the fact that it operates
within an S-expression environment.
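To make the two levels concrete, here is a minimal Python sketch (the
function names and the `let` shape-check are mine, purely illustrative):
a reader that enforces only balanced parentheses, and a separate tree
parser that checks one higher-level form.

```python
# Level 1: a reader that turns a token list into a generic
# S-expression tree, checking only that parentheses balance.
def read_sexpr(tokens):
    """Parse one S-expression from a list of tokens (consumes them)."""
    tok = tokens.pop(0)
    if tok == "(":
        tree = []
        while tokens and tokens[0] != ")":
            tree.append(read_sexpr(tokens))
        if not tokens:
            raise SyntaxError("unbalanced parentheses")
        tokens.pop(0)  # consume ")"
        return tree
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return tok  # an atom

# Level 2: a tree parser that checks higher-level validity, e.g. that
# a `let` form has the shape (let (<bindings>) <body>...).
def check_form(tree):
    if isinstance(tree, list) and tree and tree[0] == "let":
        if len(tree) < 3 or not isinstance(tree[1], list):
            raise SyntaxError("malformed let")
    if isinstance(tree, list):
        for sub in tree:
            check_form(sub)
    return tree

tokens = "( let ( ( x 1 ) ) ( + x 2 ) )".split()
tree = check_form(read_sexpr(tokens))
# tree == ['let', [['x', '1']], ['+', 'x', '2']]
```

Note that `( let x )` would pass level 1 but fail level 2, which is
exactly the representation pun: both parsers accept trees, but only the
second knows what a `let` is.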


RAISING THE LEVEL

Okay. Now let's start raising the level by small increments. Let's start
with infix/prefix/postfix (generally: mixfix) issues.

** Mixfix

In any system supporting mixfix operators, the operators themselves are
tokens, and there is a table available of the form:

  ( token-string, {prefix|infix|postfix}, arity, precedence )

Note that the transform from mixfix to S-expressions is a purely
structural tree to tree transform. It can be performed on an
S-expression without knowing anything at all about the syntactic
definition of the language. If we add a mixfix definition mechanism to
BitC, we can *already* support expressions like

   a + b   => (apply + a b)

directly. The tricky bit about this is parenthesization:

  (a + b) => ((apply + a b))

which probably isn't what you had in mind, and there isn't a good
divining rule for fixing this. That is why David borrowed a different
kind of parenthesis.
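The purely structural nature of the transform is easy to see in a
Python sketch using precedence climbing. The operator table and the
`apply` spelling are illustrative only, not BitC's actual definitions:

```python
# Infix-to-S-expression transform driven by an operator table.
# For brevity, only binary infix operators over atoms are handled.
OPS = {"+": 10, "*": 20}  # token -> precedence

def parse_infix(tokens, min_prec=0):
    """Precedence-climbing parse of a flat token list (consumes it)."""
    left = tokens.pop(0)  # an atom
    while tokens and tokens[0] in OPS and OPS[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # +1 makes operators left-associative.
        right = parse_infix(tokens, OPS[op] + 1)
        left = ["apply", op, left, right]
    return left

print(parse_infix("a + b * c".split()))
# -> ['apply', '+', 'a', ['apply', '*', 'b', 'c']]
```

Nothing here consults the syntactic definition of the language; the
transform sees only tokens and the operator table.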

So here is my point: we could switch BitC completely over to a mixfix
expression syntax without losing our ability to hand S-expressions to
the macro system and/or the parser.

** Conventional Procedure Application Syntax

I'm reasonably confident that we can convert application to
S-expressions as well, though that conversion requires knowing how to
tell the difference between keywords and non-keyword identifiers. It is
still a tree-to-tree transform. We're moving further away from the
notion of a pure S-expression syntax, but we haven't gone over the cliff
quite yet.
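A sketch of that conversion, again in illustrative Python (the keyword
set and `apply` spelling are mine): the only extra knowledge needed is
which identifiers are keywords, since `if (x)` must not become an
application.

```python
# Convert conventional application syntax f(a, b) into S-expressions,
# distinguishing keywords from ordinary identifiers.
KEYWORDS = {"if", "let"}

def convert_apply(tokens):
    """Parse one expression from a token list (consumes it)."""
    head = tokens.pop(0)
    if tokens and tokens[0] == "(" and head not in KEYWORDS:
        tokens.pop(0)  # "("
        args = []
        while tokens[0] != ")":
            args.append(convert_apply(tokens))
            if tokens[0] == ",":
                tokens.pop(0)
        tokens.pop(0)  # ")"
        return ["apply", head] + args
    return head

print(convert_apply("f ( a , g ( b ) )".split()))
# -> ['apply', 'f', 'a', ['apply', 'g', 'b']]
```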

** Block-like Control Flow Syntax

In fact, given a list of keywords and their expected "tree arity", we
can render a block-syntax language into S-expressions as well. When we
get around to doing a block syntax for BitC, that is effectively what
we will be doing anyway.

So you can see that quite a lot can already be done to "clean up" the
syntax for human consumption without eliminating simple macros.
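The keyword-arity idea can be sketched in the same style. The `if`
keyword and its arity of 3 are illustrative, not a commitment about
BitC's eventual block syntax:

```python
# Given keywords and their expected "tree arity", render a block-style
# token stream into S-expressions.
ARITY = {"if": 3}  # if <cond> <then> <else>

def parse_block(tokens):
    """Parse one tree from a token list (consumes it)."""
    tok = tokens.pop(0)
    if tok in ARITY:
        return [tok] + [parse_block(tokens) for _ in range(ARITY[tok])]
    return tok  # an atom

print(parse_block("if p if q a b c".split()))
# -> ['if', 'p', ['if', 'q', 'a', 'b'], 'c']
```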


THE HARD STUFF

Ironically, the part of this that is moderately painful isn't the
expression syntax at all. It's the binding constructs. Suppose we have
something like:

  let x = expr in
    expr expr ...
  done

The transform pattern is going to be something like:

  LET defs:<tree>+ IN body:<tree>+ DONE =>
  (let (<expansion of defs>)
    (<expansion of body>))

Now remember these are just trees. We can use mixfix and precedence to
turn "tree = tree" into (= tree tree), but notice that we still haven't
quite gotten to a proper let form here.
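The rewrite itself is a trivial structural operation. Here is a sketch
over an already-tokenized sequence, where the defs are assumed to have
been turned into `(= tree tree)` shapes by the mixfix pass (all
spellings illustrative):

```python
# Rewrite LET defs IN body DONE into (let (defs...) body...).
def rewrite_let(items):
    """`items` is a flat sequence of trees/atoms delimited by keywords."""
    assert items[0] == "let"
    i_in = items.index("in")      # keywords occur only at top level here
    i_done = items.index("done")
    defs = items[1:i_in]          # e.g. [['=', 'x', '1']]
    body = items[i_in + 1:i_done]
    return ["let", defs, *body]

print(rewrite_let(["let", ["=", "x", "1"], "in", "e1", "e2", "done"]))
# -> ['let', [['=', 'x', '1']], 'e1', 'e2']
```

Note that this gets us a let-*shaped* tree, but as the text says, not
yet a proper let form: nothing here checks that each def really has the
`(= ident expr)` shape.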

So hang on just a moment and look at what is really happening here. What
is happening is that one step at a time we are building a grammar and a
parser. It is a very permissive grammar, because we're only trying to
get from input to S-expr rather than determine input validity in
general, but it's still a parser.

AN ALTERNATIVE VIEW OF MACROS

The parser that we are building is a very peculiar beast, *because* it
doesn't seek validity. But note that this parser is also a relatively
simple tree-rewriting scheme. What it actually does is
recognize (recursively) embedded tree-like patterns in the input and
convert those into some sort of S-expression by rearranging pieces.

And if we are a bit careful, we can generalize this parser in such a way
that new syntactic patterns can be added and new semantic actions
producing ASTs can be injected.
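One way to picture that generalization: a table of rewrite actions
keyed by the CAR of a tree, extensible at runtime. Everything below is
illustrative, not a proposal for BitC's actual API:

```python
# An extensible tree-rewriting expander: new syntactic patterns and
# their rewrite actions can be registered after the fact.
PATTERNS = {}

def defsyntax(keyword, action):
    """Register a rewrite action for trees whose CAR is `keyword`."""
    PATTERNS[keyword] = action

def expand(tree):
    """Recursively rewrite a tree bottom-up using registered patterns."""
    if isinstance(tree, list) and tree:
        tree = [expand(t) for t in tree]
        if tree[0] in PATTERNS:
            return PATTERNS[tree[0]](tree)
    return tree

# Example: a hypothetical (swap a b) form expanding into a let.
defsyntax("swap", lambda t: ["let", [["tmp", t[1]]],
                             ["set!", t[1], t[2]],
                             ["set!", t[2], "tmp"]])
print(expand(["swap", "x", "y"]))
# -> ['let', [['tmp', 'x']], ['set!', 'x', 'y'], ['set!', 'y', 'tmp']]
```

A semantic action producing an AST node instead of another tree slots
into the same table without changing the driver.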

And in my opinion, *that* is how we should think about macro processing
in a block-structured language. We can't do it with a conventional
parser, but a recursive-descent parser will handle this just fine. So my
proposal is that parsing should proceed in two phases: one which
converts a token sequence into an S-expression, and a second that
performs a tree-parse on the S-expression to determine final validity.



shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
