[racket-users] syntax/parse is not hygienic

Alexis King Sun, 04 Mar 2018 12:41:06 -0800

Apologies in advance for both the inflammatory subject and yet another
overly long email to this list.


I think anyone who knows me knows that I love syntax/parse — I think
it’s far and away one of Racket’s most wonderful features — but I’ve
long suspected it does not respect hygiene. Consider:

    #lang racket
    (require syntax/parse/define)

    (define x #f)

    (begin-for-syntax
      (define-syntax-class a
        [pattern _ #:attr def #'(define x #t)])
      (define-syntax-class b
        [pattern _ #:attr use #'x]))

    (define-simple-macro (m a:a b:b)
      (begin a.def b.use))

    (m 0 0) ; => #t

This program produces #t from the reference to x on line 10. Considering
the natural lexical scope of the program, as it appears to a human
reader, there is no local definition of x in scope where #'x is written
on line 10, so it logically ought to refer to the top-level definition
of x on line 4, which would make the program produce #f. However, it
does not. Instead, it produces #t because it actually refers to the
definition of x written on line 8, which is assembled alongside the use
on line 13.

While this behavior makes sense from the perspective of someone familiar
with the semantics of procedural macros, if taken from the point of view
of pattern-based systems, it seems to violate one of the essential
properties of a hygienic macro system. Namely, the macro system should
respect program scope. The above program does not.
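
For contrast, here is a minimal sketch of the same program with the two syntax classes replaced by helper macros. The names def and use are made up for this illustration and are not part of the program above. Each helper's expansion receives its own introduction scope, so the x written in use cannot see the x defined by def:

```racket
#lang racket
(require syntax/parse/define)

(define x #f)

;; Hypothetical helper macros standing in for syntax classes a and b.
(define-simple-macro (def) (define x #t)) ; this x gets def's introduction scope
(define-simple-macro (use) x)             ; this x gets use's introduction scope

(define-simple-macro (m)
  (begin (def) (use)))

(m) ; => #f
```

Here the reference resolves to the top-level x, which is the result a reader would predict from the lexical structure alone; the syntax class version above does not behave this way.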

For those unfamiliar with the details, this behavior occurs because
syntax class uses are not treated like macro transformations. When a
macro is expanded, a fresh scope is attached to its expansion, but when
a syntax class is used, its syntax objects have no additional
introduction scope. One could argue this behavior is useful — sometimes
it is helpful to be able to assemble larger pieces of syntax from the
outputs of different syntax classes without needing to pass shared
identifiers as input to the classes — but it also causes problems. I
think I first ran into it when I was using a syntax class to generate a
series of definitions:

    #lang racket
    (require syntax/parse/define)

    (begin-for-syntax
      (define-syntax-class def-and-use
        [pattern val:expr
                 #:attr x #'(begin
                              (define tmp (+ val 1))
                              (displayln tmp))]))

    (define-simple-macro (m a:def-and-use ...)
      (begin a.x ...))

    (m 1 2 3)

I would expect this program to print "2\n3\n4\n", but instead, it fails
to compile with an error:

    module: identifier already defined
      in: tmp

The multiple definitions of tmp are assembled alongside each other, and
since they all have the same scopes, they collide. A solution is to use
generate-temporary, but that is a little ugly. A solution that uses a
helper macro in place of the syntax class has no such problem:

    #lang racket
    (require syntax/parse/define)

    (define-simple-macro (def-and-use val:expr)
      (begin (define tmp (+ val 1))
             (displayln tmp)))

    (define-simple-macro (m a:expr ...)
      (begin (def-and-use a) ...))

    (m 1 2 3)
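
For comparison, the generate-temporary workaround mentioned above might look roughly like the following. This is only a sketch: the extra require of racket/syntax at phase 1 is an assumption about where generate-temporary needs to come from, and the #:with clause is one of several ways to bind the fresh name.

```racket
#lang racket
(require syntax/parse/define
         (for-syntax racket/syntax)) ; generate-temporary, at phase 1

(begin-for-syntax
  (define-syntax-class def-and-use
    [pattern val:expr
             ;; Bind the pattern variable tmp to a fresh identifier per match,
             ;; so the generated definitions no longer share a name.
             #:with tmp (generate-temporary 'tmp)
             #:attr x #'(begin
                          (define tmp (+ val 1))
                          (displayln tmp))]))

(define-simple-macro (m a:def-and-use ...)
  (begin a.x ...))

(m 1 2 3) ; prints 2, 3, 4 on separate lines
```

The class keeps working as a syntax class, but every identifier it binds has to be named and freshened by hand, which is the ugliness referred to above.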

There are arguments to be made that the existing behavior is not
unreasonable. Syntax classes behave like phase 1 functions, not macros.
If one desires macro-like behavior, it’s often possible to use a helper
macro instead of a syntax class. This is not always true, however;
sometimes syntax classes are used to generate syntax that will be
inserted into places where the macroexpander will not run (such as
binding positions), and there one still needs generate-temporaries to
avoid duplicate bindings.

There are some minor questions as to what the semantics of “hygienic”
syntax classes would be, since they accept arbitrary values as inputs
(in the case of parameterized syntax classes), not exclusively syntax
objects. They also have multiple outputs, some of which may not be
syntax-valued, so it’s not immediately obvious to me if performing the
same scope flipping that works for macros would produce the appropriate
result for syntax classes.
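
To make the question concrete, here is a rough sketch of what emulating the scope flip by hand could look like for a single syntax-valued attribute, using make-syntax-introducer from racket/base. It is an illustration under assumptions, not a proposal for how syntax/parse should do it: the names intro and val* are invented here, and the sketch ignores parameterized classes and non-syntax attributes entirely.

```racket
#lang racket
(require syntax/parse/define)

(begin-for-syntax
  (define-syntax-class def-and-use
    [pattern val:expr
             ;; A fresh scope for each match, as the expander creates one
             ;; for each macro application.
             #:do [(define intro (make-syntax-introducer))]
             ;; Flip the scope onto the input before it is used in the template.
             #:with val* (intro #'val)
             ;; Flip again on the output: the scope is removed from val*
             ;; and added to everything written in the template, including tmp.
             #:attr x (intro #'(begin
                                 (define tmp (+ val* 1))
                                 (displayln tmp)))]))

(define-simple-macro (m a:def-and-use ...)
  (begin a.x ...))

(m 1 2 3) ; should print 2, 3, 4 on separate lines
```

Each tmp ends up with a different scope, so the definitions no longer collide. Whether the same trick gives sensible results for classes with several attributes, some of them not syntax, is the part that is not obvious.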

Still, with all this context out of the way, my questions are
comparatively short:

  1. Is this lack of hygiene well-known? I did not find anything in
     Ryan’s dissertation that explicitly dealt with the question, but I
     did not look very hard, and even if it isn’t explicitly mentioned
     there, I imagine people have thought about it before.

  2. Are there some fundamental, theoretical obstacles to making a
     syntax class-like thing hygienic that I have not foreseen? Or would
     it really be as simple as performing the usual scope-flipping that
     macroexpansion already performs?

  3. If it is possible, is the unhygienic nature of syntax classes
     desirable frequently enough that it outweighs the benefits of
     respecting hygiene? That seems unlikely to me, but maybe I have not
     fully considered the problem. The semantics of syntax classes
     cannot be changed now, of course, for backwards compatibility
     reasons, but were that not a problem, would it make sense to make
     them hygienic?  If not, why not?

Many thanks to all who managed to get to the end of this email,
Alexis
