Re: [Scheme-reports] Fwd: R7RS

John Boyle Sat, 04 May 2013 08:44:32 -0700

I agree with the conclusions of Alaric Snell-Pym, Alex Shinn, Sussman, and
others who prefer the R5RS closure-identity semantics.  I regard this
aspect of the R7RS-small draft as an unfortunate accident.

However, I would point out that an R7RS-small-conforming implementation is
nevertheless free to *provide* R5RS closure-identity semantics, and to
advertise that it does so (perhaps with a feature identifier).
 Furthermore, WG2 could require the R5RS semantics in R7RS-large--it would
be *true* that every program that conformed to the R7RS-small
specifications (and relied only on guaranteed features) would also conform
to the R7RS-large specifications.  And I suspect that there will be a
standard "r7rs-large" feature identifier, which would then suffice to
establish "yes, closure identity works" even in the absence of a
standardized name for the closure-identity feature itself.

Might this be an acceptable hack?

Meanwhile, I think I have something to add to Alex Shinn's description of
the issues in question.  I will go into more detail, draw comparisons, make
some arguments, suggest alternate approaches, and conclude that the R6RS
semantics are less helpful than one might have thought.  As a disclaimer,
my knowledge about compilers is that of an amateur and student, and I have
not written one myself.

Let's look at optimizing Will Clinger's example.  At some point during
compilation, there is usually a phase where the magic construct of lambda
gets turned into the allocation of a mundane data structure--a closure,
which is usually a record containing a machine code pointer and the
variables that the closure saves.  Thus, the definition of g will look
something like this:

  (define (g i)
    (let ((h (make-closure ...)))
      (if (= i 0)
          (if (= i m)
              (list h)
              #f)
          (h (- i 1)))))

At this point[1], we may beta-expand the call to h, as many have said.
 (According to Andrew Appel, it is beta-contraction when you substitute the
body of a non-escaping procedure in its only calling place, which is
guaranteed to reduce code size since the original copy of the procedure may
be deleted.  If there are multiple calling places or if the procedure
escapes, then code size may end up increasing--hence beta-expansion, which
can lead to infinite unrolling of loops if you're not careful--compilers
must use conservative heuristics to decide whether to beta-expand.)  If we
do so, the result will look like this:

  (define (g i)
    (let ((h (make-closure ...)))
      (if (= i 0)
          (if (= i m)
              (list h)
              #f)
          (let ((j (- i 1)))
            (if (= j 0)
                #t
                (g (- j 1)))))))

And now we observe that h is only used in the then-branch of the if.  We
apply this principle: If "constructor" is a nice, safe procedure that
doesn't cause side effects (including raising exceptions), and if "var" is
not used (or modified) in the <test> or <else> expressions, and if <test>
doesn't modify any of "x y ..." or leak continuations[2], then it is
permissible to shift the binding around as shown below.  Note that this
applies to any constructor, be it "cons", "make-closure", "vector", and so
on:

  (let ((var (constructor x y ...)))
    (if <test>
        <then>
        <else>))
  ; ==>
  (if <test>
      (let ((var (constructor x y ...)))
        <then>)
      <else>)

In this case, a compiler should know that its = function doesn't modify
anything, and it can hoist the binding of h into the then-clause.  Then an
h-closure will only be created once out of the n times the g function is
called, which will make Will Clinger happy. :-) Incidentally, we can hoist
the h-binding into the then-branch of the second "if", making the closure
created only when it is immediately going to be used:

  (if (= i 0)
      (if (= i m)
          (let ((h (make-closure ...)))
            (list h))
          #f)
      (let ((j (- i 1)))
        (if (= j 0)
            #t
            (g (- j 1)))))

It remains to be seen whether existing compilers can do this.[3]  I think I
have shown it to be relatively straightforward (compared to what else one
might expect of a compiler that aspires to give good performance), although
here my lack of firsthand knowledge may demonstrate itself.

Now.  As Alex Shinn says, this is not the strongest example in favor of the
R6RS semantics.  I will now address his example.  After partial closure
conversion and after beta-expansion of the call to h, we get this:

  (define (g i)
    (let ((h (make-closure ...)))
      (when (= i n)
        (debug h))
      (if (= i 0)
          (if (= i m)
              (list h)
              #f)
          (let ((j (- i 1)))
            (if (= j 0)
                #t
                (g (- j 1)))))))

We see that there are two places where h is referenced, each within an
unlikely branch of an if.  As Alex Shinn says, either of them may cause h
to "escape" and get stored somewhere, for arbitrary use by the rest of the
program.  If we are to maintain object identity, these had better both
refer to the same h.

Our devil's advocate says that if we can break object identity, then it
might be nice to make the "h" in "(debug h)" refer to a different copy of
h, achieved by duplicating the constructor expression, like this:

  (define (g i)
    (let ((h (make-closure ...)))
      (when (= i n)
        (let ((h2 (make-closure ...)))
          (debug h2)))
      (if (= i 0)
          (if (= i m)
              (list h)
              #f)
          (let ((j (- i 1)))
            (if (= j 0)
                #t
                (g (- j 1)))))))

In that case, h is left being used in only one place, and we will be able
to hoist its binding all the way down into the "(list h)" expression.  Then
both "make-closure" expressions will reside in if-branches that are rarely
executed, and the loop will almost never have to construct a closure.  Our
devil's advocate rests his case.

Well.  First, I would argue (as others have) that one can make exactly the
same argument about *any* data structure.  Replace "make-closure" with
"cons", "vector", "string-append", "reverse", or many other things.  It is
quite true that if you didn't have to maintain object identity for X other
data structure, then you could make this splitting optimization and your
code would run faster.  One might argue that closures are special in one
way or another, but I would be unimpressed.

Second, there are, in fact, general ways to avoid unnecessary allocation of
these data structures [and note that this technique applies to all data
structures, not just closures--I'm a big fan of uniformity].  Consider this
scheme: we introduce another variable, "h-allocated", and initially bind it
and "h" to #f.  Then we ensure that all code-paths that lead to "h" include
this expression:

  (unless h-allocated
    (set! h (make-closure ...))
    (set! h-allocated #t))

As a matter of fact, if we know that h should only ever be bound to a
closure, then we can make the variable h do double duty.  (Even in the
general case, where h could be bound to any runtime value, we can create a
special value like a gensym that will only exist if h has not been
initialized yet.  Note that the compiler can be quite sure that this
special value will not escape.)  Thus:

  (define (g i)
    (let ((h #<uninitialized>))
      (when (= i n)
        (when (eq? h #<uninitialized>)
          (set! h (make-closure ...)))
        (debug h))
      (if (= i 0)
          (if (= i m)
              (begin (when (eq? h #<uninitialized>)
                       (set! h (make-closure ...)))
                     (list h))
              #f)
          (let ((j (- i 1)))
            (if (= j 0)
                #t
                (g (- j 1)))))))

This slightly increases code size, and does still require installing the
#<uninitialized> value in the variable "h" (which might occupy a register
or a stack slot) on each iteration of the loop.  I suspect this is exactly
what Sussman alluded to.  This is essentially having "make-closure" or
whatever other constructor be lazy.  [If you're worried about continuations
and threads, use CMPXCHG to install the closure into h, and if someone
beats you to the punch, use their value.]

Making all allocations lazy would be a pessimization, of course--laziness
has overhead and is only useful if the object is often not needed.  A
compiler would have to be taught how to recognize cases where it made sense
to introduce the laziness--e.g. if the main path of a loop didn't use the
structure, and the only paths that did use it already used other slow
operations, then one could conclude that adding the laziness would
introduce a small slowdown in the worst case and a large speedup in the
optimistic case.  A runtime that collected information on how often each
path was taken would, of course, be well equipped to make good decisions.

But, of course, in the best case this is still a bit slower than if you
just broke object identity and had separate allocs.  The R6RS semantics
give the compiler strictly more power.  I argue here that the extra power
it gets, compared to the cost imposed on the programmer's brain and our
aesthetics, is not significant--for example, contra what I get from some
previous rationales, it is *not* the only seemingly-practical way to turn a
loop that always allocates memory into a loop that almost never allocates
memory.  (If someone feels like testing Larceny again, how does my
"#<uninitialized>" version compare to the default version and to the
R6RS-modified version of Alex Shinn's example?  Use #f for
#<uninitialized>.  The performance will probably be somewhat different than
if the tweaking was done as an integral part of the compiler, but oh well.)

-- John Boyle

----

[1] The erudite reader may point out that (a) after closure-conversion,
closures are usually passed their envs or even themselves as an extra
argument, (b) often variables are renamed to avoid conflicts before any of
this inlining is done, (c) often the code is converted to
continuation-passing style or some similar form before any of this, and so
on.  Luckily, these issues need not be paid attention in this example.

[2] At least, I assume it is a problem if your code binds x to a new cons
cell and then saves a continuation, but later calls to the continuation
bind x to new cons cells that aren't all identical to the first one.  The
Racket compiler disagrees with me on this. http://pastebin.com/7MmkGA1n  I
suspect this is a bug.

[3] Racket, for one, appears able to allocate 0 bytes per call when you
pass an even argument to the following function [it allocates 32 bytes, its
size of cons cell on a 64-bit machine, when passed odd arguments]:

  (define (meh x)
    (let ((y (cons x 3)))
      (if (even? x)
          x
          y)))

This works also for binding y to "(vector x 10)".  Curiously, if you
instead bind y to "(lambda (u) (+ x u))", it allocates 32 bytes per call on
both even and odd arguments.  It would appear that the Racket compiler
thinks of closures as being different kinds of objects than conses and
vectors; methinks the Racket devs will have to duplicate some functionality
in their compiler to optimize that case.  (I don't particularly intend to
criticize Racket, by the way; I know more criticizable things about it than
about other implementations because I use it the most.)

On Wed, May 1, 2013 at 3:18 AM, Alaric Snell-Pym <[email protected]>wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> > However, maintaining logical consistency is an important
> > goal in language design.  The main goal of specifying a
> > language is not to make the job of the compiler writer
> > easy.  If that was the goal we could adopt "C".  We
> > should make a language we like.  The compiler writers I
> > know are surely good enough to handle small complexities
> > like this.
>
>
> I must admit, I've come to agree, despite having voted for the r6rs
> approach in http://trac.sacrideo.us/wg/wiki/WG1Ballot3Results; I think,
> at the time, I was distracted about defining equivalence of procedures
> in the full that-would-mean-solving-the-halting-problem sense, but we
> can have a workable definition of eqv? based on the point in time of
> closure creation - and I certainly don't see how compilers can have any
> trouble reconciling that with inlining.
>
> One for R8RS, eh? :-)
>
> ABS
>
> - --
> Alaric Snell-Pym
> http://www.snell-pym.org.uk/alaric/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQIcBAEBAgAGBQJRgOvsAAoJENVnbn/DjbpJq6EP/0mbZAbMiKhkLh0KB3seMEK8
> +xffdpPK+bUhfJitDLRRD7zNnkblUjxr60hSQuBx2fcD4+1sFGlGsnTPc7sHHXx9
> Zbsh00+LPRmR6HLk5Mu5E2zOtZYy1CqLThqw/Dz2Qj4y5wN6qCJVi7sKuqp6Cyd2
> 4O6rc2HHsKrcJ70ysWWdLYGU3Sxpaah4eYkD3P5H4CVBIV+ojJfiOOm0hMHDFuGR
> MiNK04mFWKB6Rgwy7/MfeMP9K9BVlOYTpatrI5cGXK1WgiQf3gXmzTAlQi7BCF0W
> TciPEFUKbhCrkuPJDwA5wD78lG0rOl4abRAZf5TSnWyJQ8S7Ip2LWEOnYsPCGlK6
> wlXnzTPVVpM+lmFjAEU1Ol8HMTaDDkfYalbVGKe2D6n4Dt6MQdYT0lijfO8UkUx/
> uuQWDaqtQMBxiDzDl3PoSXoWMUUX8fI/NE+lFOd0ngaBqRva1BFhZj2OTybF6iox
> uf8GmurpXCitN2mdyl0pG6+f8NUMaeEjkDFzjXH6v1VOWMXnYSYvcTkilkF8Pm0X
> orjXq5N9emYklJXdmQxjwdPifA2jJilAr7wwye1WkrDcbOU/VSeKGc9DngQKxrzR
> VtNv27T78EZowPRjrbdxGfJHoaRWWYeDzr9rHXWpOyBZaZpsgRhYHy4rMTSzCnrp
> +oLvE4CAgSLyso1Fmgx6
> =BEW4
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Scheme-reports mailing list
> [email protected]
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
>

_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

Re: [Scheme-reports] Fwd: R7RS

Reply via email to