[PATCH] Introduce "closure sharing" optimization

Peter Bex via Chicken-hackers Wed, 23 Jul 2025 07:35:28 -0700

[note: this is a long mail, apologies for that]

Hi all,


The attached patches introduce a new optimization.  It can also be found
in the "shared-closures" branch in git.  We ran Salmonella on it the
past few days, and it seems to work perfectly:
http://salmonella-linux-x86-64.call-cc.org/shared-closures/gcc/linux/x86-64/2025/07/

The optimization should improve performance by not requiring all
closed-over variables to be copied every time we create a new closure.

Instead, we simply take the surrounding closure (the "container")
and wrap it in the closure (the "user") so that the user can access
the closed-over variables from the container.  Any closure created
inside the user may then also receive the container directly,
transitively down the "chain" of nested closures.

If a "user" introduces new variables which nested closures can see, we
simply mutate the container closure in place to hold that variable, for
which we reserve a spot when initially creating the container.  Added
advantage: if the variable *only* exists inside a shared chain, we can
avoid boxing it because the container already acts as a box.

We can obviously only do this under restricted situations, because
closed-over variables may get different values on different invocations
of the same lambda, which means they'd get clobbered and any closures
that need to reference to original value would see the updated value
instead.  It's the "broken JavaScript lambda-creating for loop" problem
all over again.

Another complicating factor is that we don't want out-of-scope values to
be held onto by inner lambdas, because that would extend their lifetime
beyond what is necessary.  One of the great aspects of CHICKEN is that
it has very strict variable lifetimes due to its flat closure
representation.  We don't want to throw that away!

The current implementation is extremely (probably overly) cautious and
it only allows closure sharing among lambdas where:

- None of the lambdas in a "shared closure chain" escape,
- each lambda in the chain creates at most one other lambda,
- each lambda closes over a superset of the defining lambda's closed-over
  variables (so the closures only "grow" as you go down the nesting)
- and the lambda may not be a foreign callback

My gut feeling says we can probably relax a few of these restrictions,
but it's kind of hard to oversee exactly how features interact.
For example, first I thought containers might escape (which allows for a
great many more opportunities for this optimization), but it appears
that when a lambda is assigned to a global and it invokes itself
recursively, the number of call sites isn't counted correctly towards
that internal lambda (possibly because the global may be set! to
something else?), which means it's basically impossible to detect
self-calls (which may not be shared, as that invalidates the rule that
variables may be assigned only once).  Threading might also be a problem
since a continuation could be suspended and then the same closure
invoked elsewhere which would clobber shared variables in the suspended
closure.

So for now it's best to only allow "internal" procedures and those which
we know do not escape.  Unfortunately, that drastically reduces its
impact.

But the upshot of all of this is that the problem in #1852 is now
thoroughly avoided because the chain of closures created as part of
evaluating the arguments to a call fits the above criteria exactly.
The nice thing is that it is a general solution that applies to any
program, without affecting semantics.

The benchmark suite seems mostly unaffected (see the attached file
benchmark-comparison.txt), but it improves a few programs' performance.
Notably kernwyk-cat.  It's a cool example because it's a simple
program which illustrates the concept well:

(define (catport in out)
  (let ((x (read-char in)))
    (if (not (eof-object? x))
        (begin
         (write-char x out)   ;; closure container
         (catport in out))))) ;; closure user

During CPS conversion, the write-char call gets wrapped in a lambda which
closes over "in" and "out", and the recursive call to catport also
gets wrapped in a lambda which closes over those same variables, so the
optimization kicks in and the former is assigned to be a container and
the latter a user, so we can create a closure with only one variable
slot instead of two.  It doesn't seem like much, but this means the
stack consumption is reduced in this tight loop so we end up doing less
GCs.

As for the example in #1852, the original implementation in master
took 54 minutes to translate to C, which resulted in a 329MB file.
Then, gcc took 284 minutes to compile this, resulting in a 181MB binary.
The runtime of the binary is 0.064s real and 0.033s user.

With the implementation in this branch, it takes 1.5 minute to translate
to C, resulting in a 5.1MB file.  Gcc takes 14 seconds to compile,
resulting in a 5.4MB binary.  Runtime of the binary is half that of the
original: 0.031s real and 0.14s user.

Note that most of the translation to C time seems to be eaten up by the
quadratic complexity of lset-difference/eq? and remove-duplicates.
I've added new procedures which use a hash table to avoid traversing
the same list over and over.

Note: the first patch is actually unrelated, but I noticed this while I
was looking at the analysis code.  The second patch is just to keep the
third (main) one more readable.

Cheers,
Peter

>From a34e7e4d92e02e534f426b416772248d7210baa7 Mon Sep 17 00:00:00 2001
From: Peter Bex <[email protected]>
Date: Thu, 10 Jul 2025 10:03:44 +0200
Subject: [PATCH 1/5] Remove bogus safety measure

The measure used to read (unless (eq? val n) (bomb ...)), which meant
that we should not allow captured values to be rest ops.

However, with 32067934f we now do allow rest ops to refer to captured
variables.  That commit also changed the safety check, turning it into
a tautology - the val will always either be direct, or a reference to
a closure var.  So it makes no sense to even keep the check.
---
 core.scm | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/core.scm b/core.scm
index 0459a214..2ff2396f 100644
--- a/core.scm
+++ b/core.scm
@@ -2625,10 +2625,6 @@
          ((##core#rest-cdr ##core#rest-car ##core#rest-null? 
##core#rest-length)
           (let* ((val (ref-var n here closure))
                  (rest-var (if (eq? val n) (varnode (first params)) val)))
-            (unless (or (eq? val n)
-                        (match-node val `(##core#ref (i) (##core#variable 
(,here))) '(i)))
-              ;; If it's captured, replacement in optimizer was incorrect
-              (bomb "Saw rest op for captured variable.  This should not 
happen!" class) )
             ;; If rest-cdrs have not all been eliminated, restore
             ;; them as regular cdr calls on the rest list variable.
             ;; This can be improved, as it can actually introduce
-- 
2.49.0

>From e4c49ac29c35d4c013ca3a1d72118ecb58daff13 Mon Sep 17 00:00:00 2001
From: Peter Bex <[email protected]>
Date: Fri, 18 Jul 2025 15:44:50 +0200
Subject: [PATCH 2/5] Naming change in closure "transform" procedure

Change the name of the second argument in the "transform" procedure
from "here" to "crefvar".

The name "here" was a bit misleading, as "here" meant the lambda id in
both "gather" and "walk-global" in "prepare-for-code-generation",
while it referred to the closure variable ("cvar") in "transform".
This inconsistency was a bit confusing when initially trying to
understand the code.

Besides being a bit of a nitpicky improvement, this will pave the way
for the next commit, which will differentiate between the actual
closure variable (i.e., "cvar") and the variable used to access
variables inside the closure (i.e., "crefvar").

But for the current commit, the value of "crefvar" is always the same
as "cvar" which is generated when converting the lambda to a closure
literal.
---
 core.scm | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/core.scm b/core.scm
index 2ff2396f..9e041a38 100644
--- a/core.scm
+++ b/core.scm
@@ -2605,7 +2605,7 @@
          (else (concatenate (map (lambda (n) (gather n here locals)) subs)) ) 
) ))
 
     ;; Create explicit closures:
-    (define (transform n here closure)
+    (define (transform n crefvar closure)
       (let ((subs (node-subexpressions n))
            (params (node-parameters n))
            (class (node-class n)) )
@@ -2617,25 +2617,25 @@
 
          ((##core#variable)
           (let* ((var (first params))
-                 (val (ref-var n here closure)) )
+                 (val (ref-var n crefvar closure)) )
             (if (test var 'boxed)
                 (make-node '##core#unbox '() (list val))
                 val) ) )
 
          ((##core#rest-cdr ##core#rest-car ##core#rest-null? 
##core#rest-length)
-          (let* ((val (ref-var n here closure))
+          (let* ((val (ref-var n crefvar closure))
                  (rest-var (if (eq? val n) (varnode (first params)) val)))
             ;; If rest-cdrs have not all been eliminated, restore
             ;; them as regular cdr calls on the rest list variable.
             ;; This can be improved, as it can actually introduce
             ;; many more cdr calls than necessary.
             (cond ((eq? class '##core#rest-cdr)
-                   (transform (replace-rest-op-with-list-ops class rest-var 
params) here closure))
+                   (transform (replace-rest-op-with-list-ops class rest-var 
params) crefvar closure))
 
                   ;; If n isn't val, this node was processed and the
                   ;; variable got replaced by a closure access.
                   ((not (eq? val n))
-                   (transform (replace-rest-op-with-list-ops class rest-var 
params) here closure))
+                   (transform (replace-rest-op-with-list-ops class rest-var 
params) crefvar closure))
 
                   (else val)) ) )
 
@@ -2645,7 +2645,7 @@
                ##core#let_float ##core#box_float ##core#unbox_float
               ##core#inline_loc_ref
               ##core#inline_loc_update)
-          (make-node (node-class n) params (maptransform subs here closure)) )
+          (make-node (node-class n) params (maptransform subs crefvar 
closure)) )
 
          ((let)
           (let* ([var (first params)]
@@ -2654,14 +2654,14 @@
             (if boxedvar
                 (make-node
                  'let (list boxedalias)
-                 (list (transform (first subs) here closure)
+                 (list (transform (first subs) crefvar closure)
                        (make-node
                         'let (list var)
                         (list (make-node '##core#box '() (list (varnode 
boxedalias)))
-                              (transform (second subs) here closure) ) ) ) )
+                              (transform (second subs) crefvar closure) ) ) ) )
                 (make-node
                  'let params
-                 (maptransform subs here closure) ) ) ) )
+                 (maptransform subs crefvar closure) ) ) ) )
 
          ((##core#lambda ##core#direct_lambda)
           (let ((llist (third params)))
@@ -2671,7 +2671,8 @@
                (let* ((boxedvars (filter (lambda (v) (test v 'boxed)) vars))
                       (boxedaliases (map cons boxedvars (map gensym 
boxedvars)))
                       (cvar (gensym 'c))
-                      (id (if here (first params) 'toplevel))
+                      (id (if crefvar (first params) 'toplevel))
+                       (new-crefvar cvar)
                       (capturedvars (or (test id 'captured-variables) '()))
                       (csize (or (test id 'closure-size) 0))
                       (info (and emit-closure-info (second params) (pair? 
llist))) )
@@ -2700,7 +2701,7 @@
                             (cond ((and rest (assq rest boxedaliases)) => cdr)
                                   (else rest) ) ) )
                           (fourth params) )
-                    (list (let ((body (transform (car subs) cvar 
capturedvars)))
+                    (list (let ((body (transform (car subs) new-crefvar 
capturedvars)))
                             (if (pair? boxedvars)
                                 (let loop ((aliases (unzip1 boxedaliases))
                                            (values
@@ -2713,7 +2714,7 @@
                                                  (list (car values)
                                                        (loop (cdr aliases) 
(cdr values))))))
                                 body) ) ) )
-                   (let ((cvars (map (lambda (v) (ref-var (varnode v) here 
closure))
+                   (let ((cvars (map (lambda (v) (ref-var (varnode v) crefvar 
closure))
                                      capturedvars) ) )
                      (if info
                          (append
@@ -2739,23 +2740,23 @@
                             (make-node
                              (if immf '##core#updatebox_i '##core#updatebox)
                              '()
-                             (list (make-node '##core#ref (list (add1 i)) 
(list (varnode here)))
-                                   (transform val here closure) ) )
+                             (list (make-node '##core#ref (list (add1 i)) 
(list (varnode crefvar)))
+                                   (transform val crefvar closure) ) )
                             ;; Is the following actually used???
                             (make-node
                              (if immf '##core#update_i '##core#update)
                              (list (add1 i))
-                             (list (varnode here)
-                                   (transform val here closure) ) ) ) ) )
+                             (list (varnode crefvar)
+                                   (transform val crefvar closure) ) ) ) ) )
                   ((test var 'boxed)
                    (make-node
                     (if immf '##core#updatebox_i '##core#updatebox)
                     '()
                     (list (varnode var)
-                          (transform val here closure) ) ) )
+                          (transform val crefvar closure) ) ) )
                   (else (make-node
                          'set! (list var immf)
-                         (list (transform val here closure) ) ) ) ) ) )
+                         (list (transform val crefvar closure) ) ) ) ) ) )
 
          ((##core#primitive)
           (make-node
@@ -2769,15 +2770,15 @@
 
          (else (bomb "bad node (closure2)")) ) ) )
 
-    (define (maptransform xs here closure)
-      (map (lambda (x) (transform x here closure)) xs) )
+    (define (maptransform xs crefvar closure)
+      (map (lambda (x) (transform x crefvar closure)) xs) )
 
-    (define (ref-var n here closure)
+    (define (ref-var n crefvar closure)
       (let ((var (first (node-parameters n))))
        (cond ((posq var closure)
               => (lambda (i)
                    (make-node '##core#ref (list (+ i 1))
-                              (list (varnode here)) ) ) )
+                              (list (varnode crefvar)) ) ) )
              (else n) ) ) )
 
     (debugging 'p "closure conversion gathering phase...")
-- 
2.49.0

>From 8764d3903f5930bb8f635367c0e23d3958cb4652 Mon Sep 17 00:00:00 2001
From: Peter Bex <[email protected]>
Date: Fri, 18 Jul 2025 15:11:47 +0200
Subject: [PATCH 3/5] Add support for sharing closures to the compiler

Normally, CHICKEN uses strictly flat closures, which means that each
and every closed-over variable is wrapped in the closure object
together with the translated lambda's C function.  Unfortunately, this
can lead to a whole lot of copying if the lambda calls a lot of CPS
functions.  Each CPS call requires breaking up the lambda into
multiple different lambdas, each of which get their own C function and
closure object.

This means that a big Scheme closure which gets split up into multiple
low-level CPS closures like this will keep copying the same variables
over and over, wrapping them into a new closure, calling the CPS
function, which then "returns" to that closure, which wraps another
closure, calls another CPS function etc etc.  This can result in quite
pathological behaviour, like reported in #1852.  The C compiler also
has trouble dealing with this amount of generated code.

One could use "linked" closures which simply point to their containing
function, but that can be somewhat slow in lookup and will extend
variables' lifetime for longer than necessary, which has its own
problems.  Instead, we now create a big "container" closure when
calling the initial function, with some slots initialized as
undefined.  This container closure is then passed around to other
translated lambdas ("users") which can access the variables from that
container.  When the function which defines the variable runs, it
adds its variable to the closure at the spot reserved for it.
Subsequent "users" can then read it from the container as if they
closed over it.

This means there is no copying going on - the same container is always
passed to the next closure.  The container is the only closed-over
variable inside a sharing "user" closure.  A chain of shared closures
always starts with a "container" closure, which closes over any number
of variables.  Each subsequent user to which the container is passed
must strictly extend the set of closed-over variables.  If any
variable is dropped in a lambda, we disallow that lambda from being a
user.  Instead, it's either a regular closure or a container which
starts a new sharing chain.

We also have to take care that sharing users and containers are never
invoked more than once with the same closure object.  This would lead
to variables being clobbered.  Effectively, this means containers and
users are only created for non-escaping procedures which don't have
any mutual (direct or indirect) recursion going on.

Also, we disallow containers or users from defining more than one
sub-lambda, because that would invalidate the requirement that
closures must always be grown.  Strictly speaking, it may be possible
for a shared closure to be re-used in multiple users, but it's hard to
reason about the situations in which this would be safe and would
complicate the code too much.

Note that if a continuation is reified with call/cc, the argument to
call/cc is considered an escaping function, so it will never be marked
as a user or container.  This means it should be impossible to observe
these container slots changing around by capturing and reinvoking a
continuation (as per letrec).
---
 batch-driver.scm |   9 +-
 core.scm         | 209 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 193 insertions(+), 25 deletions(-)

diff --git a/batch-driver.scm b/batch-driver.scm
index c5012c30..6b0e8fd2 100644
--- a/batch-driver.scm
+++ b/batch-driver.scm
@@ -117,7 +117,8 @@
                 (undefined . und) (replacing . rpg) (unused . uud) 
(extended-binding . xtb)
                 (inline-export . ilx) (hidden-refs . hrf)
                 (value-ref . vvf)
-                (customizable . cst) (has-unused-parameters . hup) (boxed-rest 
. bxr) ) )
+                (customizable . cst) (has-unused-parameters . hup) (boxed-rest 
. bxr)
+                 (shareable-container . shc) (shareable-user . shu) ) )
        (omit #f))
     (lambda (db)
       (unless omit
@@ -141,7 +142,8 @@
                     (case (caar es)
                       ((captured assigned boxed global contractable 
standard-binding assigned-locally
                                  collapsable removable undefined replacing 
unused simple inlinable inline-export
-                                 has-unused-parameters extended-binding 
customizable constant boxed-rest hidden-refs)
+                                 has-unused-parameters extended-binding 
customizable constant boxed-rest hidden-refs
+                                  shareable-container shareable-user)
                        (printf "\t~a" (cdr (assq (caar es) names))) )
                       ((unknown)
                        (set! val 'unknown) )
@@ -152,7 +154,8 @@
                       ((potential-values)
                        (set! pvals (cdar es)))
                       ((replacable home contains contained-in use-expr 
closure-size rest-parameter
-                                   captured-variables explicit-rest rest-cdr 
rest-null? consed-rest-arg)
+                                   captured-variables explicit-rest rest-cdr 
rest-null? consed-rest-arg
+                                    shared-closure sharing-mode)
                        (printf "\t~a=~s" (caar es) (cdar es)) )
                       ((derived-rest-vars)
                        (set! derived-rvars (cdar es)))
diff --git a/core.scm b/core.scm
index 9e041a38..6746a06a 100644
--- a/core.scm
+++ b/core.scm
@@ -285,6 +285,10 @@
 ;   explicit-rest -> <boolean>               If true: procedure is called with 
consed rest list
 ;   captured-variables -> (<var> ...)        List of closed over variables
 ;   inline-target -> <boolean>               If true: was target of an 
inlining operation
+;   shareable-container -> <boolean>         If true: potentially may collect 
and share closed-over variables from (nested) contained closures
+;   shareable-user -> <boolean>              If true: closed-over variables 
may potentially be shared from the containing closure
+;   sharing-mode -> <container|user>         If container: actually collects 
and shares closed-over variables from (nested) contained closures. If user: 
receives container closure
+;   shared-closure -> (<var> ...)            List of transitively closed over 
variables of the sharing-container and its sharing-user
 
 
 (declare
@@ -2422,6 +2426,30 @@
                             (db-put! db (first lparams) 'explicit-rest #t)
                             (db-put! db rest 'consed-rest-arg #t) ) ) ) ) ) ) 
) ) )
 
+         ;; If it has a known or local value which is a procedure, and 
referenced only once
+         ;; and only one call site or is an internal procedure, mark it as 
'shareable-user so that
+         ;; its closed over variables may be shared with its containing 
procedure.
+         ;; Note that callbacks are exempt from this, because callback_wrapper 
creates an empty closure
+         ;; manually, throwing away our carefully crafted closure. TODO: can 
maybe be done better?
+         ;;
+         ;; If furthermore it only contains a single other procedure, mark it 
as 'shareable-container
+         ;; so that may share closed-over variables with that one procedure.
+         (and-let* ((val (or local-value value))
+                    ((eq? '##core#lambda (node-class val)))
+                    (lparams (node-parameters val))
+                    ((or (= 1 nreferences ncall-sites)
+                         (not (second lparams))))
+                   ((not (rassoc sym callback-names eq?))))
+          (db-put! db (first lparams) 'shareable-user #t)
+           (and-let* ((id (first lparams))
+                      (contains (or (db-get db id 'contains) '()))
+                      ((= (length contains) 1)))
+             ;; TODO: It should be possible to have escaping / global 
procedures be containers, but
+             ;; they should not call themselves because then they might be 
setting variables in
+             ;; the closure to different values at different times.  So for 
now we're extra careful
+             ;; about which are containers.
+             (db-put! db (first lparams) 'shareable-container #t)))
+
         ;; Make 'removable, if it has no references and is not assigned to, 
and one of the following:
         ;; - it has either a value that does not cause any side-effects
         ;; - it is 'undefined
@@ -2501,6 +2529,8 @@
 
 (define (perform-closure-conversion node db)
   (let ((direct-calls 0)
+        (sharing-containers 0)
+        (sharing-users 0)
        (customizable '())
        (lexicals '()))
 
@@ -2604,6 +2634,68 @@
 
          (else (concatenate (map (lambda (n) (gather n here locals)) subs)) ) 
) ))
 
+
+    ;; Merge shareable closures.  This allocates space for closed-over
+    ;; variables of the longest unbroken line of sharing-users in the
+    ;; sharing-container, mutating the database entries set up by
+    ;; "gather" to account for this.
+    (define (merge-shareable n shared-closure)
+      (let ((subs (node-subexpressions n))
+           (params (node-parameters n)) )
+       (case (node-class n)
+
+         ((quote ##core#undefined ##core#provide ##core#proc ##core#primitive)
+          '())
+
+         ((##core#lambda ##core#direct_lambda)
+          (##sys#decompose-lambda-list
+           (third params)
+           (lambda (vars argc rest)
+             (let* ((id (first params))
+                     (this-closure (test id 'captured-variables)))
+                ;; TODO: unbox vars that are only referenced inside the shared 
closure
+                (cond ((and shared-closure
+                            (test id 'shareable-user)
+                            ;; The user must close over all the shared closure 
vars, otherwise
+                            ;; we risk extending the lifetime of these vars 
for too long.
+                            (null? (lset-difference/eq? shared-closure 
this-closure))
+                            ;; Minimum shared closure size - don't want to 
share a single var, it's extra indirection
+                            (> (length this-closure) 1))
+                       ;; We only pass on the container to the subs if this is 
also a shareable-container
+                       (let ((sub-closure (merge-shareable (first subs) (and 
(test id 'shareable-container) this-closure))))
+                         ;; Reset captured vars.  This closure only captures 
the container
+                         (db-put! db id 'closure-size 1)
+                         (db-put! db id 'captured-variables '())
+                         (db-put! db id 'sharing-mode 'user)
+                         (set! sharing-users (add1 sharing-users))
+                         ;; Return the closed-over variables of this and the 
rest of the
+                         ;; users in the chain to the container for allocation.
+                         ;; Note that because the user always is a superset of 
the container,
+                         ;; we can just return the "deepest" user
+                         (if (null? sub-closure)
+                             this-closure
+                             sub-closure)))
+
+                      ((test id 'shareable-container)
+                       (let ((sub-closure (merge-shareable (first subs) 
this-closure)))
+                         (unless (null? sub-closure)
+                           ;; NOTE: We don't touch 'captured-variables, 
because the vars
+                           ;; on initial entry of the sharing closures are 
unchanged.
+                           ;; However, we do need to know the full closure
+                           (db-put! db id 'closure-size (length sub-closure))
+                           (db-put! db id 'sharing-mode 'container)
+                           (db-put! db id 'shared-closure sub-closure)
+                           (set! sharing-containers (add1 
sharing-containers))))
+                       ;; This is a new container, so do not allow higher-up 
containers
+                       ;; to collect variables from this closure.
+                       '())
+
+                      (else (merge-shareable (first subs) #f)
+                            '()))))))
+
+         (else (concatenate (map (lambda (n) (merge-shareable n 
shared-closure)) subs)) ) ) ))
+
+
     ;; Create explicit closures:
     (define (transform n crefvar closure)
       (let ((subs (node-subexpressions n))
@@ -2658,10 +2750,13 @@
                        (make-node
                         'let (list var)
                         (list (make-node '##core#box '() (list (varnode 
boxedalias)))
-                              (transform (second subs) crefvar closure) ) ) ) )
+                              (update-shared-closure-var var crefvar closure
+                                                          (transform (second 
subs) crefvar closure))) )) )
                 (make-node
                  'let params
-                 (maptransform subs crefvar closure) ) ) ) )
+                  (list (transform (first subs) crefvar closure)
+                        (update-shared-closure-var var crefvar closure
+                                                   (transform (second subs) 
crefvar closure))) ) ) ) )
 
          ((##core#lambda ##core#direct_lambda)
           (let ((llist (third params)))
@@ -2671,11 +2766,26 @@
                (let* ((boxedvars (filter (lambda (v) (test v 'boxed)) vars))
                       (boxedaliases (map cons boxedvars (map gensym 
boxedvars)))
                       (cvar (gensym 'c))
+                       (all-vars (if rest (cons rest vars) vars))
                       (id (if crefvar (first params) 'toplevel))
-                       (new-crefvar cvar)
-                      (capturedvars (or (test id 'captured-variables) '()))
-                      (csize (or (test id 'closure-size) 0))
-                      (info (and emit-closure-info (second params) (pair? 
llist))) )
+                       (sharing-mode (test id 'sharing-mode))
+                      (capturedvars (if (eq? sharing-mode 'user)
+                                        (list #f) ; NOTE: Hacky way to 
indicate we want to pass on the closure
+                                        (or (test id 'captured-variables) 
'())))
+                      (csize (or (test id 'closure-size) 0)) ; = (length 
new-closure) below
+                      (info (and emit-closure-info (second params) (pair? 
llist)))
+                       (new-crefvar (if (eq? sharing-mode 'user)
+                                        ;; Users should not look up vars in 
their own closure, but in
+                                        ;; the shared closure "container" 
(which is the only entry in their own closure)
+                                        (gensym 'scc)
+                                        cvar))
+                      (new-closure (case sharing-mode
+                                      ((container) (test id 'shared-closure)) 
; Fresh container will hold the full shared closure
+                                     ((user)
+                                       ;; Sharing user doesn't introduce new 
vars into closure, but uses shared container's closure
+                                      closure)
+                                     ;; Normal unshared closure is over its 
captured vars
+                                     (else capturedvars))))
                  ;; If rest-parameter is boxed: mark it as 'boxed-rest
                  ;;  (if we don't do this than preparation will think the 
(boxed) alias
                  ;;  of the rest-parameter is never used)
@@ -2701,21 +2811,39 @@
                             (cond ((and rest (assq rest boxedaliases)) => cdr)
                                   (else rest) ) ) )
                           (fourth params) )
-                    (list (let ((body (transform (car subs) new-crefvar 
capturedvars)))
-                            (if (pair? boxedvars)
-                                (let loop ((aliases (unzip1 boxedaliases))
-                                           (values
-                                            (map (lambda (a)
-                                                   (make-node '##core#box '() 
(list (varnode (cdr a)))))
-                                                 boxedaliases) ))
-                                  (if (null? aliases)
-                                      body
-                                      (make-node 'let (list (car aliases))
-                                                 (list (car values)
-                                                       (loop (cdr aliases) 
(cdr values))))))
-                                body) ) ) )
-                   (let ((cvars (map (lambda (v) (ref-var (varnode v) crefvar 
closure))
-                                     capturedvars) ) )
+                    (list (wrap-crefvar cvar new-crefvar
+                                         (let ((body 
(update-shared-closure-vars all-vars new-crefvar new-closure
+                                                                               
 (transform (car subs) new-crefvar new-closure))))
+                                          (if (pair? boxedvars)
+                                              (let loop ((aliases (unzip1 
boxedaliases))
+                                                         (values
+                                                          (map (lambda (a)
+                                                                 (make-node 
'##core#box '() (list (varnode (cdr a)))))
+                                                               boxedaliases) ))
+                                                (if (null? aliases)
+                                                    body
+                                                    (make-node 'let (list (car 
aliases))
+                                                               (list (car 
values)
+                                                                     (loop 
(cdr aliases) (cdr values))))))
+                                              body) )) ) )
+                   (let ((cvars (map (lambda (v)
+                                        ;; NOTE: This memq redundancy is 
needed because "gather" reorders lexicals
+                                        ;; continually, so the index of the 
variables will differ between each user,
+                                        ;; meaning the collected shared 
closure is differently ordered than the
+                                        ;; capturedvars.  Otherwise, we could 
just map ref-var over capturedvars
+                                        ;; and append a bunch of undefineds at 
the end.
+                                        (cond ((not v) ; See capturedvars note 
above
+                                               (varnode crefvar))
+                                              ((memq v capturedvars)
+                                               ;; If it's a captured var, put 
it in the closure at the appropriate spot
+                                               (ref-var (varnode v) crefvar 
closure))
+                                              (else
+                                               ;; Shared closures have 
reserved spots which users further down will set!
+                                               ;; to a proper value.  Init 
those as undefined.
+                                               (make-node '##core#undefined 
'() '()))))
+                                     (if (eq? sharing-mode 'container)
+                                          new-closure
+                                          capturedvars))))
                      (if info
                          (append
                           cvars
@@ -2773,6 +2901,37 @@
     (define (maptransform xs crefvar closure)
       (map (lambda (x) (transform x crefvar closure)) xs) )
 
+    ;; If the crefvar (used by ref-var et al) differs from the
+    ;; closure's own "cvar" because it's a shared closure, introduce a
+    ;; let binding that obtains the shared closure from the cvar.
+    (define (wrap-crefvar cvar crefvar node)
+      (if (eq? cvar crefvar)
+          node
+          (make-node 'let (list crefvar)
+                     (list (make-node '##core#ref (list 1)
+                                     (list (varnode cvar)) )
+                           node) ) ) )
+
+    ;; If a variable is introduced for the first time, and we're in a
+    ;; sharing user, we have to update its value in the shared container
+    ;; closure if it occurs there, so that further users can see it.
+    (define (update-shared-closure-var var crefvar closure node)
+      (cond ((posq var closure)
+            => (lambda (i)
+                  (make-node 'let (list (gensym var))
+                            (list (make-node '##core#update (list (+ i 1))
+                                             (list (varnode crefvar) (varnode 
var)) )
+                                   node)) ) )
+           (else node) ))
+
+    (define (update-shared-closure-vars vars crefvar closure node)
+      (let lp ((node node)
+               (vars vars))
+        (if (null? vars)
+            node
+            (lp (update-shared-closure-var (car vars) crefvar closure node)
+                (cdr vars)))))
+
     (define (ref-var n crefvar closure)
       (let ((var (first (node-parameters n))))
        (cond ((posq var closure)
@@ -2785,8 +2944,14 @@
     (gather node #f '())
     (when (pair? customizable)
       (debugging 'o "customizable procedures" customizable))
+    (debugging 'p "closure conversion merging of shareables phase...")
+    (merge-shareable node #f)
+    (unless (and (zero? sharing-containers)
+                 (zero? sharing-users)) ;; Users should always be zero if 
containers is (but paranoia prevails, helps w/ debugging)
+      (debugging 'o "shared closure containers" sharing-containers)
+      (debugging 'o "shared closure users" sharing-users))
     (debugging 'p "closure conversion transformation phase...")
-    (let ((node2 (transform node #f #f)))
+    (let ((node2 (transform node #f '())))
       (unless (zero? direct-calls)
        (debugging 'o "calls to known targets" direct-calls))
       node2) ) )
-- 
2.49.0

>From db0226ad1579fcb5345dae662fb09a44b578f035 Mon Sep 17 00:00:00 2001
From: Peter Bex <[email protected]>
Date: Tue, 22 Jul 2025 12:23:02 +0200
Subject: [PATCH 4/5] Undo boxing of variables that don't escape shared
 closures

When a variable is created in a sharing closure container or user and
is *only* used by sharing users further down the line, and *not* any
regular lambdas (or new containers), it is not necessary to box these
variables because the container itself already acts as a box.  In such
cases, we undo the 'boxed property of the variables.
---
 core.scm | 43 ++++++++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/core.scm b/core.scm
index 6746a06a..fd4885c9 100644
--- a/core.scm
+++ b/core.scm
@@ -2532,7 +2532,8 @@
         (sharing-containers 0)
         (sharing-users 0)
        (customizable '())
-       (lexicals '()))
+       (lexicals '())
+        (escaping-shared-vars '()))
 
     (define (test sym item) (db-get db sym item))
 
@@ -2652,8 +2653,7 @@
            (third params)
            (lambda (vars argc rest)
              (let* ((id (first params))
-                     (this-closure (test id 'captured-variables)))
-                ;; TODO: unbox vars that are only referenced inside the shared 
closure
+                     (this-closure (or (test id 'captured-variables) '())))
                 (cond ((and shared-closure
                             (test id 'shareable-user)
                             ;; The user must close over all the shared closure 
vars, otherwise
@@ -2677,20 +2677,37 @@
                              sub-closure)))
 
                       ((test id 'shareable-container)
-                       (let ((sub-closure (merge-shareable (first subs) 
this-closure)))
-                         (unless (null? sub-closure)
-                           ;; NOTE: We don't touch 'captured-variables, 
because the vars
-                           ;; on initial entry of the sharing closures are 
unchanged.
-                           ;; However, we do need to know the full closure
-                           (db-put! db id 'closure-size (length sub-closure))
-                           (db-put! db id 'sharing-mode 'container)
-                           (db-put! db id 'shared-closure sub-closure)
-                           (set! sharing-containers (add1 
sharing-containers))))
+                       ;; If we're starting a new container, any captured vars 
from the
+                       ;; surrounding container will escape, like with a 
non-sharing closure
+                       (set! escaping-shared-vars (lset-union/eq? 
escaping-shared-vars this-closure))
+
+                       (fluid-let ((escaping-shared-vars '()))
+                         (let ((sub-closure (merge-shareable (first subs) 
this-closure)))
+                           (unless (null? sub-closure)
+                             ;; NOTE: We don't touch 'captured-variables, 
because the vars
+                             ;; on initial entry of the sharing closures are 
unchanged.
+                             ;; However, we do need to know the full closure
+                             (db-put! db id 'closure-size (length sub-closure))
+                             (db-put! db id 'sharing-mode 'container)
+                             (db-put! db id 'shared-closure sub-closure)
+                             (set! sharing-containers (add1 
sharing-containers))
+
+                             ;; Shared vars introduced by users which don't 
escape don't have to be boxed
+                             ;; because the shared closure container itself 
already acts as a box.
+                             (let* ((user-introduced-vars (lset-difference/eq? 
sub-closure this-closure))
+                                    (unboxable-vars (lset-difference/eq? 
user-introduced-vars escaping-shared-vars)))
+                               (for-each (lambda (v)
+                                           (when (test v 'boxed) ; Not 
strictly needed, but cleaner this way
+                                             (db-put! db v 'boxed #f)))
+                                         unboxable-vars)))))
                        ;; This is a new container, so do not allow higher-up 
containers
                        ;; to collect variables from this closure.
                        '())
 
-                      (else (merge-shareable (first subs) #f)
+                      ;; All closed-over vars in non-user procedures "escape" 
the container (if any)
+                      ;; and must remain boxed.
+                      (else (set! escaping-shared-vars (lset-union/eq? 
escaping-shared-vars this-closure))
+                            (merge-shareable (first subs) #f)
                             '()))))))
 
          (else (concatenate (map (lambda (n) (merge-shareable n 
shared-closure)) subs)) ) ) ))
-- 
2.49.0

>From ca50747a67a6b2dc0b0a94282a79829ed0384f58 Mon Sep 17 00:00:00 2001
From: Peter Bex <[email protected]>
Date: Wed, 23 Jul 2025 15:07:00 +0200
Subject: [PATCH 5/5] Replace quadratic lset-operations in closure generation
 with linear algorithms

The lset-difference/eq? and delete-duplicates calls in the closure
generation are O(n*m) and O(n^2), respectively.  When the closures are
deeply nested these might be called O(n) times, resulting in cubic
complexity of the entire closure generation step.

Implement hash-table-assisted versions of these procedures specific
for symbols so that we don't run into this problem.
---
 core.scm | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/core.scm b/core.scm
index fd4885c9..b78483de 100644
--- a/core.scm
+++ b/core.scm
@@ -2535,6 +2535,29 @@
        (lexicals '())
         (escaping-shared-vars '()))
 
+    ;; O(n) version of delete-duplicates (which is O(n^2)) specific for symbols
+    (define (delete-duplicate-symbols lst)
+      (let ((seen (make-hash-table)))
+        (let lp ((lst lst)
+                 (result '()))
+          (if (null? lst)
+             (reverse result)
+             (let ((x (car lst)))
+               (cond ((hash-table-ref seen x)
+                      (lp (cdr lst) result))
+                      (else
+                       (hash-table-set! seen x #t)
+                      (lp (cdr lst)
+                           (cons x result)))))))))
+
+    ;; O(n+m) version of lset-difference/eq? (which is O(n*m)) specific for 
symbols
+    (define (symbolset-difference ls . lss)
+      (let ((seen (make-hash-table)))
+        (for-each (lambda (lst)
+                    (for-each (lambda (x) (hash-table-set! seen x #t)) lst))
+                  lss)
+        (remove (lambda (x) (hash-table-ref seen x)) ls)))
+
     (define (test sym item) (db-get db sym item))
 
     (define (register-customizable! var id)
@@ -2628,10 +2651,10 @@
            (lambda (vars argc rest)
              (let ((id (if here (first params) 'toplevel)))
                (fluid-let ((lexicals (append locals lexicals)))
-                 (let ((c (delete-duplicates (gather (first subs) id vars) 
eq?)))
+                 (let ((c (delete-duplicate-symbols (gather (first subs) id 
vars))))
                    (db-put! db id 'closure-size (length c))
                    (db-put! db id 'captured-variables c)
-                   (lset-difference/eq? c locals vars)))))))
+                   (symbolset-difference c locals vars)))))))
 
          (else (concatenate (map (lambda (n) (gather n here locals)) subs)) ) 
) ))
 
@@ -2658,7 +2681,7 @@
                             (test id 'shareable-user)
                             ;; The user must close over all the shared closure 
vars, otherwise
                             ;; we risk extending the lifetime of these vars 
for too long.
-                            (null? (lset-difference/eq? shared-closure 
this-closure))
+                            (null? (symbolset-difference shared-closure 
this-closure))
                             ;; Minimum shared closure size - don't want to 
share a single var, it's extra indirection
                             (> (length this-closure) 1))
                        ;; We only pass on the container to the subs if this is 
also a shareable-container
@@ -2694,8 +2717,8 @@
 
                              ;; Shared vars introduced by users which don't 
escape don't have to be boxed
                              ;; because the shared closure container itself 
already acts as a box.
-                             (let* ((user-introduced-vars (lset-difference/eq? 
sub-closure this-closure))
-                                    (unboxable-vars (lset-difference/eq? 
user-introduced-vars escaping-shared-vars)))
+                             (let* ((user-introduced-vars 
(symbolset-difference sub-closure this-closure))
+                                    (unboxable-vars (symbolset-difference 
user-introduced-vars escaping-shared-vars)))
                                (for-each (lambda (v)
                                            (when (test v 'boxed) ; Not 
strictly needed, but cleaner this way
                                              (db-put! db v 'boxed #f)))
-- 
2.49.0

+---[1]:
|-> installation-prefix: /home/sjamaan/chickens/chicken-6-master
|-> csc-options: 
|-> runtime-options: 
|-> repetitions: 10

+---[2]:
|-> installation-prefix: /home/sjamaan/chickens/chicken-6-shared-closures
|-> csc-options: 
|-> runtime-options: 
|-> repetitions: 10

===
=== Overall results
===

=== [cpu-time]
[0]: 1.0 (461.07s)
[1]: 1.01 (456.16s)


===
=== Results by metric
===


In the tables below results are normalized (larger numbers indicate better 
results).

=== [cpu-time]

Programs                   [1]       [2]
========================================
0__________________________1.0_______1.0
binarytrees_______________1.01_______1.0
boyer______________________1.0______1.01
browse____________________1.01_______1.0
conform____________________1.0______1.06
cpstak_____________________1.0______1.01
ctak_______________________1.0______1.04
dderiv____________________1.02_______1.0
deriv_____________________1.01_______1.0
destructive________________1.0______1.04
dfa_______________________1.08_______1.0
div-iter__________________1.02_______1.0
div-rec____________________1.1_______1.0
dynamic___________________1.16_______1.0
earley____________________1.07_______1.0
fannkuch___________________1.0______1.09
fft________________________1.0______1.12
fib________________________1.0______1.11
fibc_______________________1.0______1.13
fibfp______________________1.0______1.08
fprint_____________________1.0______1.17
fread______________________1.0______1.02
gcbench____________________1.0______1.03
gold_______________________1.0______1.01
gold2______________________1.0_______1.0
graphs_____________________1.0______1.08
hanoi______________________1.0______1.06
integ______________________1.0______1.01
integ2_____________________1.0______1.06
kanren_____________________1.0_______1.0
kernwyk-ackermann__________1.0______1.05
kernwyk-array______________1.0______1.02
kernwyk-cat________________1.0______1.49
kernwyk-string_____________1.0_______1.0
kernwyk-sum_______________1.04_______1.0
kernwyk-tail______________1.02_______1.0
kernwyk-wc_________________1.0______1.04
knucleotide_______________1.09_______1.0
lattice___________________1.01_______1.0
matrix_____________________1.0_______1.0
maze_______________________1.0_______1.0
mazefun____________________1.0_______1.0
mbrot______________________1.0______1.05
nbody______________________1.0______1.01
nboyer____________________1.01_______1.0
nestedloop________________1.16_______1.0
nfa_______________________1.05_______1.0
nqueens___________________1.04_______1.0
ntakl_____________________1.07_______1.0
nucleic2___________________1.0______1.02
paraffins__________________1.0______1.02
parsing___________________1.01_______1.0
pnpoly_____________________1.0______1.06
primes_____________________1.0_______1.0
psyntax___________________1.02_______1.0
puzzle____________________1.03_______1.0
ray_______________________1.02_______1.0
ray2_______________________1.0______1.02
rho3rec____________________1.0______1.02
sboyer____________________1.02_______1.0
scheme____________________1.02_______1.0
sieves-eratosthenes_______1.05_______1.0
simplex___________________1.04_______1.0
slatex____________________1.03_______1.0
sort1______________________1.0_______1.0
tak________________________1.0______1.03
takl______________________1.02_______1.0
takr_______________________1.0______1.05
traverse___________________1.0______1.03
travinit__________________1.02_______1.0
triangl___________________1.01_______1.0

[PATCH] Introduce "closure sharing" optimization

Reply via email to