[note: this is a long mail, apologies for that] Hi all,
The attached patches introduce a new optimization. It can also be found in the "shared-closures" branch in git. We ran Salmonella on it the past few days, and it seems to work perfectly: http://salmonella-linux-x86-64.call-cc.org/shared-closures/gcc/linux/x86-64/2025/07/ The optimization should improve performance by not requiring all closed-over variables to be copied every time we create a new closure. Instead, we simply take the surrounding closure (the "container") and wrap it in the closure (the "user") so that the user can access the closed-over variables from the container. Any closure created inside the user may then also receive the container directly, transitively down the "chain" of nested closures. If a "user" introduces new variables which nested closures can see, we simply mutate the container closure in place to hold that variable, for which we reserve a spot when initially creating the container. Added advantage: if the variable *only* exists inside a shared chain, we can avoid boxing it because the container already acts as a box. We can obviously only do this under restricted situations, because closed-over variables may get different values on different invocations of the same lambda, which means they'd get clobbered and any closures that need to reference to original value would see the updated value instead. It's the "broken JavaScript lambda-creating for loop" problem all over again. Another complicating factor is that we don't want out-of-scope values to be held onto by inner lambdas, because that would extend their lifetime beyond what is necessary. One of the great aspects of CHICKEN is that it has very strict variable lifetimes due to its flat closure representation. We don't want to throw that away! The current implementation is extremely (probably overly) cautious and it only allows closure sharing among lambdas where: - None of the lambdas in a "shared closure chain" escape, - each lambda in the chain creates at most one other lambda, - each lambda closes over a superset of the defining lambda's closed-over variables (so the closures only "grow" as you go down the nesting) - and the lambda may not be a foreign callback My gut feeling says we can probably relax a few of these restrictions, but it's kind of hard to oversee exactly how features interact. For example, first I thought containers might escape (which allows for a great many more opportunities for this optimization), but it appears that when a lambda is assigned to a global and it invokes itself recursively, the number of call sites isn't counted correctly towards that internal lambda (possibly because the global may be set! to something else?), which means it's basically impossible to detect self-calls (which may not be shared, as that invalidates the rule that variables may be assigned only once). Threading might also be a problem since a continuation could be suspended and then the same closure invoked elsewhere which would clobber shared variables in the suspended closure. So for now it's best to only allow "internal" procedures and those which we know do not escape. Unfortunately, that drastically reduces its impact. But the upshot of all of this is that the problem in #1852 is now thoroughly avoided because the chain of closures created as part of evaluating the arguments to a call fits the above criteria exactly. The nice thing is that it is a general solution that applies to any program, without affecting semantics. The benchmark suite seems mostly unaffected (see the attached file benchmark-comparison.txt), but it improves a few programs' performance. Notably kernwyk-cat. It's a cool example because it's a simple program which illustrates the concept well: (define (catport in out) (let ((x (read-char in))) (if (not (eof-object? x)) (begin (write-char x out) ;; closure container (catport in out))))) ;; closure user During CPS conversion, the write-char call gets wrapped in a lambda which closes over "in" and "out", and the recursive call to catport also gets wrapped in a lambda which closes over those same variables, so the optimization kicks in and the former is assigned to be a container and the latter a user, so we can create a closure with only one variable slot instead of two. It doesn't seem like much, but this means the stack consumption is reduced in this tight loop so we end up doing less GCs. As for the example in #1852, the original implementation in master took 54 minutes to translate to C, which resulted in a 329MB file. Then, gcc took 284 minutes to compile this, resulting in a 181MB binary. The runtime of the binary is 0.064s real and 0.033s user. With the implementation in this branch, it takes 1.5 minute to translate to C, resulting in a 5.1MB file. Gcc takes 14 seconds to compile, resulting in a 5.4MB binary. Runtime of the binary is half that of the original: 0.031s real and 0.14s user. Note that most of the translation to C time seems to be eaten up by the quadratic complexity of lset-difference/eq? and remove-duplicates. I've added new procedures which use a hash table to avoid traversing the same list over and over. Note: the first patch is actually unrelated, but I noticed this while I was looking at the analysis code. The second patch is just to keep the third (main) one more readable. Cheers, Peter
>From a34e7e4d92e02e534f426b416772248d7210baa7 Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Thu, 10 Jul 2025 10:03:44 +0200 Subject: [PATCH 1/5] Remove bogus safety measure The measure used to read (unless (eq? val n) (bomb ...)), which meant that we should not allow captured values to be rest ops. However, with 32067934f we now do allow rest ops to refer to captured variables. That commit also changed the safety check, turning it into a tautology - the val will always either be direct, or a reference to a closure var. So it makes no sense to even keep the check. --- core.scm | 4 ---- 1 file changed, 4 deletions(-) diff --git a/core.scm b/core.scm index 0459a214..2ff2396f 100644 --- a/core.scm +++ b/core.scm @@ -2625,10 +2625,6 @@ ((##core#rest-cdr ##core#rest-car ##core#rest-null? ##core#rest-length) (let* ((val (ref-var n here closure)) (rest-var (if (eq? val n) (varnode (first params)) val))) - (unless (or (eq? val n) - (match-node val `(##core#ref (i) (##core#variable (,here))) '(i))) - ;; If it's captured, replacement in optimizer was incorrect - (bomb "Saw rest op for captured variable. This should not happen!" class) ) ;; If rest-cdrs have not all been eliminated, restore ;; them as regular cdr calls on the rest list variable. ;; This can be improved, as it can actually introduce -- 2.49.0
>From e4c49ac29c35d4c013ca3a1d72118ecb58daff13 Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Fri, 18 Jul 2025 15:44:50 +0200 Subject: [PATCH 2/5] Naming change in closure "transform" procedure Change the name of the second argument in the "transform" procedure from "here" to "crefvar". The name "here" was a bit misleading, as "here" meant the lambda id in both "gather" and "walk-global" in "prepare-for-code-generation", while it referred to the closure variable ("cvar") in "transform". This inconsistency was a bit confusing when initially trying to understand the code. Besides being a bit of a nitpicky improvement, this will pave the way for the next commit, which will differentiate between the actual closure variable (i.e., "cvar") and the variable used to access variables inside the closure (i.e., "crefvar"). But for the current commit, the value of "crefvar" is always the same as "cvar" which is generated when converting the lambda to a closure literal. --- core.scm | 45 +++++++++++++++++++++++---------------------- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/core.scm b/core.scm index 2ff2396f..9e041a38 100644 --- a/core.scm +++ b/core.scm @@ -2605,7 +2605,7 @@ (else (concatenate (map (lambda (n) (gather n here locals)) subs)) ) ) )) ;; Create explicit closures: - (define (transform n here closure) + (define (transform n crefvar closure) (let ((subs (node-subexpressions n)) (params (node-parameters n)) (class (node-class n)) ) @@ -2617,25 +2617,25 @@ ((##core#variable) (let* ((var (first params)) - (val (ref-var n here closure)) ) + (val (ref-var n crefvar closure)) ) (if (test var 'boxed) (make-node '##core#unbox '() (list val)) val) ) ) ((##core#rest-cdr ##core#rest-car ##core#rest-null? ##core#rest-length) - (let* ((val (ref-var n here closure)) + (let* ((val (ref-var n crefvar closure)) (rest-var (if (eq? val n) (varnode (first params)) val))) ;; If rest-cdrs have not all been eliminated, restore ;; them as regular cdr calls on the rest list variable. ;; This can be improved, as it can actually introduce ;; many more cdr calls than necessary. (cond ((eq? class '##core#rest-cdr) - (transform (replace-rest-op-with-list-ops class rest-var params) here closure)) + (transform (replace-rest-op-with-list-ops class rest-var params) crefvar closure)) ;; If n isn't val, this node was processed and the ;; variable got replaced by a closure access. ((not (eq? val n)) - (transform (replace-rest-op-with-list-ops class rest-var params) here closure)) + (transform (replace-rest-op-with-list-ops class rest-var params) crefvar closure)) (else val)) ) ) @@ -2645,7 +2645,7 @@ ##core#let_float ##core#box_float ##core#unbox_float ##core#inline_loc_ref ##core#inline_loc_update) - (make-node (node-class n) params (maptransform subs here closure)) ) + (make-node (node-class n) params (maptransform subs crefvar closure)) ) ((let) (let* ([var (first params)] @@ -2654,14 +2654,14 @@ (if boxedvar (make-node 'let (list boxedalias) - (list (transform (first subs) here closure) + (list (transform (first subs) crefvar closure) (make-node 'let (list var) (list (make-node '##core#box '() (list (varnode boxedalias))) - (transform (second subs) here closure) ) ) ) ) + (transform (second subs) crefvar closure) ) ) ) ) (make-node 'let params - (maptransform subs here closure) ) ) ) ) + (maptransform subs crefvar closure) ) ) ) ) ((##core#lambda ##core#direct_lambda) (let ((llist (third params))) @@ -2671,7 +2671,8 @@ (let* ((boxedvars (filter (lambda (v) (test v 'boxed)) vars)) (boxedaliases (map cons boxedvars (map gensym boxedvars))) (cvar (gensym 'c)) - (id (if here (first params) 'toplevel)) + (id (if crefvar (first params) 'toplevel)) + (new-crefvar cvar) (capturedvars (or (test id 'captured-variables) '())) (csize (or (test id 'closure-size) 0)) (info (and emit-closure-info (second params) (pair? llist))) ) @@ -2700,7 +2701,7 @@ (cond ((and rest (assq rest boxedaliases)) => cdr) (else rest) ) ) ) (fourth params) ) - (list (let ((body (transform (car subs) cvar capturedvars))) + (list (let ((body (transform (car subs) new-crefvar capturedvars))) (if (pair? boxedvars) (let loop ((aliases (unzip1 boxedaliases)) (values @@ -2713,7 +2714,7 @@ (list (car values) (loop (cdr aliases) (cdr values)))))) body) ) ) ) - (let ((cvars (map (lambda (v) (ref-var (varnode v) here closure)) + (let ((cvars (map (lambda (v) (ref-var (varnode v) crefvar closure)) capturedvars) ) ) (if info (append @@ -2739,23 +2740,23 @@ (make-node (if immf '##core#updatebox_i '##core#updatebox) '() - (list (make-node '##core#ref (list (add1 i)) (list (varnode here))) - (transform val here closure) ) ) + (list (make-node '##core#ref (list (add1 i)) (list (varnode crefvar))) + (transform val crefvar closure) ) ) ;; Is the following actually used??? (make-node (if immf '##core#update_i '##core#update) (list (add1 i)) - (list (varnode here) - (transform val here closure) ) ) ) ) ) + (list (varnode crefvar) + (transform val crefvar closure) ) ) ) ) ) ((test var 'boxed) (make-node (if immf '##core#updatebox_i '##core#updatebox) '() (list (varnode var) - (transform val here closure) ) ) ) + (transform val crefvar closure) ) ) ) (else (make-node 'set! (list var immf) - (list (transform val here closure) ) ) ) ) ) ) + (list (transform val crefvar closure) ) ) ) ) ) ) ((##core#primitive) (make-node @@ -2769,15 +2770,15 @@ (else (bomb "bad node (closure2)")) ) ) ) - (define (maptransform xs here closure) - (map (lambda (x) (transform x here closure)) xs) ) + (define (maptransform xs crefvar closure) + (map (lambda (x) (transform x crefvar closure)) xs) ) - (define (ref-var n here closure) + (define (ref-var n crefvar closure) (let ((var (first (node-parameters n)))) (cond ((posq var closure) => (lambda (i) (make-node '##core#ref (list (+ i 1)) - (list (varnode here)) ) ) ) + (list (varnode crefvar)) ) ) ) (else n) ) ) ) (debugging 'p "closure conversion gathering phase...") -- 2.49.0
>From 8764d3903f5930bb8f635367c0e23d3958cb4652 Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Fri, 18 Jul 2025 15:11:47 +0200 Subject: [PATCH 3/5] Add support for sharing closures to the compiler Normally, CHICKEN uses strictly flat closures, which means that each and every closed-over variable is wrapped in the closure object together with the translated lambda's C function. Unfortunately, this can lead to a whole lot of copying if the lambda calls a lot of CPS functions. Each CPS call requires breaking up the lambda into multiple different lambdas, each of which get their own C function and closure object. This means that a big Scheme closure which gets split up into multiple low-level CPS closures like this will keep copying the same variables over and over, wrapping them into a new closure, calling the CPS function, which then "returns" to that closure, which wraps another closure, calls another CPS function etc etc. This can result in quite pathological behaviour, like reported in #1852. The C compiler also has trouble dealing with this amount of generated code. One could use "linked" closures which simply point to their containing function, but that can be somewhat slow in lookup and will extend variables' lifetime for longer than necessary, which has its own problems. Instead, we now create a big "container" closure when calling the initial function, with some slots initialized as undefined. This container closure is then passed around to other translated lambdas ("users") which can access the variables from that container. When the function which defines the variable runs, it adds its variable to the closure at the spot reserved for it. Subsequent "users" can then read it from the container as if they closed over it. This means there is no copying going on - the same container is always passed to the next closure. The container is the only closed-over variable inside a sharing "user" closure. A chain of shared closures always starts with a "container" closure, which closes over any number of variables. Each subsequent user to which the container is passed must strictly extend the set of closed-over variables. If any variable is dropped in a lambda, we disallow that lambda from being a user. Instead, it's either a regular closure or a container which starts a new sharing chain. We also have to take care that sharing users and containers are never invoked more than once with the same closure object. This would lead to variables being clobbered. Effectively, this means containers and users are only created for non-escaping procedures which don't have any mutual (direct or indirect) recursion going on. Also, we disallow containers or users from defining more than one sub-lambda, because that would invalidate the requirement that closures must always be grown. Strictly speaking, it may be possible for a shared closure to be re-used in multiple users, but it's hard to reason about the situations in which this would be safe and would complicate the code too much. Note that if a continuation is reified with call/cc, the argument to call/cc is considered an escaping function, so it will never be marked as a user or container. This means it should be impossible to observe these container slots changing around by capturing and reinvoking a continuation (as per letrec). --- batch-driver.scm | 9 +- core.scm | 209 ++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 193 insertions(+), 25 deletions(-) diff --git a/batch-driver.scm b/batch-driver.scm index c5012c30..6b0e8fd2 100644 --- a/batch-driver.scm +++ b/batch-driver.scm @@ -117,7 +117,8 @@ (undefined . und) (replacing . rpg) (unused . uud) (extended-binding . xtb) (inline-export . ilx) (hidden-refs . hrf) (value-ref . vvf) - (customizable . cst) (has-unused-parameters . hup) (boxed-rest . bxr) ) ) + (customizable . cst) (has-unused-parameters . hup) (boxed-rest . bxr) + (shareable-container . shc) (shareable-user . shu) ) ) (omit #f)) (lambda (db) (unless omit @@ -141,7 +142,8 @@ (case (caar es) ((captured assigned boxed global contractable standard-binding assigned-locally collapsable removable undefined replacing unused simple inlinable inline-export - has-unused-parameters extended-binding customizable constant boxed-rest hidden-refs) + has-unused-parameters extended-binding customizable constant boxed-rest hidden-refs + shareable-container shareable-user) (printf "\t~a" (cdr (assq (caar es) names))) ) ((unknown) (set! val 'unknown) ) @@ -152,7 +154,8 @@ ((potential-values) (set! pvals (cdar es))) ((replacable home contains contained-in use-expr closure-size rest-parameter - captured-variables explicit-rest rest-cdr rest-null? consed-rest-arg) + captured-variables explicit-rest rest-cdr rest-null? consed-rest-arg + shared-closure sharing-mode) (printf "\t~a=~s" (caar es) (cdar es)) ) ((derived-rest-vars) (set! derived-rvars (cdar es))) diff --git a/core.scm b/core.scm index 9e041a38..6746a06a 100644 --- a/core.scm +++ b/core.scm @@ -285,6 +285,10 @@ ; explicit-rest -> <boolean> If true: procedure is called with consed rest list ; captured-variables -> (<var> ...) List of closed over variables ; inline-target -> <boolean> If true: was target of an inlining operation +; shareable-container -> <boolean> If true: potentially may collect and share closed-over variables from (nested) contained closures +; shareable-user -> <boolean> If true: closed-over variables may potentially be shared from the containing closure +; sharing-mode -> <container|user> If container: actually collects and shares closed-over variables from (nested) contained closures. If user: receives container closure +; shared-closure -> (<var> ...) List of transitively closed over variables of the sharing-container and its sharing-user (declare @@ -2422,6 +2426,30 @@ (db-put! db (first lparams) 'explicit-rest #t) (db-put! db rest 'consed-rest-arg #t) ) ) ) ) ) ) ) ) ) + ;; If it has a known or local value which is a procedure, and referenced only once + ;; and only one call site or is an internal procedure, mark it as 'shareable-user so that + ;; its closed over variables may be shared with its containing procedure. + ;; Note that callbacks are exempt from this, because callback_wrapper creates an empty closure + ;; manually, throwing away our carefully crafted closure. TODO: can maybe be done better? + ;; + ;; If furthermore it only contains a single other procedure, mark it as 'shareable-container + ;; so that may share closed-over variables with that one procedure. + (and-let* ((val (or local-value value)) + ((eq? '##core#lambda (node-class val))) + (lparams (node-parameters val)) + ((or (= 1 nreferences ncall-sites) + (not (second lparams)))) + ((not (rassoc sym callback-names eq?)))) + (db-put! db (first lparams) 'shareable-user #t) + (and-let* ((id (first lparams)) + (contains (or (db-get db id 'contains) '())) + ((= (length contains) 1))) + ;; TODO: It should be possible to have escaping / global procedures be containers, but + ;; they should not call themselves because then they might be setting variables in + ;; the closure to different values at different times. So for now we're extra careful + ;; about which are containers. + (db-put! db (first lparams) 'shareable-container #t))) + ;; Make 'removable, if it has no references and is not assigned to, and one of the following: ;; - it has either a value that does not cause any side-effects ;; - it is 'undefined @@ -2501,6 +2529,8 @@ (define (perform-closure-conversion node db) (let ((direct-calls 0) + (sharing-containers 0) + (sharing-users 0) (customizable '()) (lexicals '())) @@ -2604,6 +2634,68 @@ (else (concatenate (map (lambda (n) (gather n here locals)) subs)) ) ) )) + + ;; Merge shareable closures. This allocates space for closed-over + ;; variables of the longest unbroken line of sharing-users in the + ;; sharing-container, mutating the database entries set up by + ;; "gather" to account for this. + (define (merge-shareable n shared-closure) + (let ((subs (node-subexpressions n)) + (params (node-parameters n)) ) + (case (node-class n) + + ((quote ##core#undefined ##core#provide ##core#proc ##core#primitive) + '()) + + ((##core#lambda ##core#direct_lambda) + (##sys#decompose-lambda-list + (third params) + (lambda (vars argc rest) + (let* ((id (first params)) + (this-closure (test id 'captured-variables))) + ;; TODO: unbox vars that are only referenced inside the shared closure + (cond ((and shared-closure + (test id 'shareable-user) + ;; The user must close over all the shared closure vars, otherwise + ;; we risk extending the lifetime of these vars for too long. + (null? (lset-difference/eq? shared-closure this-closure)) + ;; Minimum shared closure size - don't want to share a single var, it's extra indirection + (> (length this-closure) 1)) + ;; We only pass on the container to the subs if this is also a shareable-container + (let ((sub-closure (merge-shareable (first subs) (and (test id 'shareable-container) this-closure)))) + ;; Reset captured vars. This closure only captures the container + (db-put! db id 'closure-size 1) + (db-put! db id 'captured-variables '()) + (db-put! db id 'sharing-mode 'user) + (set! sharing-users (add1 sharing-users)) + ;; Return the closed-over variables of this and the rest of the + ;; users in the chain to the container for allocation. + ;; Note that because the user always is a superset of the container, + ;; we can just return the "deepest" user + (if (null? sub-closure) + this-closure + sub-closure))) + + ((test id 'shareable-container) + (let ((sub-closure (merge-shareable (first subs) this-closure))) + (unless (null? sub-closure) + ;; NOTE: We don't touch 'captured-variables, because the vars + ;; on initial entry of the sharing closures are unchanged. + ;; However, we do need to know the full closure + (db-put! db id 'closure-size (length sub-closure)) + (db-put! db id 'sharing-mode 'container) + (db-put! db id 'shared-closure sub-closure) + (set! sharing-containers (add1 sharing-containers)))) + ;; This is a new container, so do not allow higher-up containers + ;; to collect variables from this closure. + '()) + + (else (merge-shareable (first subs) #f) + '())))))) + + (else (concatenate (map (lambda (n) (merge-shareable n shared-closure)) subs)) ) ) )) + + ;; Create explicit closures: (define (transform n crefvar closure) (let ((subs (node-subexpressions n)) @@ -2658,10 +2750,13 @@ (make-node 'let (list var) (list (make-node '##core#box '() (list (varnode boxedalias))) - (transform (second subs) crefvar closure) ) ) ) ) + (update-shared-closure-var var crefvar closure + (transform (second subs) crefvar closure))) )) ) (make-node 'let params - (maptransform subs crefvar closure) ) ) ) ) + (list (transform (first subs) crefvar closure) + (update-shared-closure-var var crefvar closure + (transform (second subs) crefvar closure))) ) ) ) ) ((##core#lambda ##core#direct_lambda) (let ((llist (third params))) @@ -2671,11 +2766,26 @@ (let* ((boxedvars (filter (lambda (v) (test v 'boxed)) vars)) (boxedaliases (map cons boxedvars (map gensym boxedvars))) (cvar (gensym 'c)) + (all-vars (if rest (cons rest vars) vars)) (id (if crefvar (first params) 'toplevel)) - (new-crefvar cvar) - (capturedvars (or (test id 'captured-variables) '())) - (csize (or (test id 'closure-size) 0)) - (info (and emit-closure-info (second params) (pair? llist))) ) + (sharing-mode (test id 'sharing-mode)) + (capturedvars (if (eq? sharing-mode 'user) + (list #f) ; NOTE: Hacky way to indicate we want to pass on the closure + (or (test id 'captured-variables) '()))) + (csize (or (test id 'closure-size) 0)) ; = (length new-closure) below + (info (and emit-closure-info (second params) (pair? llist))) + (new-crefvar (if (eq? sharing-mode 'user) + ;; Users should not look up vars in their own closure, but in + ;; the shared closure "container" (which is the only entry in their own closure) + (gensym 'scc) + cvar)) + (new-closure (case sharing-mode + ((container) (test id 'shared-closure)) ; Fresh container will hold the full shared closure + ((user) + ;; Sharing user doesn't introduce new vars into closure, but uses shared container's closure + closure) + ;; Normal unshared closure is over its captured vars + (else capturedvars)))) ;; If rest-parameter is boxed: mark it as 'boxed-rest ;; (if we don't do this than preparation will think the (boxed) alias ;; of the rest-parameter is never used) @@ -2701,21 +2811,39 @@ (cond ((and rest (assq rest boxedaliases)) => cdr) (else rest) ) ) ) (fourth params) ) - (list (let ((body (transform (car subs) new-crefvar capturedvars))) - (if (pair? boxedvars) - (let loop ((aliases (unzip1 boxedaliases)) - (values - (map (lambda (a) - (make-node '##core#box '() (list (varnode (cdr a))))) - boxedaliases) )) - (if (null? aliases) - body - (make-node 'let (list (car aliases)) - (list (car values) - (loop (cdr aliases) (cdr values)))))) - body) ) ) ) - (let ((cvars (map (lambda (v) (ref-var (varnode v) crefvar closure)) - capturedvars) ) ) + (list (wrap-crefvar cvar new-crefvar + (let ((body (update-shared-closure-vars all-vars new-crefvar new-closure + (transform (car subs) new-crefvar new-closure)))) + (if (pair? boxedvars) + (let loop ((aliases (unzip1 boxedaliases)) + (values + (map (lambda (a) + (make-node '##core#box '() (list (varnode (cdr a))))) + boxedaliases) )) + (if (null? aliases) + body + (make-node 'let (list (car aliases)) + (list (car values) + (loop (cdr aliases) (cdr values)))))) + body) )) ) ) + (let ((cvars (map (lambda (v) + ;; NOTE: This memq redundancy is needed because "gather" reorders lexicals + ;; continually, so the index of the variables will differ between each user, + ;; meaning the collected shared closure is differently ordered than the + ;; capturedvars. Otherwise, we could just map ref-var over capturedvars + ;; and append a bunch of undefineds at the end. + (cond ((not v) ; See capturedvars note above + (varnode crefvar)) + ((memq v capturedvars) + ;; If it's a captured var, put it in the closure at the appropriate spot + (ref-var (varnode v) crefvar closure)) + (else + ;; Shared closures have reserved spots which users further down will set! + ;; to a proper value. Init those as undefined. + (make-node '##core#undefined '() '())))) + (if (eq? sharing-mode 'container) + new-closure + capturedvars)))) (if info (append cvars @@ -2773,6 +2901,37 @@ (define (maptransform xs crefvar closure) (map (lambda (x) (transform x crefvar closure)) xs) ) + ;; If the crefvar (used by ref-var et al) differs from the + ;; closure's own "cvar" because it's a shared closure, introduce a + ;; let binding that obtains the shared closure from the cvar. + (define (wrap-crefvar cvar crefvar node) + (if (eq? cvar crefvar) + node + (make-node 'let (list crefvar) + (list (make-node '##core#ref (list 1) + (list (varnode cvar)) ) + node) ) ) ) + + ;; If a variable is introduced for the first time, and we're in a + ;; sharing user, we have to update its value in the shared container + ;; closure if it occurs there, so that further users can see it. + (define (update-shared-closure-var var crefvar closure node) + (cond ((posq var closure) + => (lambda (i) + (make-node 'let (list (gensym var)) + (list (make-node '##core#update (list (+ i 1)) + (list (varnode crefvar) (varnode var)) ) + node)) ) ) + (else node) )) + + (define (update-shared-closure-vars vars crefvar closure node) + (let lp ((node node) + (vars vars)) + (if (null? vars) + node + (lp (update-shared-closure-var (car vars) crefvar closure node) + (cdr vars))))) + (define (ref-var n crefvar closure) (let ((var (first (node-parameters n)))) (cond ((posq var closure) @@ -2785,8 +2944,14 @@ (gather node #f '()) (when (pair? customizable) (debugging 'o "customizable procedures" customizable)) + (debugging 'p "closure conversion merging of shareables phase...") + (merge-shareable node #f) + (unless (and (zero? sharing-containers) + (zero? sharing-users)) ;; Users should always be zero if containers is (but paranoia prevails, helps w/ debugging) + (debugging 'o "shared closure containers" sharing-containers) + (debugging 'o "shared closure users" sharing-users)) (debugging 'p "closure conversion transformation phase...") - (let ((node2 (transform node #f #f))) + (let ((node2 (transform node #f '()))) (unless (zero? direct-calls) (debugging 'o "calls to known targets" direct-calls)) node2) ) ) -- 2.49.0
>From db0226ad1579fcb5345dae662fb09a44b578f035 Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Tue, 22 Jul 2025 12:23:02 +0200 Subject: [PATCH 4/5] Undo boxing of variables that don't escape shared closures When a variable is created in a sharing closure container or user and is *only* used by sharing users further down the line, and *not* any regular lambdas (or new containers), it is not necessary to box these variables because the container itself already acts as a box. In such cases, we undo the 'boxed property of the variables. --- core.scm | 43 ++++++++++++++++++++++++++++++------------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/core.scm b/core.scm index 6746a06a..fd4885c9 100644 --- a/core.scm +++ b/core.scm @@ -2532,7 +2532,8 @@ (sharing-containers 0) (sharing-users 0) (customizable '()) - (lexicals '())) + (lexicals '()) + (escaping-shared-vars '())) (define (test sym item) (db-get db sym item)) @@ -2652,8 +2653,7 @@ (third params) (lambda (vars argc rest) (let* ((id (first params)) - (this-closure (test id 'captured-variables))) - ;; TODO: unbox vars that are only referenced inside the shared closure + (this-closure (or (test id 'captured-variables) '()))) (cond ((and shared-closure (test id 'shareable-user) ;; The user must close over all the shared closure vars, otherwise @@ -2677,20 +2677,37 @@ sub-closure))) ((test id 'shareable-container) - (let ((sub-closure (merge-shareable (first subs) this-closure))) - (unless (null? sub-closure) - ;; NOTE: We don't touch 'captured-variables, because the vars - ;; on initial entry of the sharing closures are unchanged. - ;; However, we do need to know the full closure - (db-put! db id 'closure-size (length sub-closure)) - (db-put! db id 'sharing-mode 'container) - (db-put! db id 'shared-closure sub-closure) - (set! sharing-containers (add1 sharing-containers)))) + ;; If we're starting a new container, any captured vars from the + ;; surrounding container will escape, like with a non-sharing closure + (set! escaping-shared-vars (lset-union/eq? escaping-shared-vars this-closure)) + + (fluid-let ((escaping-shared-vars '())) + (let ((sub-closure (merge-shareable (first subs) this-closure))) + (unless (null? sub-closure) + ;; NOTE: We don't touch 'captured-variables, because the vars + ;; on initial entry of the sharing closures are unchanged. + ;; However, we do need to know the full closure + (db-put! db id 'closure-size (length sub-closure)) + (db-put! db id 'sharing-mode 'container) + (db-put! db id 'shared-closure sub-closure) + (set! sharing-containers (add1 sharing-containers)) + + ;; Shared vars introduced by users which don't escape don't have to be boxed + ;; because the shared closure container itself already acts as a box. + (let* ((user-introduced-vars (lset-difference/eq? sub-closure this-closure)) + (unboxable-vars (lset-difference/eq? user-introduced-vars escaping-shared-vars))) + (for-each (lambda (v) + (when (test v 'boxed) ; Not strictly needed, but cleaner this way + (db-put! db v 'boxed #f))) + unboxable-vars))))) ;; This is a new container, so do not allow higher-up containers ;; to collect variables from this closure. '()) - (else (merge-shareable (first subs) #f) + ;; All closed-over vars in non-user procedures "escape" the container (if any) + ;; and must remain boxed. + (else (set! escaping-shared-vars (lset-union/eq? escaping-shared-vars this-closure)) + (merge-shareable (first subs) #f) '())))))) (else (concatenate (map (lambda (n) (merge-shareable n shared-closure)) subs)) ) ) )) -- 2.49.0
>From ca50747a67a6b2dc0b0a94282a79829ed0384f58 Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Wed, 23 Jul 2025 15:07:00 +0200 Subject: [PATCH 5/5] Replace quadratic lset-operations in closure generation with linear algorithms The lset-difference/eq? and delete-duplicates calls in the closure generation are O(n*m) and O(n^2), respectively. When the closures are deeply nested these might be called O(n) times, resulting in cubic complexity of the entire closure generation step. Implement hash-table-assisted versions of these procedures specific for symbols so that we don't run into this problem. --- core.scm | 33 ++++++++++++++++++++++++++++----- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/core.scm b/core.scm index fd4885c9..b78483de 100644 --- a/core.scm +++ b/core.scm @@ -2535,6 +2535,29 @@ (lexicals '()) (escaping-shared-vars '())) + ;; O(n) version of delete-duplicates (which is O(n^2)) specific for symbols + (define (delete-duplicate-symbols lst) + (let ((seen (make-hash-table))) + (let lp ((lst lst) + (result '())) + (if (null? lst) + (reverse result) + (let ((x (car lst))) + (cond ((hash-table-ref seen x) + (lp (cdr lst) result)) + (else + (hash-table-set! seen x #t) + (lp (cdr lst) + (cons x result))))))))) + + ;; O(n+m) version of lset-difference/eq? (which is O(n*m)) specific for symbols + (define (symbolset-difference ls . lss) + (let ((seen (make-hash-table))) + (for-each (lambda (lst) + (for-each (lambda (x) (hash-table-set! seen x #t)) lst)) + lss) + (remove (lambda (x) (hash-table-ref seen x)) ls))) + (define (test sym item) (db-get db sym item)) (define (register-customizable! var id) @@ -2628,10 +2651,10 @@ (lambda (vars argc rest) (let ((id (if here (first params) 'toplevel))) (fluid-let ((lexicals (append locals lexicals))) - (let ((c (delete-duplicates (gather (first subs) id vars) eq?))) + (let ((c (delete-duplicate-symbols (gather (first subs) id vars)))) (db-put! db id 'closure-size (length c)) (db-put! db id 'captured-variables c) - (lset-difference/eq? c locals vars))))))) + (symbolset-difference c locals vars))))))) (else (concatenate (map (lambda (n) (gather n here locals)) subs)) ) ) )) @@ -2658,7 +2681,7 @@ (test id 'shareable-user) ;; The user must close over all the shared closure vars, otherwise ;; we risk extending the lifetime of these vars for too long. - (null? (lset-difference/eq? shared-closure this-closure)) + (null? (symbolset-difference shared-closure this-closure)) ;; Minimum shared closure size - don't want to share a single var, it's extra indirection (> (length this-closure) 1)) ;; We only pass on the container to the subs if this is also a shareable-container @@ -2694,8 +2717,8 @@ ;; Shared vars introduced by users which don't escape don't have to be boxed ;; because the shared closure container itself already acts as a box. - (let* ((user-introduced-vars (lset-difference/eq? sub-closure this-closure)) - (unboxable-vars (lset-difference/eq? user-introduced-vars escaping-shared-vars))) + (let* ((user-introduced-vars (symbolset-difference sub-closure this-closure)) + (unboxable-vars (symbolset-difference user-introduced-vars escaping-shared-vars))) (for-each (lambda (v) (when (test v 'boxed) ; Not strictly needed, but cleaner this way (db-put! db v 'boxed #f))) -- 2.49.0
+---[1]: |-> installation-prefix: /home/sjamaan/chickens/chicken-6-master |-> csc-options: |-> runtime-options: |-> repetitions: 10 +---[2]: |-> installation-prefix: /home/sjamaan/chickens/chicken-6-shared-closures |-> csc-options: |-> runtime-options: |-> repetitions: 10 === === Overall results === === [cpu-time] [0]: 1.0 (461.07s) [1]: 1.01 (456.16s) === === Results by metric === In the tables below results are normalized (larger numbers indicate better results). === [cpu-time] Programs [1] [2] ======================================== 0__________________________1.0_______1.0 binarytrees_______________1.01_______1.0 boyer______________________1.0______1.01 browse____________________1.01_______1.0 conform____________________1.0______1.06 cpstak_____________________1.0______1.01 ctak_______________________1.0______1.04 dderiv____________________1.02_______1.0 deriv_____________________1.01_______1.0 destructive________________1.0______1.04 dfa_______________________1.08_______1.0 div-iter__________________1.02_______1.0 div-rec____________________1.1_______1.0 dynamic___________________1.16_______1.0 earley____________________1.07_______1.0 fannkuch___________________1.0______1.09 fft________________________1.0______1.12 fib________________________1.0______1.11 fibc_______________________1.0______1.13 fibfp______________________1.0______1.08 fprint_____________________1.0______1.17 fread______________________1.0______1.02 gcbench____________________1.0______1.03 gold_______________________1.0______1.01 gold2______________________1.0_______1.0 graphs_____________________1.0______1.08 hanoi______________________1.0______1.06 integ______________________1.0______1.01 integ2_____________________1.0______1.06 kanren_____________________1.0_______1.0 kernwyk-ackermann__________1.0______1.05 kernwyk-array______________1.0______1.02 kernwyk-cat________________1.0______1.49 kernwyk-string_____________1.0_______1.0 kernwyk-sum_______________1.04_______1.0 kernwyk-tail______________1.02_______1.0 kernwyk-wc_________________1.0______1.04 knucleotide_______________1.09_______1.0 lattice___________________1.01_______1.0 matrix_____________________1.0_______1.0 maze_______________________1.0_______1.0 mazefun____________________1.0_______1.0 mbrot______________________1.0______1.05 nbody______________________1.0______1.01 nboyer____________________1.01_______1.0 nestedloop________________1.16_______1.0 nfa_______________________1.05_______1.0 nqueens___________________1.04_______1.0 ntakl_____________________1.07_______1.0 nucleic2___________________1.0______1.02 paraffins__________________1.0______1.02 parsing___________________1.01_______1.0 pnpoly_____________________1.0______1.06 primes_____________________1.0_______1.0 psyntax___________________1.02_______1.0 puzzle____________________1.03_______1.0 ray_______________________1.02_______1.0 ray2_______________________1.0______1.02 rho3rec____________________1.0______1.02 sboyer____________________1.02_______1.0 scheme____________________1.02_______1.0 sieves-eratosthenes_______1.05_______1.0 simplex___________________1.04_______1.0 slatex____________________1.03_______1.0 sort1______________________1.0_______1.0 tak________________________1.0______1.03 takl______________________1.02_______1.0 takr_______________________1.0______1.05 traverse___________________1.0______1.03 travinit__________________1.02_______1.0 triangl___________________1.01_______1.0
