Hi all, Another week, another patch :)
The first patch gets rid of the calls to SRFI-1's EVERY and MAP in favor of a hand-rolled loop. This is slightly more verbose, but by doing it this way we can avoid some recursive consing which MAP performs (if nodes are equal, no need to reverse the list) and because EVERY is so fully generic it does a lot of stuff before actually getting started on the list, while the lists in WALK-GENERIC are very often only one or two items. The second patch is a bit more delicate and deserves some more explanation. When a program initially gets translated into a node tree, it is normalised to have all bodies transformed into let-statements, as well as the toplevel. For example, the test program "(print 1) (print 2) (print 3)" gets normalised to this (use csc -debug T to see this tree): [initial node tree] (lambda () (let ((t1 (##core#callunit "library"))) (let ((t2 (##core#callunit "eval"))) (let ((t3 (print 1))) (let ((t4 (print 2))) (let ((t5 (print 3))) (let ((t6 ((##sys#implicit-exit-handler)))) (##core#undefined)))))))) Then, during CPS conversion, each statement gets converted to a lambda which explicitly accepts the value from the previous continuation. While this is done, all LET statements are converted so that the same variable still refers properly to the value, but now it needs to refer to the lambda's argument (use csc -debug 3 to see this tree): [cps] (lambda (k8) (let ((k9 (##core#lambda (r10) (let ((t1 r10)) (let ((k12 (##core#lambda (r13) (let ((t2 r13)) (let ((k15 (##core#lambda (r16) (let ((t3 r16)) (let ((k18 (##core#lambda (r19) (let ((t4 r19)) (let ((k21 (##core#lambda (r22) (let ((t5 r22)) (let ((k24 (##core#lambda (r25) (let ((t6 r25)) (k8 (##core#undefined)))))) (let ((k27 (##core#lambda (r28) (r28 k24)))) (##sys#implicit-exit-handler k27))))))) (print k21 3)))))) (print k18 2)))))) (print k15 1)))))) (##core#callunit "eval" k12)))))) (##core#callunit "library" k9))) As you can see, there are lots of unneccessary LETs in here: (t1 r10), (t2 r13), (t3 r16), (t4 r19), (t5 r22) and (t6 r25). Instead, we could change the process so that the lambda's argument isn't a normal gensym but takes its name from the LET if we know the translation is from a LET. Then we can drop the LET: [cps] (lambda (k8) (let ((k10 (##core#lambda (t1) (let ((k12 (##core#lambda (t2) (let ((k14 (##core#lambda (t3) (let ((k16 (##core#lambda (t4) (let ((k18 (##core#lambda (t5) (let ((k20 (##core#lambda (t6) (k8 (##core#undefined))))) (let ((k22 (##core#lambda (r23) (r23 k20)))) (##sys#implicit-exit-handler k22)))))) (print k18 3))))) (print k16 2))))) (print k14 1))))) (##core#callunit "eval" k12))))) (##core#callunit "library" k10))) As you can see, there are no unneccessary LET statements anymore. The code even almost fits the screen ;) Less variables is a good thing because each variable adds extra overhead, since it gets looked at by the analyzer (which means this change can almost halve the number of variables looked at by the analyzer), which then stores them with its attributes in a hash table (which requires use to hash the symbol, several times for lookup too). The optimizer then needs to eliminate these variables again when it decides they are really just aliases for the lambda arguments (which it does, on the first iteration). Both patches combined reduce the compilation time of the numbers test from about 70 seconds to 53 seconds, so they're quite worth it IMO. The walk-generic change is really needed by the LET change, because for some reason this causes it to hit that procedure more than without the change. Oddly, this seems to cause a shift in the performance profile. There's now a major bottleneck in the second optimization step but all the other steps are a lot faster now. I hope this will make it easier to further analyze this and eventually get rid of that (final?) bottleneck, too. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
>From 997b45df39535d5f60abdc161e28ee4e01c352e1 Mon Sep 17 00:00:00 2001 From: Peter Bex <peter....@xs4all.nl> Date: Sun, 19 Feb 2012 21:42:34 +0100 Subject: [PATCH 1/2] Use a hand-rolled loop in WALK-GENERIC; this saves us having to traverse the list a second time in the slow EVERY function and a recursive MAP that checks its arguments all the time. It's called a lot of times with small lists, so this adds up --- optimizer.scm | 16 +++++++++++----- 1 files changed, 11 insertions(+), 5 deletions(-) diff --git a/optimizer.scm b/optimizer.scm index 30e4a2d..b470198 100644 --- a/optimizer.scm +++ b/optimizer.scm @@ -500,11 +500,17 @@ (else (walk-generic n class params subs fids gae #f)) ) ) ) (define (walk-generic n class params subs fids gae invgae) - (let ((subs2 (map (cut walk <> fids gae) subs))) - (when invgae (invalidate-gae! gae)) - (if (every eq? subs subs2) - n - (make-node class params subs2) ) ) ) + (let lp ((same? #t) + (subs subs) + (subs2 '())) + (cond ((null? subs) + (when invgae (invalidate-gae! gae)) + ;; Create new node if walk made changes, otherwise original node + (if same? n (make-node class params (reverse subs2)))) + (else + (let ((sub2 (walk (car subs) fids gae))) + (lp (and same? (eq? sub2 (car subs))) + (cdr subs) (cons sub2 subs2)))) ) )) (if (perform-pre-optimization! node db) (values node #t) -- 1.7.9.1
>From 9351d096988b5b0fe549d56f77cdf96741655635 Mon Sep 17 00:00:00 2001 From: Peter Bex <peter....@xs4all.nl> Date: Sun, 19 Feb 2012 22:39:46 +0100 Subject: [PATCH 2/2] Don't generate extra LET statements during cps transformation but try to re-use old LET variables as lambda arguments --- compiler.scm | 77 ++++++++++++++++++++++++++++++++++------------------------ 1 files changed, 45 insertions(+), 32 deletions(-) diff --git a/compiler.scm b/compiler.scm index 3df1865..4bfecec 100644 --- a/compiler.scm +++ b/compiler.scm @@ -1655,46 +1655,55 @@ (define (perform-cps-conversion node) - (define (cps-lambda id llist subs k) - (let ([t1 (gensym 'k)]) + (define (cps-lambda id returnvar llist subs k) + (let ([t1 (or returnvar (gensym 'k))]) (k (make-node '##core#lambda (list id #t (cons t1 llist) 0) - (list (walk (car subs) + (list (walk (gensym-f-id) + (car subs) (lambda (r) (make-node '##core#call (list #t) (list (varnode t1) r)) ) ) ) ) ) ) ) + + (define (node-for-var? node var) + (and (eq? (node-class node) '##core#variable) + (eq? (car (node-parameters node)) var))) - (define (walk n k) + (define (walk returnvar n k) (let ((subs (node-subexpressions n)) (params (node-parameters n)) (class (node-class n)) ) (case (node-class n) ((##core#variable quote ##core#undefined ##core#primitive) (k n)) ((if) (let* ((t1 (gensym 'k)) - (t2 (gensym 'r)) + (t2 (or returnvar (gensym 'r))) (k1 (lambda (r) (make-node '##core#call (list #t) (list (varnode t1) r)))) ) (make-node 'let (list t1) (list (make-node '##core#lambda (list (gensym-f-id) #f (list t2) 0) (list (k (varnode t2))) ) - (walk (car subs) + (walk #f (car subs) (lambda (v) (make-node 'if '() (list v - (walk (cadr subs) k1) - (walk (caddr subs) k1) ) ) ) ) ) ) ) ) + (walk #f (cadr subs) k1) + (walk #f (caddr subs) k1) ) ) ) ) ) ) ) ) ((let) (let loop ((vars params) (vals subs)) (if (null? vars) - (walk (car vals) k) - (walk (car vals) - (lambda (r) - (make-node 'let - (list (car vars)) - (list r (loop (cdr vars) (cdr vals))) ) ) ) ) ) ) - ((lambda ##core#lambda) (cps-lambda (gensym-f-id) (first params) subs k)) + (walk #f (car vals) k) + (walk (car vars) + (car vals) + (lambda (r) + (if (node-for-var? r (car vars)) ; Don't generate unneccessary lets + (loop (cdr vars) (cdr vals)) + (make-node 'let + (list (car vars)) + (list r (loop (cdr vars) (cdr vals))) )) ) ) ) ) ) + ((lambda ##core#lambda) (cps-lambda (gensym-f-id) returnvar (first params) subs k)) ((set!) (let ((t1 (gensym 't))) - (walk (car subs) + (walk #f + (car subs) (lambda (r) (make-node 'let (list t1) (list (make-node 'set! (list (first params)) (list r)) @@ -1706,23 +1715,24 @@ (cons (apply make-foreign-callback-stub id params) foreign-callback-stubs) ) ;; mark to avoid leaf-routine optimization (mark-variable id '##compiler#callback-lambda) - (cps-lambda id (first (node-parameters lam)) (node-subexpressions lam) k) ) ) + ;; maybe pass returnvar here? + (cps-lambda id #f (first (node-parameters lam)) (node-subexpressions lam) k) ) ) ((##core#inline ##core#inline_allocate ##core#inline_ref ##core#inline_update ##core#inline_loc_ref ##core#inline_loc_update) (walk-inline-call class params subs k) ) - ((##core#call) (walk-call (car subs) (cdr subs) params k)) - ((##core#callunit) (walk-call-unit (first params) k)) + ((##core#call) (walk-call returnvar (car subs) (cdr subs) params k)) + ((##core#callunit) (walk-call-unit returnvar (first params) k)) ((##core#the) ;; remove "the" nodes, as they are not used after scrutiny - (walk (car subs) k)) + (walk returnvar (car subs) k)) ((##core#typecase) ;; same here, the last clause is chosen, exp is dropped - (walk (last subs) k)) + (walk returnvar (last subs) k)) (else (bomb "bad node (cps)")) ) ) ) - (define (walk-call fn args params k) + (define (walk-call returnvar fn args params k) (let ((t0 (gensym 'k)) - (t3 (gensym 'r)) ) + (t3 (or returnvar (gensym 'r))) ) (make-node 'let (list t0) (list (make-node '##core#lambda (list (gensym-f-id) #f (list t3) 0) @@ -1730,13 +1740,13 @@ (walk-arguments args (lambda (vars) - (walk fn + (walk #f fn (lambda (r) (make-node '##core#call params (cons* r (varnode t0) vars) ) ) ) ) ) ) ) ) ) - (define (walk-call-unit unitname k) + (define (walk-call-unit returnvar unitname k) (let ((t0 (gensym 'k)) - (t3 (gensym 'r)) ) + (t3 (or returnvar (gensym 'r))) ) (make-node 'let (list t0) (list (make-node '##core#lambda (list (gensym-f-id) #f (list t3) 0) @@ -1757,12 +1767,15 @@ (loop (cdr args) (cons (car args) vars)) ) (else (let ((t1 (gensym 'a))) - (walk (car args) + (walk t1 + (car args) (lambda (r) - (make-node 'let (list t1) - (list r - (loop (cdr args) - (cons (varnode t1) vars) ) ) ) ) ) ) ) ) ) ) + (if (node-for-var? r t1) ; Don't generate unneccessary lets + (loop (cdr args) (cons (varnode t1) vars) ) + (make-node 'let (list t1) + (list r + (loop (cdr args) + (cons (varnode t1) vars) ) ) )) ) ) ) ) ) ) ) (define (atomic? n) (let ((class (node-class n))) @@ -1772,7 +1785,7 @@ ##core#inline_loc_ref ##core#inline_loc_update)) (every atomic? (node-subexpressions n)) ) ) ) ) - (walk node values) ) + (walk #f node values) ) ;;; Foreign callback stub type: -- 1.7.9.1
_______________________________________________ Chicken-hackers mailing list Chicken-hackers@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-hackers