Hi all,

Another week, another patch :)

The first patch gets rid of the calls to SRFI-1's EVERY and MAP
in favor of a hand-rolled loop.  This is slightly more verbose,
but by doing it this way we can avoid some recursive consing which
MAP performs (if nodes are equal, no need to reverse the list)
and because EVERY is so fully generic it does a lot of stuff
before actually getting started on the list, while the lists
in WALK-GENERIC are very often only one or two items.

The second patch is a bit more delicate and deserves some more
explanation.  When a program initially gets translated into a
node tree, it is normalised to have all bodies transformed into
let-statements, as well as the toplevel.

For example, the test program "(print 1) (print 2) (print 3)"
gets normalised to this (use csc -debug T to see this tree):

[initial node tree]
(lambda ()
  (let ((t1 (##core#callunit "library")))
    (let ((t2 (##core#callunit "eval")))
      (let ((t3 (print 1)))
        (let ((t4 (print 2)))
          (let ((t5 (print 3)))
            (let ((t6 ((##sys#implicit-exit-handler))))
              (##core#undefined))))))))

Then, during CPS conversion, each statement gets converted to
a lambda which explicitly accepts the value from the previous
continuation.  While this is done, all LET statements are
converted so that the same variable still refers properly to
the value, but now it needs to refer to the lambda's argument
(use csc -debug 3 to see this tree):

[cps]
(lambda (k8)
  (let ((k9 (##core#lambda
              (r10)
              (let ((t1 r10))
                (let ((k12 (##core#lambda
                             (r13)
                             (let ((t2 r13))
                               (let ((k15 (##core#lambda
                                            (r16)
                                            (let ((t3 r16))
                                              (let ((k18 (##core#lambda
                                                           (r19)
                                                           (let ((t4 r19))
                                                             (let ((k21 
(##core#lambda
                                                                          (r22)
                                                                          (let 
((t5 r22))
                                                                            
(let ((k24 (##core#lambda
                                                                                
         (r25)
                                                                                
         (let ((t6 r25)) (k8 (##core#undefined))))))
                                                                              
(let ((k27 (##core#lambda (r28) (r28 k24))))
                                                                                
(##sys#implicit-exit-handler k27)))))))
                                                               (print k21 
3))))))
                                                (print k18 2))))))
                                 (print k15 1))))))
                  (##core#callunit "eval" k12))))))
    (##core#callunit "library" k9)))


As you can see, there are lots of unneccessary LETs in here:
(t1 r10), (t2 r13), (t3 r16), (t4 r19), (t5 r22) and (t6 r25).
Instead, we could change the process so that the lambda's argument
isn't a normal gensym but takes its name from the LET if we know
the translation is from a LET.  Then we can drop the LET:

[cps]
(lambda (k8)
  (let ((k10 (##core#lambda
               (t1)
               (let ((k12 (##core#lambda
                            (t2)
                            (let ((k14 (##core#lambda
                                         (t3)
                                         (let ((k16 (##core#lambda
                                                      (t4)
                                                      (let ((k18 (##core#lambda
                                                                   (t5)
                                                                   (let ((k20 
(##core#lambda (t6) (k8 (##core#undefined)))))
                                                                     (let ((k22 
(##core#lambda (r23) (r23 k20))))
                                                                       
(##sys#implicit-exit-handler k22))))))
                                                        (print k18 3)))))
                                           (print k16 2)))))
                              (print k14 1)))))
                 (##core#callunit "eval" k12)))))
    (##core#callunit "library" k10)))

As you can see, there are no unneccessary LET statements anymore.
The code even almost fits the screen ;)

Less variables is a good thing because each variable adds extra
overhead, since it gets looked at by the analyzer (which means this
change can almost halve the number of variables looked at by the
analyzer), which then stores them with its attributes in a hash table
(which requires use to hash the symbol, several times for lookup too).
The optimizer then needs to eliminate these variables again when it
decides they are really just aliases for the lambda arguments (which
it does, on the first iteration).

Both patches combined reduce the compilation time of the numbers
test from about 70 seconds to 53 seconds, so they're quite worth it IMO.
The walk-generic change is really needed by the LET change, because
for some reason this causes it to hit that procedure more than
without the change.

Oddly, this seems to cause a shift in the performance profile.
There's now a major bottleneck in the second optimization step
but all the other steps are a lot faster now.  I hope this will
make it easier to further analyze this and eventually get rid of
that (final?) bottleneck, too.

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth
>From 997b45df39535d5f60abdc161e28ee4e01c352e1 Mon Sep 17 00:00:00 2001
From: Peter Bex <peter....@xs4all.nl>
Date: Sun, 19 Feb 2012 21:42:34 +0100
Subject: [PATCH 1/2] Use a hand-rolled loop in WALK-GENERIC; this saves us
 having to traverse the list a second time in the slow
 EVERY function and a recursive MAP that checks its
 arguments all the time.  It's called a lot of times
 with small lists, so this adds up

---
 optimizer.scm |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/optimizer.scm b/optimizer.scm
index 30e4a2d..b470198 100644
--- a/optimizer.scm
+++ b/optimizer.scm
@@ -500,11 +500,17 @@
          (else (walk-generic n class params subs fids gae #f)) ) ) )
     
     (define (walk-generic n class params subs fids gae invgae)
-      (let ((subs2 (map (cut walk <> fids gae) subs)))
-       (when invgae (invalidate-gae! gae))
-       (if (every eq? subs subs2)
-           n
-           (make-node class params subs2) ) ) )
+      (let lp ((same? #t)
+               (subs subs)
+               (subs2 '()))
+        (cond ((null? subs)
+               (when invgae (invalidate-gae! gae))
+               ;; Create new node if walk made changes, otherwise original node
+               (if same? n (make-node class params (reverse subs2))))
+              (else
+               (let ((sub2 (walk (car subs) fids gae)))
+                 (lp (and same? (eq? sub2 (car subs)))
+                     (cdr subs) (cons sub2 subs2)))) ) ))
 
     (if (perform-pre-optimization! node db)
        (values node #t)
-- 
1.7.9.1

>From 9351d096988b5b0fe549d56f77cdf96741655635 Mon Sep 17 00:00:00 2001
From: Peter Bex <peter....@xs4all.nl>
Date: Sun, 19 Feb 2012 22:39:46 +0100
Subject: [PATCH 2/2] Don't generate extra LET statements during cps
 transformation but try to re-use old LET variables as
 lambda arguments

---
 compiler.scm |   77 ++++++++++++++++++++++++++++++++++------------------------
 1 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/compiler.scm b/compiler.scm
index 3df1865..4bfecec 100644
--- a/compiler.scm
+++ b/compiler.scm
@@ -1655,46 +1655,55 @@
 
 (define (perform-cps-conversion node)
 
-  (define (cps-lambda id llist subs k)
-    (let ([t1 (gensym 'k)])
+  (define (cps-lambda id returnvar llist subs k)
+    (let ([t1 (or returnvar (gensym 'k))])
       (k (make-node
          '##core#lambda (list id #t (cons t1 llist) 0)
-         (list (walk (car subs)
+         (list (walk (gensym-f-id)
+                      (car subs)
                      (lambda (r) 
                        (make-node '##core#call (list #t) (list (varnode t1) 
r)) ) ) ) ) ) ) )
+
+  (define (node-for-var? node var)
+     (and (eq? (node-class node) '##core#variable)
+          (eq? (car (node-parameters node)) var)))
   
-  (define (walk n k)
+  (define (walk returnvar n k)
     (let ((subs (node-subexpressions n))
          (params (node-parameters n)) 
          (class (node-class n)) )
       (case (node-class n)
        ((##core#variable quote ##core#undefined ##core#primitive) (k n))
        ((if) (let* ((t1 (gensym 'k))
-                    (t2 (gensym 'r))
+                    (t2 (or returnvar (gensym 'r)))
                     (k1 (lambda (r) (make-node '##core#call (list #t) (list 
(varnode t1) r)))) )
                (make-node 
                 'let
                 (list t1)
                 (list (make-node '##core#lambda (list (gensym-f-id) #f (list 
t2) 0) 
                                  (list (k (varnode t2))) )
-                      (walk (car subs)
+                      (walk #f (car subs)
                             (lambda (v)
                               (make-node 'if '()
                                          (list v
-                                               (walk (cadr subs) k1)
-                                               (walk (caddr subs) k1) ) ) ) ) 
) ) ) )
+                                               (walk #f (cadr subs) k1)
+                                               (walk #f (caddr subs) k1) ) ) ) 
) ) ) ) )
        ((let)
         (let loop ((vars params) (vals subs))
           (if (null? vars)
-              (walk (car vals) k)
-              (walk (car vals)
-                    (lambda (r) 
-                      (make-node 'let
-                                 (list (car vars))
-                                 (list r (loop (cdr vars) (cdr vals))) ) ) ) ) 
) )
-       ((lambda ##core#lambda) (cps-lambda (gensym-f-id) (first params) subs 
k))
+              (walk #f (car vals) k)
+              (walk (car vars)
+                     (car vals)
+                    (lambda (r)
+                       (if (node-for-var? r (car vars)) ; Don't generate 
unneccessary lets
+                           (loop (cdr vars) (cdr vals))
+                           (make-node 'let
+                                      (list (car vars))
+                                      (list r (loop (cdr vars) (cdr vals))) )) 
) ) ) ) )
+       ((lambda ##core#lambda) (cps-lambda (gensym-f-id) returnvar (first 
params) subs k))
        ((set!) (let ((t1 (gensym 't)))
-                 (walk (car subs)
+                 (walk #f
+                        (car subs)
                        (lambda (r)
                          (make-node 'let (list t1)
                                     (list (make-node 'set! (list (first 
params)) (list r))
@@ -1706,23 +1715,24 @@
             (cons (apply make-foreign-callback-stub id params) 
foreign-callback-stubs) )
           ;; mark to avoid leaf-routine optimization
           (mark-variable id '##compiler#callback-lambda)
-          (cps-lambda id (first (node-parameters lam)) (node-subexpressions 
lam) k) ) )
+           ;; maybe pass returnvar here?
+          (cps-lambda id #f (first (node-parameters lam)) (node-subexpressions 
lam) k) ) )
        ((##core#inline ##core#inline_allocate ##core#inline_ref 
##core#inline_update ##core#inline_loc_ref 
                        ##core#inline_loc_update)
         (walk-inline-call class params subs k) )
-       ((##core#call) (walk-call (car subs) (cdr subs) params k))
-       ((##core#callunit) (walk-call-unit (first params) k))
+       ((##core#call) (walk-call returnvar (car subs) (cdr subs) params k))
+       ((##core#callunit) (walk-call-unit returnvar (first params) k))
        ((##core#the)
         ;; remove "the" nodes, as they are not used after scrutiny
-        (walk (car subs) k))
+        (walk returnvar (car subs) k))
        ((##core#typecase)
         ;; same here, the last clause is chosen, exp is dropped
-        (walk (last subs) k))
+        (walk returnvar (last subs) k))
        (else (bomb "bad node (cps)")) ) ) )
   
-  (define (walk-call fn args params k)
+  (define (walk-call returnvar fn args params k)
     (let ((t0 (gensym 'k))
-          (t3 (gensym 'r)) )
+          (t3 (or returnvar (gensym 'r))) )
       (make-node
        'let (list t0)
        (list (make-node '##core#lambda (list (gensym-f-id) #f (list t3) 0) 
@@ -1730,13 +1740,13 @@
             (walk-arguments
              args
              (lambda (vars)
-               (walk fn
+               (walk #f fn
                      (lambda (r) 
                        (make-node '##core#call params (cons* r (varnode t0) 
vars) ) ) ) ) ) ) ) ) )
   
-  (define (walk-call-unit unitname k)
+  (define (walk-call-unit returnvar unitname k)
     (let ((t0 (gensym 'k))
-         (t3 (gensym 'r)) )
+         (t3 (or returnvar (gensym 'r))) )
       (make-node
        'let (list t0)
        (list (make-node '##core#lambda (list (gensym-f-id) #f (list t3) 0) 
@@ -1757,12 +1767,15 @@
              (loop (cdr args) (cons (car args) vars)) )
             (else
              (let ((t1 (gensym 'a)))
-               (walk (car args)
+               (walk t1
+                     (car args)
                      (lambda (r)
-                      (make-node 'let (list t1)
-                                 (list r
-                                       (loop (cdr args) 
-                                             (cons (varnode t1) vars) ) ) ) ) 
) ) ) ) ) )
+                       (if (node-for-var? r t1) ; Don't generate unneccessary 
lets
+                           (loop (cdr args) (cons (varnode t1) vars) )
+                           (make-node 'let (list t1)
+                                      (list r
+                                            (loop (cdr args) 
+                                                  (cons (varnode t1) vars) ) ) 
)) ) ) ) ) ) ) )
   
   (define (atomic? n)
     (let ((class (node-class n)))
@@ -1772,7 +1785,7 @@
                             ##core#inline_loc_ref ##core#inline_loc_update))
               (every atomic? (node-subexpressions n)) ) ) ) )
   
-  (walk node values) )
+  (walk #f node values) )
 
 
 ;;; Foreign callback stub type:
-- 
1.7.9.1

_______________________________________________
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers

Reply via email to