When optimizing my Chicken code for speed, I find one of the most important things is to write the code so that it gets compiled into a single C function with gotos. It's often so important to get this that I end up writing very unnatural Scheme code, including manually combining several loops into one.
For a simple example, if you look at the canonical mutual recursion example: (letrec ((myeven? (lambda (x) (if (zero? x) #t (myodd? (- x 1))))) (myodd? (lambda (x) (if (zero? x) #f (myeven? (- x 1)))))) (myeven? n)) this will generate two mutually recursive C functions in the backend. Because the overhead of the body of these functions is negligible, the cost of the function overhead is relatively high. One can consider, instead, combining the two procedures into a single, self-recursive procedure. We do this by prepending a label to the argument list, with an integer value assigned to each of the virtual procedures, and dispatch accordingly. The above case becomes something like: (let dispatch ((label 0) (x n)) (if (eq? label 0) (if (zero? x) #t (dispatch 1 (- x 1))) (if (zero? x) #f (dispatch 0 (- x 1))))) Here the label 0 is for myeven?, and 1 is for myodd?. At first glance, apart from being much harder to read, this would seem to be a pessimisation - we're doing the extra work of a conditional test on each of the loop steps. However, in the C backend this gets compiled to a single procedure with self-tail-calls contracted to gotos. Whenever you find yourself writing ugly code in the name of speed, you should stop and try to abstract away the ugliness. To that end, the let-machine syntax at the end of this article will automate the process for you. Given any letrec expression of literal lambdas, you can simply replace letrec with let-machine. In this case: (let-machine ((myeven? (lambda (x) (if (zero? x) #t (myodd? (- x 1))))) (myodd? (lambda (x) (if (zero? x) #f (myeven? (- x 1)))))) (myeven? n)) will generate code similar to the manually unified loop above. Does it help? For this example with n=400000000, I get 8.703 seconds for the letrec version compared to 1.671 seconds for the let-machine version - a better than 5x improvement! However, this is a toy example where the real work is almost nil and the function overhead everything. There's also an important caveat - the goto contraction can only happen in nested chains of inlined primitives. If you call an arbitrary procedure, it will always generate a full function call. That's a big caveat (and learning exactly when the goto's get generated takes practice). So are there useful examples of loops involving only primitives? A good example of an expensive nested loop is the matrix multiplication algorithm. To multiple matrix A x B, the resulting matrix C can be determined by C(i,j) = A[i,:] x B[:,j] i.e. the dot product of the ith row of A with the jth column of B. In Scheme (with matrices as vectors and passing explicit row/column information): (define (matrix-multiply a a-rows a-cols b b-rows b-cols) (assert (= a-cols b-rows)) (let ((c (make-vector (* a-rows b-cols)))) (let lp1 ((i 0)) (if (= i a-rows) c (let lp2 ((j 0)) (if (= j b-cols) (lp1 (+ i 1)) (let lp3 ((k 0) (sum 0)) (if (= k a-cols) (begin (vector-set! c (+ (* i a-rows) j) sum) (lp2 (+ j 1))) (lp3 (+ k 1) (+ sum (* (vector-ref a (+ (* i a-cols) k)) (vector-ref b (+ (* k b-cols) j))))) )))))))) [This would be a little easier to read with do, and a lot easier to read with foof-loop, but I want to make the recursive calls explicit.] The innermost loop lp3 is computing the dot product. It's just performing simple arithmetic operations on each step but the last, so it will be compiled normally with a goto so long as usual integrations are enabled. However, the outer loops are also just composed of primitives, plus calls to the (known) inner loops. What would happen if we could flatten them? It turns out, we _can_ flatten them. By writing them the way we did, with explicit calls back to the outer loop at the end of each loop, we've showed how they are mutually recursive. Thus they allow a straightforward conversion to letrec, and then let-machine: (define (fast-matrix-multiply a a-rows a-cols b b-rows b-cols) (assert (= a-cols b-rows)) (let ((c (make-vector (* a-rows b-cols)))) (let-machine ((lp1 (lambda (i) (if (= i a-rows) c (lp2 0)))) (lp2 (lambda (j) (if (= j b-cols) (lp1 (+ i 1)) (lp3 0 0)))) (lp3 (lambda (k sum) (if (= k a-cols) (begin (vector-set! c (+ (* i a-rows) j) sum) (lp2 (+ j 1))) (lp3 (+ k 1) (+ sum (* (vector-ref a (+ (* i a-cols) k)) (vector-ref b (+ (* k b-cols) j))))) )))) (lp1 0)))) So will this really be faster? We've actually put more work into the innermost loop (the most important loop), by requiring the dispatch check on the label, but the whole thing now compiles to a single C function, recursing only with goto. The result of multiplying two 300x300 matrices on my machine is 1.057 seconds for the nested version, and 0.714 seconds for the let-machine version, more than a 30% speed reduction for a real-world procedure where speed is crucial. If you look at the timings, all of the difference is accounted for by minor GC - Cheney on the MTA pushes unknown procedure calls as stack frames to be collected later, whereas goto doesn't push a frame. Consequently, the speed improvement is less noticeable in other Scheme implementations, and is sometimes slightly slower. Try it out if you like - your mileage may vary. -- Alex ;; let-machine implementation - use riaxpander or syntactic-closures (define-syntax let-machine (er-macro-transformer (lambda (expr rename compare) (let* ((procs (map (lambda (x) (cond ((not (symbol? (car x))) (error "not a symbol" (car x))) ((not (and (= 3 (length (cadr x))) (eq? 'lambda (caadr x)) (list? (cadadr x)))) (error "let-machine variables must all be lambdas")) (else (cons (car x) (cdadr x))))) (cadr expr))) (all-vars (apply lset-union eq? (map cadr procs))) (free-vars (cons 'label (append (map car procs) all-vars))) (init (caddr expr))) `(,(rename 'letrec) ((dispatch (lambda (label ,@all-vars) (letrec-syntax ,(map (lambda (p i) `(,(car p) (syntax-rules () ((_ ,@(cadr p)) (dispatch ,i ,@all-vars))))) procs (iota (length procs))) (,(rename 'cond) ,@(map (lambda (p i) `((,(rename 'eq?) label ,i) ,(caddr p))) (drop-right procs 1) (iota (- (length procs) 1))) (else ,(caddr (last procs)))))))) (letrec-syntax ,(map (lambda (p i) `(,(car p) (syntax-rules () ((_ ,@(cadr p)) (dispatch ,i ,@(map (lambda (v) (if (memq v (cadr p)) v #f)) all-vars)))))) procs (iota (length procs))) ,init)))))) _______________________________________________ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users