This sounded interesting, so I did some more testing. Your partition
procedure is cps-partition1. Then I created two variants: The first
one, cps-partition2, evaluates the car only once per recursion step.
The second one, cps-partition3, moves the predicate evaluation inside
the continuation. simple-partition is just plain tail-recursion with
consing up the result and reversing the result lists at the end. srfi-
partition is a copy from srfi-1 (with extra stuff included inside the
procedure)
Here's the code:
(import (rnrs)
(only (ikarus) time printf)
(only (srfi :1) iota))
(define (srfi-partition pred ls)
(define (null-list? l)
(cond
((pair? l) #f)
((null? l) #t)
(else (error 'null-list? "argument out of domain" l))))
(unless (procedure? pred)
(assertion-violation 'srfi-partition "not a procedure" pred))
(let recur ((ls ls))
(if (null-list? ls)
(values ls ls)
(let ((elt (car ls))
(tail (cdr ls)))
(let-values (((in out) (recur tail)))
(if (pred elt)
(values (if (pair? out) (cons elt in) ls) out)
(values in (if (pair? in) (cons elt out) ls))))))))
(define (cps-partition1 pred ls)
(let loop ((ls ls) (k values))
(cond ((null? ls)
(k '() '()))
((pred (car ls))
(loop (cdr ls)
(lambda (a b)
(k (cons (car ls) a) b))))
(else
(loop (cdr ls)
(lambda (a b)
(k a (cons (car ls) b))))))))
(define (cps-partition2 pred ls)
(let loop ((ls ls) (k values))
(if (null? ls)
(k '() '())
(let ((x (car ls)))
(if (pred x)
(loop (cdr ls)
(lambda (a b)
(k (cons x a) b)))
(loop (cdr ls)
(lambda (a b)
(k a (cons x b)))))))))
(define (cps-partition3 pred ls)
(let loop ((ls ls) (k values))
(if (null? ls)
(k '() '())
(loop (cdr ls)
(lambda (a b)
(let ((x (car ls)))
(if (pred x)
(k (cons x a) b)
(k a (cons x b)))))))))
(define (simple-partition pred ls)
(let loop ((ls ls) (p1 '()) (p2 '()))
(if (null? ls)
(values (reverse p1) (reverse p2))
(let ((x (car ls)))
(if (pred x)
(loop (cdr ls) (cons x p1) p2)
(loop (cdr ls) p1 (cons x p2)))))))
(define-syntax dotimes
(syntax-rules ()
((_ n e e* ...)
(do ((i 0 (+ i 1))) ((= i n)) e e* ...))))
(let ((data (iota 1000)))
(time (dotimes 100000 (srfi-partition odd? data)))
(time (dotimes 100000 (partition odd? data)))
(time (dotimes 100000 (cps-partition1 odd? data)))
(time (dotimes 100000 (cps-partition2 odd? data)))
(time (dotimes 100000 (cps-partition3 odd? data)))
(time (dotimes 100000 (simple-partition odd? data))))
(let ((data (iota 1000000)))
(time (dotimes 10 (srfi-partition odd? data)))
(time (dotimes 10 (partition odd? data)))
(time (dotimes 10 (cps-partition1 odd? data)))
(time (dotimes 10 (cps-partition2 odd? data)))
(time (dotimes 10 (cps-partition3 odd? data)))
(time (dotimes 10 (simple-partition odd? data)))
(let-values (((a b) (srfi-partition odd? data))
((c d) (simple-partition odd? data)))
(printf "\ntest ~a\n" (and (equal? a c) (equal? b d)))))
Here are the results:
First round is 100k runs with an input list of length 1000:
running stats for (dotimes 100000 (srfi-partition odd? data)):
954 collections
3636 ms elapsed cpu time, including 180 ms collecting
3633 ms elapsed real time, including 199 ms collecting
7998404096 bytes allocated
running stats for (dotimes 100000 (partition odd? data)):
668 collections
2645 ms elapsed cpu time, including 136 ms collecting
2645 ms elapsed real time, including 115 ms collecting
5600000000 bytes allocated
running stats for (dotimes 100000 (cps-partition1 odd? data)):
573 collections
2472 ms elapsed cpu time, including 108 ms collecting
2487 ms elapsed real time, including 98 ms collecting
4800000000 bytes allocated
running stats for (dotimes 100000 (cps-partition2 odd? data)):
573 collections
2457 ms elapsed cpu time, including 92 ms collecting
2455 ms elapsed real time, including 97 ms collecting
4800000000 bytes allocated
running stats for (dotimes 100000 (cps-partition3 odd? data)):
572 collections
2909 ms elapsed cpu time, including 112 ms collecting
2911 ms elapsed real time, including 116 ms collecting
4800000000 bytes allocated
running stats for (dotimes 100000 (simple-partition odd? data)):
382 collections
2352 ms elapsed cpu time, including 52 ms collecting
2348 ms elapsed real time, including 53 ms collecting
3200000000 bytes allocated
Second round is 10 runs with an input list of length 1M:
running stats for (dotimes 10 (srfi-partition odd? data)):
129 collections
8769 ms elapsed cpu time, including 7629 ms collecting
8772 ms elapsed real time, including 7640 ms collecting
1079044480 bytes allocated
running stats for (dotimes 10 (partition odd? data)):
80 collections
4585 ms elapsed cpu time, including 3960 ms collecting
4593 ms elapsed real time, including 3974 ms collecting
671630720 bytes allocated
running stats for (dotimes 10 (cps-partition1 odd? data)):
57 collections
1657 ms elapsed cpu time, including 1358 ms collecting
1656 ms elapsed real time, including 1348 ms collecting
480000000 bytes allocated
running stats for (dotimes 10 (cps-partition2 odd? data)):
58 collections
1584 ms elapsed cpu time, including 1281 ms collecting
1585 ms elapsed real time, including 1295 ms collecting
480000000 bytes allocated
running stats for (dotimes 10 (cps-partition3 odd? data)):
57 collections
1781 ms elapsed cpu time, including 1465 ms collecting
1781 ms elapsed real time, including 1458 ms collecting
480000000 bytes allocated
running stats for (dotimes 10 (simple-partition odd? data)):
38 collections
865 ms elapsed cpu time, including 600 ms collecting
867 ms elapsed real time, including 609 ms collecting
320000000 bytes allocated
test #t
What we can see is, that:
- cps style versions are much faster than the srfi or built-in version
(especially for big inputs)
- the optimization of taking the car of the list only once is probably
also done by Ikarus (cps-partition1 and cps-partition2 take the same
time)
- full cps-style (cps-partition3) is a tiny bit slower (why?)
- reversing lists must be incredibly fast in Ikarus, because simple-
partition beats them all (are lists double-linked internally?)
- simple-partition does the least memory allocation
- simple-partition becomes relatively faster with bigger inputs
Additionally, looking at the source code of srfi-partition reveals
that it isn't tail-recursive (yuck!), so large inputs might result in
a memory overflow.
Juergen