This sounded interesting, so I did some more testing. Your partition
procedure is cps-partition1. Then I created two variants: The first
one, cps-partition2, evaluates the car only once per recursion step.
The second one, cps-partition3, moves the predicate evaluation inside
the continuation. simple-partition is just plain tail-recursion with
consing up the result and reversing the result lists at the end. srfi-
partition is a copy from srfi-1 (with extra stuff included inside the
procedure)

Here's the code:

(import (rnrs)
        (only (ikarus) time printf)
        (only (srfi :1) iota))

(define (srfi-partition pred ls)
  (define (null-list? l)
    (cond
      ((pair? l) #f)
      ((null? l) #t)
      (else (error 'null-list? "argument out of domain" l))))
  (unless (procedure? pred)
    (assertion-violation 'srfi-partition "not a procedure" pred))
  (let recur ((ls ls))
    (if (null-list? ls)
        (values ls ls)
        (let ((elt (car ls))
              (tail (cdr ls)))
          (let-values (((in out) (recur tail)))
            (if (pred elt)
                (values (if (pair? out) (cons elt in) ls) out)
                (values in (if (pair? in) (cons elt out) ls))))))))

(define (cps-partition1 pred ls)
  (let loop ((ls ls) (k values))
    (cond ((null? ls)
           (k '() '()))
          ((pred (car ls))
           (loop (cdr ls)
                 (lambda (a b)
                   (k (cons (car ls) a) b))))
          (else
           (loop (cdr ls)
                 (lambda (a b)
                   (k a (cons (car ls) b))))))))

(define (cps-partition2 pred ls)
  (let loop ((ls ls) (k values))
    (if (null? ls)
        (k '() '())
        (let ((x (car ls)))
          (if (pred x)
              (loop (cdr ls)
                    (lambda (a b)
                      (k (cons x a) b)))
              (loop (cdr ls)
                    (lambda (a b)
                      (k a (cons x b)))))))))

(define (cps-partition3 pred ls)
  (let loop ((ls ls) (k values))
    (if (null? ls)
        (k '() '())
        (loop (cdr ls)
              (lambda (a b)
                (let ((x (car ls)))
                  (if (pred x)
                      (k (cons x a) b)
                      (k a (cons x b)))))))))

(define (simple-partition pred ls)
  (let loop ((ls ls) (p1 '()) (p2 '()))
    (if (null? ls)
        (values (reverse p1) (reverse p2))
        (let ((x (car ls)))
          (if (pred x)
              (loop (cdr ls) (cons x p1) p2)
              (loop (cdr ls) p1 (cons x p2)))))))

(define-syntax dotimes
  (syntax-rules ()
    ((_ n e e* ...)
     (do ((i 0 (+ i 1))) ((= i n)) e e* ...))))

(let ((data (iota 1000)))
  (time (dotimes 100000 (srfi-partition odd? data)))
  (time (dotimes 100000 (partition odd? data)))
  (time (dotimes 100000 (cps-partition1 odd? data)))
  (time (dotimes 100000 (cps-partition2 odd? data)))
  (time (dotimes 100000 (cps-partition3 odd? data)))
  (time (dotimes 100000 (simple-partition odd? data))))

(let ((data (iota 1000000)))
  (time (dotimes 10 (srfi-partition odd? data)))
  (time (dotimes 10 (partition odd? data)))
  (time (dotimes 10 (cps-partition1 odd? data)))
  (time (dotimes 10 (cps-partition2 odd? data)))
  (time (dotimes 10 (cps-partition3 odd? data)))
  (time (dotimes 10 (simple-partition odd? data)))
  (let-values (((a b) (srfi-partition odd? data))
               ((c d) (simple-partition odd? data)))
    (printf "\ntest ~a\n" (and (equal? a c) (equal? b d)))))


Here are the results:

First round is 100k runs with an input list of length 1000:

running stats for (dotimes 100000 (srfi-partition odd? data)):
    954 collections
    3636 ms elapsed cpu time, including 180 ms collecting
    3633 ms elapsed real time, including 199 ms collecting
    7998404096 bytes allocated
running stats for (dotimes 100000 (partition odd? data)):
    668 collections
    2645 ms elapsed cpu time, including 136 ms collecting
    2645 ms elapsed real time, including 115 ms collecting
    5600000000 bytes allocated
running stats for (dotimes 100000 (cps-partition1 odd? data)):
    573 collections
    2472 ms elapsed cpu time, including 108 ms collecting
    2487 ms elapsed real time, including 98 ms collecting
    4800000000 bytes allocated
running stats for (dotimes 100000 (cps-partition2 odd? data)):
    573 collections
    2457 ms elapsed cpu time, including 92 ms collecting
    2455 ms elapsed real time, including 97 ms collecting
    4800000000 bytes allocated
running stats for (dotimes 100000 (cps-partition3 odd? data)):
    572 collections
    2909 ms elapsed cpu time, including 112 ms collecting
    2911 ms elapsed real time, including 116 ms collecting
    4800000000 bytes allocated
running stats for (dotimes 100000 (simple-partition odd? data)):
    382 collections
    2352 ms elapsed cpu time, including 52 ms collecting
    2348 ms elapsed real time, including 53 ms collecting
    3200000000 bytes allocated

Second round is 10 runs with an input list of length 1M:

running stats for (dotimes 10 (srfi-partition odd? data)):
    129 collections
    8769 ms elapsed cpu time, including 7629 ms collecting
    8772 ms elapsed real time, including 7640 ms collecting
    1079044480 bytes allocated
running stats for (dotimes 10 (partition odd? data)):
    80 collections
    4585 ms elapsed cpu time, including 3960 ms collecting
    4593 ms elapsed real time, including 3974 ms collecting
    671630720 bytes allocated
running stats for (dotimes 10 (cps-partition1 odd? data)):
    57 collections
    1657 ms elapsed cpu time, including 1358 ms collecting
    1656 ms elapsed real time, including 1348 ms collecting
    480000000 bytes allocated
running stats for (dotimes 10 (cps-partition2 odd? data)):
    58 collections
    1584 ms elapsed cpu time, including 1281 ms collecting
    1585 ms elapsed real time, including 1295 ms collecting
    480000000 bytes allocated
running stats for (dotimes 10 (cps-partition3 odd? data)):
    57 collections
    1781 ms elapsed cpu time, including 1465 ms collecting
    1781 ms elapsed real time, including 1458 ms collecting
    480000000 bytes allocated
running stats for (dotimes 10 (simple-partition odd? data)):
    38 collections
    865 ms elapsed cpu time, including 600 ms collecting
    867 ms elapsed real time, including 609 ms collecting
    320000000 bytes allocated

test #t

What we can see is, that:

- cps style versions are much faster than the srfi or built-in version
(especially for big inputs)

- the optimization of taking the car of the list only once is probably
also done by Ikarus (cps-partition1 and cps-partition2 take the same
time)

- full cps-style (cps-partition3) is a tiny bit slower (why?)

- reversing lists must be incredibly fast in Ikarus, because simple-
partition beats them all (are lists double-linked internally?)

- simple-partition does the least memory allocation

- simple-partition becomes relatively faster with bigger inputs

Additionally, looking at the source code of srfi-partition reveals
that it isn't tail-recursive (yuck!), so large inputs might result in
a memory overflow.

Juergen

Reply via email to