Time for some benchmarking.

Using Racket's hash-eq data structure, I implemented basic versions of
write/safe and write/fast, using the reference-counting thing I suggested
earlier--basic in the sense that they only handle pairs and atoms, not
vectors or other compound structures.  I simply used Racket's built-in
"write" to write atoms (which were all small integers).

I construct structures that look like this:
(((1 1)) ((2 2)) ... ((n n)))
and print them out 1,000,000/n times, so that a total of 4,000,000 cons
cells (multiply-counted) get printed out each time.  (I don't count the
time it takes to construct the cons cells.)

write/fast consistently takes 4500-4700 milliseconds to print all of these
structures out, which makes sense, because it does the same amount of work
on each cons cell, no matter how long the tail of the list is.  Over the
range of structures with 40 cons cells to those with 40,000 cons cells,
"write/safe" is generally 10% slower than "write/fast".  At 400,000 cons
cells, "write/safe" takes 33% longer (6100 msec), and at 4 million, 65%
longer (7600 msec).  This seems consistent with my estimate of "O(n) -> O(n
log n), or perhaps O(n), depending on exactly how hash-eqs work".  And
incidentally, at 4 cons cells, write/fast took 5400 and write/safe took
6400 msec (20% slower).

I also tested Racket's built-in "write".  "write" took about 900 msec at
the 40, 400, and 4000-cell structures, 1000 msec at the 40,000-cell
structure, 2600 msec at the 400,000-cell structure, and 4600 msec at the 4
million-cell structure; and 1700 msec at the 4-cell structure.  Note that
Racket's "write" will detect cycles and, if it finds any, will act like
write/shared--so in a sense it already has the performance characteristics
of write/safe:

arc> (let xs $.range.10 (= xs.2 $.range.3 xs.4 xs.2) ($.write xs))
(0 1 (0 1 2) 3 (0 1 2) 5 6 7 8 9)
arc> (let xs $.range.10 (= xs.2 $.range.3 xs.4 xs.2 cdr:last-pair.xs xs)
($.write xs))
#1=(0 1 #0=(0 1 2) 3 #0# 5 6 7 8 9 . #1#)

I will reproduce the code of write/fast and write/safe here, and the code
and my interactions with the Arc REPL (used to drive the underlying Racket
implementation) are all here: http://paste.lisp.org/display/130430

(define-syntax push
  (syntax-rules ()
    ((push x xs)
     (set! xs (cons x xs)))))
(define-syntax pop
  (syntax-rules ()
    ((pop xs) (let ((x (car xs)))
                (set! xs (cdr xs))
                x))))

(define (write/safe x)
  (if (not (pair? x))
      (write/fast x)
      (let ((refcount (make-hasheq)))
        (let ((xs null))
          (hash-set! refcount x 1)
          (push x xs)
          (do ()
            ((null? xs))
            (let ((u (pop xs)))
              (let ((a (car u)) (b (cdr u)))
                (when (pair? a)
                  (if (hash-ref refcount a #f)
                      (hash-set! refcount a (+ 1 (hash-ref refcount a)))
                      (begin (hash-set! refcount a 1)
                             (push a xs))))
                (when (pair? b)
                  (if (hash-ref refcount b #f)
                      (hash-set! refcount b (+ 1 (hash-ref refcount b)))
                      (begin (hash-set! refcount b 1)
                             (push b xs))))))))
        ;now for gc
        (let ((xs (list x))
              (n (hash-count refcount)))
          (do ()
            ((null? xs))
            (let* ((u (pop xs))
                   (h (- (hash-ref refcount u) 1)))
              (hash-set! refcount u h)
              (when (zero? h)
                (when (pair? (car u)) (push (car u) xs))
                (when (pair? (cdr u)) (push (cdr u) xs))
                (set! n (- n 1)))))
          (if (zero? n)
              (write/fast x)
              (error "There must be a cycle somewhere" x))))))

(define (write/fast x)
  (if (pair? x)
      (begin (display #\()
             (write/fast-rest x))
      (write x)))

(define (write/fast-rest x)
  (write/fast (car x))
  (cond ((pair? (cdr x))
         (display #\space)
         (write/fast-rest (cdr x)))
        ((null? (cdr x))
         (display #\)))
        (else (display " . ")
              (write (cdr x))
              (display #\)))))

My conclusion is that it doesn't make much of a difference unless you lack
a good hash-eq implementation.

--

John Cowan writes:

> Lastly, write/small-fast is perfectly safe if you know you don't construct
> cycles, and lots of programs do know that (if they don't use set-car! or
> set-cdr!, for example).

Unsafe fixnum operations are perfectly safe if you know you're not using
any integers that could be outside of your implementation's fixnum limit,
and many programs do know that at least in some places (if they iterate
over an integer that starts at zero and is incremented until it reaches an
integer that is the length of a data structure small enough to be stored in
memory, for example).  It would appear that the cost of the checks done by
write/safe is comparable to the cost of checking for non-fixnums--or
non-numbers, for that matter--or array bounds-checking.

Hmm... but how are things like bounds-checking handled?  On the
documentation of (vector-ref vector k), r7rs-draft-6 says: "It is an error
if k is not a valid index of vector."  I believe that language is
permissive of, for example, segfaulting and crashing on an invalid index.
The same philosophy applied to "write/safe" would seem to indicate writing
"It is an error to pass a circular structure to write/safe", which means a
conforming implementation could just loop infinitely, and the name
"write/safe" would be misleading--should be "write/simple" or something.
Well, at least that is different from not saying anything at all about the
circular case.

Also, the cases are not fully analogous, because there is also write/shared
and the default behavior of "write" to think about.  Hmm... But it does
seem that the Scheme standard does not and should not talk about unsafe
procedures, or about safe versions of procedures.  write/fast should not
exist; there might be write/shared, and there might be write/safe (named
something like write/simple, perhaps, to which "it is an error" to pass
cycles), and there might be Racket's "write" that acts like write/safe but
degrades to write/shared on cycles.  Any of these might be declared the
default "write" (you'd vote on that, I'm not sure which I'd prefer), and
the user should be able to use any of them (with parameters or with
differently-named procedures).
--John Boyle
*Science is what we understand well enough to explain to a computer. Art is
everything else we do.* --Knuth



On Sat, Jul 7, 2012 at 11:50 AM, Aaron W. Hsu <[email protected]> wrote:

> John Cowan <[email protected]> wrote:
>
> > Aaron W. Hsu scripsit:
> >
> > >     write/safe
> > >     write/shared
> > >     write/dangerous-stupid-do-not-use-unless-you-think-you-are-neo
> >
> > Less tendentiously, write/small-fast.
>
> Have we done any benchmarks to see how fast this really is in practice?
> It seems like we are assuming it is faster, but I do not remember seeing
> anything indicating that it was really significantly faster than
> WRITE/SAFE. IMO, the lack of safety of WRITE/SMALL-FAST is hard to justify.
>
> > I assume that the first one writes datum labels for cycles but not
> > for shared structure?  That would seem to violate the whole notion of
> > read-write equivalence: you write out a value with shared structure as
> > a datum, but you read it back as a value without shared structure.
>
> It is WRITE/SMALL-FAST when it is safe to do so, but otherwise it is
> WRITE/SHARED. Of course if violates read/write equivalence, but so does
> normal WRITE with regards to shared structure. As has been pointed out,
> read-write equivalence is useful, but it is not the only valid or useful
> way of thinking about these things. In particular, backwards compatibility
> and pretty printing of code that can be copied back in to a REPL and so
> forth. Most of the time you do not need or want to have read-write
> equivalence because other things are of more value, such as easier reading,
> for humans.
>
> I prefer that we get rid of WRITE/SMALL-FAST and leave only WRITE/SAFE as
> WRITE and WRITE/SHARED as either a parameter or an extra procedure. If it
> matters that much that they want an unsafe WRITE, then let implementations
> provide this separate of the standard, or let them have an implementation
> flag that makes WRITE unsafe for cycles. That seems the better approach to
> me.
>
> --
> Aaron W. Hsu | [email protected] | http://www.sacrideo.us
> Programming is just another word for the lost art of thinking.
>
> _______________________________________________
> Scheme-reports mailing list
> [email protected]
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
>
_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

Reply via email to