I did a bit more torture testing of Clojure 1.2 and those JVM settings,
this time for speed.

user=> (time (domany 100000 (repeatedly gensym)))
"Elapsed time: 1362.33344 msecs"
{:foo
 clojure.lang.Symbol}
user=> (time (pmap #(assoc % :bar 1) (take 10 (repeatedly #(domany
10000 (repeatedly gensym))))))
"Elapsed time: 210.43172 msecs"
({:bar 1,
  :foo
  clojure.lang.Symbol}
 {:bar 1,
... yadda yadda yadda

(This is after a couple of prior runs to JIT everything.)
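
One thing worth ruling out with timings like these: pmap returns a lazy
sequence, so how much of the work lands inside (time ...) depends on when
the result is realized. Below is a minimal sketch of a harness that warms
up first and forces the whole result with doall. The names timed-ms,
gensym-batch and parallel-run are hypothetical, and gensym-batch is only a
stand-in workload, not the domany used in this thread.

```clojure
;; Hypothetical helper: call f `warmups` times so the JIT has seen the code,
;; then time one more call. Returns elapsed wall-clock milliseconds.
(defn timed-ms [warmups f]
  (dotimes [_ warmups] (f))
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

;; Stand-in workload (NOT the domany from this thread): realize n gensyms
;; and return how many were created.
(defn gensym-batch [n]
  (count (doall (take n (repeatedly gensym)))))

;; doall forces every pmap element, so all batches finish inside the caller.
(defn parallel-run [batches n]
  (doall (pmap gensym-batch (repeat batches n))))

(println "serial ms:  " (timed-ms 3 #(gensym-batch 100000)))
(println "parallel ms:" (timed-ms 3 #(parallel-run 10 10000)))
```

If this is run as a script rather than at the REPL, a final
(shutdown-agents) lets the JVM exit promptly, since pmap runs its work on
futures.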

This is interesting. The machine is only dual-core, so the roughly 6x
speedup tells me that repeated gensymming is not CPU-bound: two cores
can at most double the CPU throughput. On the other hand, if the time
were going to waiting on a lock around some counter used to generate
the next gensym's numerical part while avoiding collisions, I would
expect no speedup at all. Ten threads wouldn't be able to generate
gensyms any faster than one thread, in that case, unless there was a
long wait for something other than a global lock in gensym.
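
To make the global-lock hypothesis concrete, here is a minimal sketch of
the two kinds of counter being contrasted: one guarded by a single monitor
and one using an atomic increment. This only illustrates the alternatives.
I have not checked which one gensym uses in 1.2, and all the names here
(next-id-locked, next-id-atomic, hammer) are hypothetical.

```clojure
(import 'java.util.concurrent.atomic.AtomicLong)

;; Counter behind one global monitor: every caller queues on lock-obj.
(def lock-obj (Object.))
(def locked-count (atom 0))
(defn next-id-locked []
  (locking lock-obj
    (let [v @locked-count]
      (reset! locked-count (inc v))
      v)))

;; Lock-free counter: threads retry a compare-and-swap instead of blocking.
(def atomic-count (AtomicLong. 0))
(defn next-id-atomic []
  (.getAndIncrement ^AtomicLong atomic-count))

;; Call id-fn n times on each of `threads` futures and wait for all of them.
;; Returns the total number of ids requested.
(defn hammer [id-fn threads n]
  (let [workers (doall (map (fn [_] (future (dotimes [_ n] (id-fn)) n))
                            (range threads)))]
    (reduce + (map deref workers))))

(time (hammer next-id-locked 10 100000))
(time (hammer next-id-atomic 10 100000))
```

With a counter like the first, adding threads past one should not make id
generation itself any faster; any speedup would have to come from the work
done outside the lock.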

I tried testing this because I suspected that runtime use of gensym,
besides leaking permgen, might create a bottleneck at a global lock if
done in a concurrent app; seems that's not the case at least up to a
parallelism factor of 10, for whatever reason.

Changing (take 10 (repeatedly #(...))) to (repeat 10 (...)) results in
a further 2x speedup, with the gensym operation being done only 10,000
times instead of 100,000 (the single result is computed once and
reused). So the other 90,000 gensyms done in parallel take ~100ms,
10,000 would take ~11, and the remaining ~100ms is consumed by pmap
overhead. Compared with

user=> (time (domany 90000 (repeatedly gensym)))
"Elapsed time: 1164.5704 msecs"

that's a nearly 12x speedup from parallelism after adjusting for
pmap's overhead!
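
The arithmetic behind that estimate, as a small sketch. The two measured
times are the ones quoted in this message; the time for the (repeat 10 ...)
run is an assumption (half of the take-10 time, per the "2x" above), since
the exact figure isn't given.

```clojure
;; Measured figures quoted in this message (ms).
(def take-10-ms 210.43172)     ; pmap over 10 separate batches of 10,000
(def serial-90k-ms 1164.5704)  ; 90,000 gensyms on one thread

;; ASSUMPTION: the (repeat 10 ...) run took half of take-10-ms.
(def repeat-10-ms (/ take-10-ms 2))

;; The two runs differ by nine batches, i.e. 90,000 gensyms done in parallel.
(def parallel-90k-ms (- take-10-ms repeat-10-ms))

;; Cost of one 10,000-gensym batch at that rate, and what is left over.
(def one-batch-ms (/ parallel-90k-ms 9))
(def overhead-ms (- repeat-10-ms one-batch-ms))

;; Serial vs. parallel time for the same 90,000 gensyms.
(def adjusted-speedup (/ serial-90k-ms parallel-90k-ms))

(println "parallel 90k ms:" parallel-90k-ms)
(println "one batch ms:   " one-batch-ms)
(println "overhead ms:    " overhead-ms)
(println "speedup:        " adjusted-speedup)
```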

The amount of CPU spent on gensym creation per ms can have doubled at
most, so again most of the time is spent waiting on something. This is
certainly not I/O bound, since no printing occurs until after the part
that's timed has been timed, so it's got to be locks and
synchronization of some sort (unless there's a gratuitous Thread/sleep
buried in Clojure somewhere, which seems highly unlikely). But it
can't be a global lock in gensym creation, or parallelization wouldn't
produce any speedup to speak of. (If there's a global lock, most of
the time must be spent elsewhere than waiting on that particular lock,
or all 10 threads would spend most of their time queued up on that one
lock and wouldn't get things done any faster than one thread would.)

user=> (time (pmap #(assoc % :bar 1) (take 5 (repeatedly #(domany
20000 (repeatedly gensym))))))
"Elapsed time: 555.41208 msecs"

The speedup is only about 2.5x instead of 6x with half as many parallel
threads. Subtracting the estimated 100ms pmap overhead gives 455, which
makes the speedup closer to 3x (vs. 12x with 10 threads). Quintupling
the thread count from 1 yields a 3x speedup, but a further doubling
yields another 4x on top of that? That seems strange.
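
For reference, the raw and overhead-adjusted ratios can be computed
directly from the three timings quoted above. This is just a sketch of
the calculation; the flat 100ms overhead is the rough estimate from
earlier in this message, not a measured value, and the function names are
hypothetical.

```clojure
(def serial-ms 1362.33344)  ; 100,000 gensyms on one thread
(def overhead-ms 100.0)     ; ASSUMPTION: rough pmap overhead estimate

;; Speedup taking the parallel time as measured.
(defn raw-speedup [parallel-ms]
  (/ serial-ms parallel-ms))

;; Speedup after subtracting the assumed fixed overhead from the parallel time.
(defn adjusted-speedup [parallel-ms]
  (/ serial-ms (- parallel-ms overhead-ms)))

(doseq [[threads ms] [[5 555.41208] [10 210.43172]]]
  (println threads "threads: raw" (raw-speedup ms)
           "adjusted" (adjusted-speedup ms)))
```

The 10-thread figure is sensitive to the overhead estimate: only about
110ms remains after the subtraction, so being off by a few tens of ms
moves the adjusted ratio a lot. That may account for some of the
strangeness.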
