I needed to shuffle a vector so I decided to try doing it functionally.
Below is the implementation and a couple exercise functions, inspired by
http://en.wikipedia.org/wiki/Fisher-Yates_shuffle and
http://szeryf.wordpress.com/2007/06/19/a-simple-shuffle-that-proved-not-so-simple-after-all/
.

A version of sort-by that computes each key only once does most of the work;
shuffle is trivial using that. I haven't tried it on data other than
integers yet. The speed is decent under half a million integers, but the
space cost is high, requiring a larger Java heap to sort a million. As soon
as I got it working I discovered java.util.Collections/shuffle, which blows
my implementation away on speed and space, comfortably sorting 10 million.

I'm open to feedback on the algorithm or improving my Clojure. For one
thing, it seems like there must be a cleaner way to handle my use of reduce
in test-shuffle-uniformity. I like reduce for looping functionally, but in
this case I just need to run n times regardless of the specific values in
(range n). Is that use case common enough for some kind of reduce-times
macro?

(defn sort-by-mem-keys
  "Like clojure/sort-by but only calls keyfn once on each item in coll."
  ([keyfn coll]
     (sort-by-mem-keys keyfn compare coll))
  ([keyfn #^java.util.Comparator comp coll]
     (let [key-vals (map (fn [x] [(keyfn x) x]) coll)
           sorted-pairs (sort (fn [x y] (.compare comp (first x) (first y)))
                              key-vals)]
       (map second sorted-pairs))))

(defn shuffle
  "Returns a seq of coll shuffled in O(n log n) time. Space is at least
O(2N). Each item in coll is
  assigned a random number which becomes the sort key."
  [coll]
  (sort-by-mem-keys (fn [_] (rand)) coll))

(import '(java.util ArrayList Collections))

(defn shuffle-java
  "Shuffles coll using a Java ArrayList. Blows shuffle out of the water on
  speed and space."
  [coll]
  (let [l (ArrayList. coll)]
    (Collections/shuffle l)
    (seq l)))

(defn test-shuffle-uniformity
  "Shuffles a 3-item collection n times, mapping each result to the number
of
  times it's hit. If the distribution isn't uniform we've let bias into the
  random numbers or the algorithm. Typical results on my machine:
  (2 1 3) : 16611
  (3 2 1) : 16771
  (1 3 2) : 16707
  (1 2 3) : 16766
  (3 1 2) : 16555
  (2 3 1) : 16590"
  ([n] (test-shuffle-uniformity n shuffle))
  ([n shuffle]
     (let [coll [1 2 3]
           results (reduce (fn [results _]
                             (let [result (shuffle coll)]
                               (assoc results result
                                      (inc (or (get results result)
                                               0)))))
                           {} (range n))]
       (doseq r results
         (println (key r) ":" (val r))))))

(defn test-shuffle-time
  "Runs shuffle on collections of varying sizes, printing out the times. I
had
  to run java with -Xmx256m to get to a million without running out of
  memory."
  []
  (doseq n [1000 10000 100000 1000000]
    (print n ": ")
    (time (dorun (shuffle (range n))))))

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to