I needed to shuffle a vector so I decided to try doing it functionally. Below is the implementation and a couple exercise functions, inspired by http://en.wikipedia.org/wiki/Fisher-Yates_shuffle and http://szeryf.wordpress.com/2007/06/19/a-simple-shuffle-that-proved-not-so-simple-after-all/ .
A version of sort-by that computes each key only once does most of the work; shuffle is trivial using that. I haven't tried it on data other than integers yet. The speed is decent under half a million integers, but the space cost is high, requiring a larger Java heap to sort a million. As soon as I got it working I discovered java.util.Collections/shuffle, which blows my implementation away on speed and space, comfortably sorting 10 million. I'm open to feedback on the algorithm or improving my Clojure. For one thing, it seems like there must be a cleaner way to handle my use of reduce in test-shuffle-uniformity. I like reduce for looping functionally, but in this case I just need to run n times regardless of the specific values in (range n). Is that use case common enough for some kind of reduce-times macro? (defn sort-by-mem-keys "Like clojure/sort-by but only calls keyfn once on each item in coll." ([keyfn coll] (sort-by-mem-keys keyfn compare coll)) ([keyfn #^java.util.Comparator comp coll] (let [key-vals (map (fn [x] [(keyfn x) x]) coll) sorted-pairs (sort (fn [x y] (.compare comp (first x) (first y))) key-vals)] (map second sorted-pairs)))) (defn shuffle "Returns a seq of coll shuffled in O(n log n) time. Space is at least O(2N). Each item in coll is assigned a random number which becomes the sort key." [coll] (sort-by-mem-keys (fn [_] (rand)) coll)) (import '(java.util ArrayList Collections)) (defn shuffle-java "Shuffles coll using a Java ArrayList. Blows shuffle out of the water on speed and space." [coll] (let [l (ArrayList. coll)] (Collections/shuffle l) (seq l))) (defn test-shuffle-uniformity "Shuffles a 3-item collection n times, mapping each result to the number of times it's hit. If the distribution isn't uniform we've let bias into the random numbers or the algorithm. Typical results on my machine: (2 1 3) : 16611 (3 2 1) : 16771 (1 3 2) : 16707 (1 2 3) : 16766 (3 1 2) : 16555 (2 3 1) : 16590" ([n] (test-shuffle-uniformity n shuffle)) ([n shuffle] (let [coll [1 2 3] results (reduce (fn [results _] (let [result (shuffle coll)] (assoc results result (inc (or (get results result) 0))))) {} (range n))] (doseq r results (println (key r) ":" (val r)))))) (defn test-shuffle-time "Runs shuffle on collections of varying sizes, printing out the times. I had to run java with -Xmx256m to get to a million without running out of memory." [] (doseq n [1000 10000 100000 1000000] (print n ": ") (time (dorun (shuffle (range n)))))) --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---