This is related to my multi-level bucketing problem, for which I am starting a new thread.
The code is at https://gist.github.com/952861. I am referring to the sum-by function, which a fellow Clojurer in this group provided. It choked when I passed in one million records - I didn't run out of memory, but it took a very long time. Quoted below are the relevant pieces of my code:

    (def data (take 1000000 (repeatedly get-rec)))

    ;get aggregate values for a list of attributes
    (defn sum-by [data attrs]
      (let [aggregated (group-by (apply juxt attrs) data)]
        (zipmap (keys aggregated)
                (map #(reduce + (map :mv %)) (vals aggregated)))))

    ;invoke sum-by
    (sum-by data [:attr1 :attr2])

Are there any obvious performance optimizations (e.g. transients) that would make this function run faster and consume less memory? In general, what should one watch out for when writing functions like these so as not to get poor performance on very large data sets?

Thanks for your help.

-- Shoeb
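P.S. To make the transient idea concrete, here is the kind of single-pass rewrite I had in mind - an untested sketch that folds the records directly into a transient map instead of materializing the group-by result first (sum-by-fast is just a placeholder name I made up):

    ;untested sketch: one pass over the data, accumulating sums of :mv
    ;per key in a transient map, then freezing it with persistent!
    (defn sum-by-fast [data attrs]
      (let [key-fn (apply juxt attrs)]
        (persistent!
         (reduce (fn [acc rec]
                   (let [k (key-fn rec)]
                     ;transients support lookup, so get works here
                     (assoc! acc k (+ (get acc k 0) (:mv rec)))))
                 (transient {})
                 data))))

It would be invoked the same way: (sum-by-fast data [:attr1 :attr2]). Is this the right direction, or are there better approaches?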