I'm playing around with a basic map/reduce pattern with the following
code:

(ns read-lines.core
  (:gen-class)
  (:use [clojure.java.io :only (reader)]))

(defn -main [& args]
  (with-open [rdr (reader "/tmp/mydata.txt")]
    (let [file-handle (line-seq rdr)]
      (println (reduce (fn [m x] (inc m))
                       (pmap (fn [_] 1) file-handle))))))

Running that to completion on 2.7 GB of data took 28 minutes. Looking at
my system resources, I saw a fairly even balance between disk
utilization and CPU utilization. Is there anything obvious I'm missing
that could speed this up? I'm going to try again without pmap to see
whether context switching hurt performance, since I only have two
cores. On the upside, memory usage was constant, which is great for
large datasets.
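
A minimal sketch of the pmap-free variant mentioned above, assuming the goal is still just to count lines. The namespace read-lines.sequential and the function count-lines are hypothetical names for illustration, not part of the original code, and no timing is claimed for it:

```clojure
(ns read-lines.sequential
  (:require [clojure.java.io :as io]))

(defn count-lines
  "Counts the lines of the file at `path` sequentially, without pmap.
  Hypothetical helper for comparison with the pmap version."
  [path]
  (with-open [rdr (io/reader path)]
    ;; Seed the reduce with 0 and bump the counter once per line.
    ;; The reduce runs to completion inside with-open, so the lazy
    ;; line-seq is fully consumed before the reader is closed.
    (reduce (fn [n _] (inc n)) 0 (line-seq rdr))))
```

Calling (count-lines "/tmp/mydata.txt") should return the same count as the pmap version for a non-empty file. The explicit 0 seed also covers an empty file, where the unseeded reduce would call the two-argument function with no arguments and throw.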
