Hi Jarrod,

I have had success with the clojure-csv [1] library, processing large
files lazily (as opposed to using slurp).
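To make the lazy-versus-slurp point concrete, here is a minimal sketch of the idea. It assumes a tab-delimited file, and the name process-rows is hypothetical (it is not part of clojure-csv). It also splits each line naively with clojure.string/split instead of calling clojure-csv, so quoted fields are not handled; treat it as an illustration only.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: streams a tab-delimited file line by line,
;; calls row-fn on each row (a vector of field strings) for side
;; effects, and returns the number of rows seen.
(defn process-rows
  [filename row-fn]
  (with-open [rdr (io/reader filename)]
    ;; line-seq is lazy; reduce consumes it while the reader is still
    ;; open, so the whole file is never held in memory the way it
    ;; would be with slurp.
    (reduce (fn [n line]
              ;; naive split: an assumption, not real CSV parsing
              (row-fn (str/split line #"\t"))
              (inc n))
            0
            (line-seq rdr))))
```

For example, (process-rows "data.tsv" println) would print every row and return the row count; "data.tsv" is a placeholder file name. Carrying the count through reduce avoids having to redefine a global var while the file is being read.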

[1] - clojure-csv - https://github.com/davidsantiago/clojure-csv

Here is a copy of my source code (disclaimer - this is my first Clojure
program - so some things might not be idiomatic). This code handles a
250MB file, 315K rows (each row has 100 columns / fields) really well,
and can scale in terms of memory usage since it handles the file lazily
and processes / parses each line one at a time.

See snippets of code below:

    (ns scripts.core
      (:gen-class))

    (require '[clojure.java.io :as io]
             '[clojure-csv.core :as csv]
             '[clojure.string :as str])

    (def line-count 0)

    (defn parse-row [row]
      (first (csv/parse-csv row :delimiter \tab)))

    (defn parse-file [filename]
      (with-open [file (io/reader filename)]
        (doseq [line (line-seq file)]
          (let [record (parse-row line)]
            (println record)) ;; replace println record with your own logic
          (def line-count (inc line-count)))))

    (defn process-file [filename]
      (do
        (def line-count 0)
        (parse-file filename)
        (println line-count)))

    (defn -main [& args]
      (process-file (first args)))

Feel free to ask questions if you need more info.

Kind regards
Rudi

On 21/01/2014, at 5:55 PM, Jarrod Swart <[email protected]> wrote:

> I'm processing a large csv with Clojure, honestly not even that big (~18k
> rows, 11mb). I have a list of exported data from a client and I am
> de-duplicating URLs within the list. My final output is a series of vectors:
> [url url-hash].
>
> The odd thing is how slow it seems to be going. I have tried implementing
> this as a reduce, and finally I thought to speed things up I might try a
> "with-open and a loop-recur". It doesn't seem to have done much in my case.
> I know I am doing something wrong I'm just not sure what yet. The best I can
> do is about 4 seconds, which may only seem slow because I implemented it in
> python first and it takes a half second to finish. Still this is one of the
> smaller files I will likely deal with so I'm worried that as the files grow
> it may get too slow.
>
> The code is here on ref-heap for easy viewing: https://www.refheap.com/26098
>
> Any advice is appreciated.
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
