Cheers. But my tests suggest that (for ...) has the same laziness characteristics -- or lack thereof -- as (map ...).
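That would be consistent with chunking: both map and for walk a chunked source (like range) a chunk of 32 elements at a time, so asking for one element realizes a whole chunk. A minimal REPL sketch (the `realized` counter and `observe` helper are mine, just for illustration):

```clojure
;; Count how many elements actually get realized when we ask for one.
(def realized (atom 0))

(defn observe [x]
  (swap! realized inc)
  x)

;; range produces a chunked seq; map realizes it chunk-wise.
(first (map observe (range 100)))
@realized          ;; => 32, not 1 -- a whole chunk was forced

;; for behaves the same way over a chunked source.
(reset! realized 0)
(first (for [x (range 100)] (observe x)))
@realized          ;; => 32 here too
```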
On Jul 26, 6:56 pm, Randy Hudson <randy_hud...@mac.com> wrote:
> You can get a lazy sequence of all the lines in all the files by
> something like:
>
> (for [file out-files
>       line (with-open [r (io/reader file)] (line-seq r))]
>   line)
>
> If "StatusJSONImpl" is on a separate line, you can throw in a :when
> clause to filter them out:
>
> (for [file out-files
>       line (with-open [r (io/reader file)] (line-seq r))
>       :when (not= line "StatusJSONImpl")]
>   line)
>
> If it's a line prefix, you can remove it in the body:
>
> (for [file out-files
>       line (with-open [r (io/reader file)] (line-seq r))]
>   (string/replace line "StatusJSONImpl" ""))
>
> This is all assuming io is an alias for clojure.java.io, string for
> clojure.string, and that getting your files line by line is useful.
>
> Re OutOfMemoryException: if all the allocated heap memory is really
> not freeable, then there's nothing the JVM can do -- it's being asked
> to allocate memory for a new object, and there's none available.
>
> On Jul 26, 9:53 am, atucker <agjf.tuc...@googlemail.com> wrote:
> > Hi all! I have been trying to use Clojure on a student project, but
> > it's becoming a bit of a nightmare. I wonder whether anyone can
> > help? I'm not studying computer science, and I really need to be
> > getting on with the work I'm actually supposed to be doing :)
> >
> > I am trying to work from a lot of Twitter statuses that I saved to
> > text file. (Unfortunately I failed to escape quotes and such, so the
> > JSON is not valid. Anyone know a good way of coping with that?)
> >
> > Here is my function:
> >
> > (defn json-seq []
> >   (apply concat
> >     (map #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
> >          out-files)))
> >
> > Now there are forty files and five thousand statuses per file, which
> > sounds like a lot, and I don't suppose I can hope to hold them all in
> > memory at the same time. But I had thought that my function might
> > produce a lazy sequence that would be more manageable.
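One caveat with the quoted with-open / line-seq combination: with-open closes the reader as soon as its body returns, but line-seq is lazy, so the lines may not have been read by then. Forcing each file's lines eagerly inside with-open sidesteps that, while staying lazy *across* files. A sketch (the names `lines-of` and the temp-file demo are mine, not from the thread):

```clojure
(require '[clojure.java.io :as io])

(defn lines-of
  "Read all lines of file eagerly, so the reader can be closed safely."
  [file]
  (with-open [r (io/reader file)]
    ;; doall realizes the whole line-seq before with-open closes r
    (doall (line-seq r))))

;; Demo with a temp file standing in for one of the forty status files.
(def tmp (java.io.File/createTempFile "statuses" ".txt"))
(spit tmp "one\ntwo\nthree\n")

;; mapcat is lazy per-file: each file is only opened when the walk
;; reaches it, so at most one file's lines are held eagerly at a time.
(def all-lines (mapcat lines-of [tmp]))
```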
> > However I typically get:
> >
> > twitter.core> (nth (json-seq dir-name) 5)
> > ffff"{createdAt=Fri .... etc.          GOOD
> >
> > twitter.core> (nth (json-seq dir-name) 5000)
> > ffff
> > Java heap space
> > [Thrown class java.lang.OutOfMemoryError]          BAD
> >
> > And at this point my REPL is done for. Any further instruction will
> > result in another OutOfMemoryError. (Surely that has to be a bug just
> > there? Has the garbage collector just given up?)
> >
> > Anyway I am thinking that the sequence is not behaving as lazily as I
> > need it to. It's not reading one file at a time, and it's not reading
> > thirty-two as I might expect from "chunks", but something in the
> > middle. I did try the "dechunkifying" code from page 339 of "Joy of
> > Clojure", but that doesn't compile at all :(
> >
> > I do seem to keep running into memory problems with Clojure. I have
> > 2GB RAM and am using Snow Leopard, Aquamacs 2.0, Clojure 1.2.0 beta1
> > and Leiningen 1.2.0.
> >
> > Cheers
> > Alistair
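For what it's worth, the usual dechunkifying idiom (in the spirit of the one mentioned from "Joy of Clojure" -- this is my rendering, not the book's exact code) can be written so it compiles on 1.2:

```clojure
(defn seq1
  "Wrap s so that elements are realized one at a time, defeating
  chunked evaluation."
  [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (seq1 (rest s))))))

;; With seq1 in the pipeline, taking one element realizes one element,
;; not a 32-element chunk.
(def realized (atom 0))
(first (map #(do (swap! realized inc) %) (seq1 (range 100))))
@realized          ;; => 1
```

Note this only controls realization granularity; it won't help if something else (like holding the head of the whole sequence) keeps realized elements from being garbage-collected.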