I mostly revert to good ole loop/recur for these large file processing
exercises. Here's a template you could use (it includes a try/catch so
you can see errors as you go):
(import '(java.io BufferedReader FileReader PrintWriter File))

(defn process-log-file
  "Read a log file, extracting lines matching regx."
  [in-fp out-fp regx]
  (with-open [rdr (BufferedReader. (FileReader. (File. in-fp)))
              wtr (PrintWriter. (File. out-fp))]
    (loop [line (.readLine rdr) i 0]
      (when line
        ;; The try wraps only the per-line work; recur cannot sit inside a try.
        (try
          ;; re-matches needs the whole line to match; use re-find for substrings.
          (when (re-matches regx line)
            (.println wtr line)) ; or whatever
          (catch Exception e (prn i line e))) ; report the bad line, then carry on
        (recur (.readLine rdr) (inc i))))))
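
If you would rather stay with sequences, here is a minimal sketch of the same job using line-seq, assuming Clojure 1.2 or later for clojure.java.io. The name filter-lines is made up for illustration, and it uses re-find (substring match) where the template above uses re-matches. reduce walks the sequence one line at a time without holding on to its head, so memory use should stay flat however large the file is.

```clojure
(require '[clojure.java.io :as io])

(defn filter-lines
  "Copy the lines of in-path that contain a match for re to out-path.
  Returns the number of lines written. (Hypothetical helper, not from
  the original post.)"
  [in-path out-path re]
  (with-open [rdr (io/reader in-path)
              wtr (io/writer out-path)]
    ;; line-seq is lazy; reduce consumes it line by line and nothing
    ;; keeps a reference to the start of the sequence.
    (reduce (fn [n line]
              (if (re-find re line)
                (do (.write wtr (str line "\n"))
                    (inc n))
                n))
            0
            (line-seq rdr))))
```

Calling (filter-lines "in.log" "out.log" #"ERROR"), with placeholder file names, writes the matching lines to out.log and returns how many matched.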
Regards, Adrian.
On Mon, Aug 31, 2009 at 4:44 PM, wangzx<[email protected]> wrote:
>
> I want to learn Clojure by using it to parse log files and
> generate reports. One question: for a large text file, can we
> use it as a sequence efficiently? For example, for a 100M log file, we
> need to check each line for some pattern match.
>
> I am using (line-seq rdr), but it causes an
> OutOfMemoryException.
>
> demo code
>
> (defn buffered-reader [file]
> (new java.io.BufferedReader
> (new java.io.InputStreamReader
> (new java.io.FileInputStream file))))
>
> (def -reader (buffered-reader "test.txt"))
> (filter #(= "some" %) (line-seq -reader))
>
> Even when no lines match "some", the filter operation causes an
> OutOfMemoryException.
>
> Are there other APIs like sequences that provide a stream-like interface?