Hi Mark, thanks for your suggestion.
I modified the main function to:
;================================
(defn parse-file
  ""
  [file n]
  (with-open [rdr (io/reader file)]
    (println "001 begin with open " (type rdr))
    (let [;lines (line-seq rdr)
          res (parse-recur (line-seq rdr)) ;lines)
          sorted
          (into (sorted-map-by (fn [key1 key2]
                                 (compare [(get res key2) key2]
                                          [(get res key1) key1])))
                res)]
      (println "Statistic result : " res)
      (println "Sorted result : " sorted)
      ;(println "..." (type rdr))
      ;(find-write-recur lines sorted n)
      (find-write-recur (line-seq rdr) sorted n)
      )))
;================================
But it's weird, I got this error:
com.util=> (parse-file "./log600w.log" 3)
001 begin with open java.io.BufferedReader
com.util=> OutOfMemoryError GC overhead limit exceeded
java.util.regex.Pattern.matcher (Pattern.java:1088)
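
For reference, here is a rough two-pass sketch of one way the job could be restructured. It is not tested against the 6,000,000-line log, so treat it as an idea rather than a fix. It rests on two points: a reader can only be consumed once, so a second (line-seq rdr) on the same rdr has nothing left to read, and collecting matching lines into a vector keeps them all in memory. The sketch reopens the file for the second pass and streams matching lines straight to the output files. The names line->id, count-ids, top-n-ids, write-top-n! and the out-dir argument (a stand-in for MAIN-PATH) are hypothetical and not part of the original code.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn line->id
  "Returns the UserID field of a record, or \"0\" for a malformed line
  (the same fallback used by update-res)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-ids
  "First pass: counts records per user. The line seq is never bound to a
  name, so lines that were already processed can be garbage collected."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [res line]
              (update-in res [(line->id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "Returns the n user IDs with the most records, highest count first
  (ties broken by ID, descending, like the sorted-map-by comparator)."
  [counts n]
  (->> counts
       (sort-by (fn [[id cnt]] [cnt id]) #(compare %2 %1))
       (map first)
       (take n)))

(defn write-top-n!
  "Second pass: reopens the file and streams each selected user's records
  to <out-dir>/<id>.log. out-dir is an assumed stand-in for MAIN-PATH."
  [file out-dir ids]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        (doseq [line (line-seq rdr)]
          ;; Only lines belonging to a top-n user have a writer.
          (when-let [wr (writers (line->id line))]
            (.write wr (str line "\n")))))
      (finally
        (doseq [wr (vals writers)]
          (.close wr))))))

(defn parse-file
  "Counts records per user, then writes the top-n users' records to files."
  [file out-dir n]
  (let [counts (count-ids file)]
    (write-top-n! file out-dir (top-n-ids counts n))
    counts))
```

With this shape the file is read twice in total instead of once per top-n user, and only the per-user counts map stays in memory. It would be called as (parse-file "./log600w.log" "./out" 3), where "./out" is an assumed existing directory.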
2013/11/20 Mark Engelberg <[email protected]>
> Looks like you're "holding on to the head" by giving a name (lines) to the
> result of line-seq. Don't do that. Try:
> (parse-recur (line-seq rdr))
>
>
> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <[email protected]> wrote:
>
>> Hi all,
>> I want to parse big log files using Clojure.
>> The structure of each line record is
>> "UserID,Latitude,Longitude,Timestamp".
>> The steps I implemented are:
>> ----> Read the log file & get the top-n user list
>> ----> Find each top-n user's records and store them in a separate log file
>> (UserID.log).
>>
>> The source code of the implementation:
>> ;======================================================
>> (defn parse-file
>>   ""
>>   [file n]
>>   (with-open [rdr (io/reader file)]
>>     (println "001 begin with open ")
>>     (let [lines (line-seq rdr)
>>           res (parse-recur lines)
>>           sorted
>>           (into (sorted-map-by (fn [key1 key2]
>>                                  (compare [(get res key2) key2]
>>                                           [(get res key1) key1])))
>>                 res)]
>>       (println "Statistic result : " res)
>>       (println "Top-N User List : " sorted)
>>       (find-write-recur lines sorted n)
>>       )))
>>
>> (defn parse-recur
>>   ""
>>   [lines]
>>   (loop [ls lines
>>          res {}]
>>     (if ls
>>       (recur (next ls)
>>              (update-res res (first ls)))
>>       res)))
>>
>> (defn update-res
>>   ""
>>   [res line]
>>   (let [params (string/split line #",")
>>         id (if (> (count params) 1) (params 0) "0")]
>>     (if (res id)
>>       (update-in res [id] inc)
>>       (assoc res id 1))))
>>
>> (defn find-write-recur
>>   "Get each users' records and store into separate log file"
>>   [lines sorted n]
>>   (loop [x n
>>          sd sorted
>>          id (first (keys sd))]
>>     (if (and (> x 0) sd)
>>       (do (create-write-file id
>>                              (find-recur lines id))
>>           (recur (dec x)
>>                  (rest sd)
>>                  (nth (keys sd) 1))))))
>>
>> (defn find-recur
>>   ""
>>   [lines id]
>>   (loop [ls lines
>>          res []]
>>     (if ls
>>       (recur (next ls)
>>              (update-vec res id (first ls)))
>>       res)))
>>
>> (defn update-vec
>>   ""
>>   [res id line]
>>   (let [params (string/split line #",")
>>         id_ (if (> (count params) 1) (params 0) "0")]
>>     (if (= id id_ )
>>       (conj res line)
>>       res)))
>>
>> (defn create-write-file
>>   "Create a new file and write information into the file."
>>   ([file info-lines]
>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>      ))
>>   ([file info-lines append?]
>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>    ))
>> ;======================================================
>>
>> I tested this code in the REPL with the command (parse-file "./DATA/log.log" 3)
>> and got these results:
>>
>> Records     Size    Time   Result
>> 1,000       42KB    <1s    OK
>> 10,000      420KB   <1s    OK
>> 100,000     4.3MB   3s     OK
>> 1,000,000   43MB    15s    OK
>> 6,000,000   258MB   >20M   "OutOfMemoryError Java heap space
>>                            java.lang.String.substring (String.java:1913)"
>>
>> ======================================================
>> Here are my questions:
>> 1. How can I fix the error when I try to parse a big log file, e.g. > 200MB?
>> 2. How can I optimize the function to run faster?
>> 3. There are logs larger than 1GB. How can the function deal with them?
>>
>> I am still new to Clojure, so any suggestion or solution will be appreciated.
>> Thanks
>>
>> BR
>>
>> ------------------------------------
>>
>> 刘家齐 (Jacky Liu)
>>
>>
>>
>> Mobile: 15201091195  Email: [email protected]
>>
>> Skype:jacky_liu_1987 QQ:406229156
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to [email protected]
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
--
------------------------------------
刘家齐 (Jacky Liu)
Mobile: 15201091195  Email: [email protected]
Skype:jacky_liu_1987 QQ:406229156