Hi Mark, thanks for your help. I modified the function "parse-recur" as you suggested, and the new "parse-file" code is:

.....
(with-open [rdr (io/reader file)]
  (let [res (parse-recur (line-seq rdr) {}) ;lines)
        sorted (into (sorted-map-by (fn [key1 key2]
                                      (compare [(get res key2) key2]
                                               [(get res key1) key1])))
                     res)]
    (find-write-recur (line-seq rdr) sorted n)))
......

Here is another problem. As you can see, in "parse-file" I call (line-seq rdr) twice. The first call gets the top-N user list. The second is meant to split out the top-N users' records and store each user in a separate log file. But on the second call, (line-seq rdr) returns nil! It seems the read cursor is at the end of the file. How can I fix this problem?
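That nil is expected. line-seq pulls lines lazily from the BufferedReader, and the first pass reads until readLine returns null at end-of-file. A second (line-seq rdr) on the same reader does not rewind, so it returns nil. A minimal fix is to give each pass its own reader.

The sketch below reproduces the problem and then shows the one-reader-per-pass version. same-reader-twice, with-lines and two-readers are hypothetical names used only for illustration.

```clojure
(ns log-split.passes
  (:require [clojure.java.io :as io]))

(defn same-reader-twice
  "Reproduces the problem: both line-seq calls share one BufferedReader."
  [file]
  (with-open [rdr (io/reader file)]
    (let [n (count (line-seq rdr))] ; pass 1 reads up to end-of-file
      [n (line-seq rdr)])))         ; pass 2: readLine returns null, so nil

(defn with-lines
  "Hypothetical helper: open a fresh reader on file, give its line seq to f,
  and close the reader when f returns. f must consume the lines eagerly
  (count, reduce, doseq, doall ...); a lazy result would be realized after
  the reader is closed."
  [file f]
  (with-open [rdr (io/reader file)]
    (f (line-seq rdr))))

(defn two-readers
  "One reader per pass, so the second pass starts again at the first line."
  [file]
  [(with-lines file count)
   (with-lines file #(doall (take 2 %)))])
```

Applied to the parse-file above, this would mean:

- compute res and sorted inside one with-open;
- call find-write-recur with (line-seq rdr) from a second with-open on the same file.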
2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>

> Yeah, I see now that you're still holding on to the head because a name is
> given to the line sequence in the functions that you call.
> One option would be making parse-recur and related functions that take
> lines as an input into a macro.
>
> You could also try:
>
> (defn parse-recur
>   ""
>   [ls res]
>   (if ls
>     (recur (next ls)
>            (update-res res (first ls)))
>     res))
>
> and calling (parse-recur (line-seq rdr) {})
>
> This way, the recur goes back to the main function entry point and ls is
> overwritten, so nothing is holding on to the head. Make similar changes to
> the other functions.
>
>
> On Tue, Nov 19, 2013 at 8:07 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>
>> Sorry, I meant "weird"...
>>
>>
>> 2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>
>>
>>> Hi Mark, thanks for your suggestion.
>>> I modified the main function to:
>>> ;================================
>>> (defn parse-file
>>>   ""
>>>   [file n]
>>>   (with-open [rdr (io/reader file)]
>>>     (println "001 begin with open " (type rdr))
>>>     (let [;lines (line-seq rdr)
>>>           res (parse-recur (line-seq rdr)) ;lines)
>>>           sorted
>>>           (into (sorted-map-by (fn [key1 key2]
>>>                                  (compare [(get res key2) key2]
>>>                                           [(get res key1) key1])))
>>>                 res)]
>>>       (println "Statistic result : " res)
>>>       (println "Sorted result : " sorted)
>>>       ;(println "..." (type rdr))
>>>       ;(find-write-recur lines sorted n)
>>>       (find-write-recur (line-seq rdr) sorted n))))
>>> ;================================
>>> But it's wired, I got this error:
>>>
>>> com.util=> (parse-file "./log600w.log" 3)
>>>
>>> 001 begin with open java.io.BufferedReader
>>>
>>> com.util=> OutOfMemoryError GC overhead limit exceeded
>>> java.util.regex.Pattern.matcher (Pattern.java:1088)
>>>
>>>
>>> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>>>
>>>> Looks like you're "holding on to the head" by giving a name (lines) to
>>>> the result of line-seq. Don't do that.
>>>> Try:
>>>> (parse-recur (line-seq rdr))
>>>>
>>>>
>>>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> I want to parse big log files using Clojure.
>>>>> The structure of each line record is
>>>>> "UserID,Latitude,Longitude,Timestamp".
>>>>> My implementation steps are:
>>>>> ----> Read the log file and get the top-n user list
>>>>> ----> Find each top-n user's records and store them in a separate log
>>>>> file (UserID.log).
>>>>>
>>>>> The implementation source code:
>>>>> ;======================================================
>>>>> (defn parse-file
>>>>>   ""
>>>>>   [file n]
>>>>>   (with-open [rdr (io/reader file)]
>>>>>     (println "001 begin with open ")
>>>>>     (let [lines (line-seq rdr)
>>>>>           res (parse-recur lines)
>>>>>           sorted
>>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>>                                  (compare [(get res key2) key2]
>>>>>                                           [(get res key1) key1])))
>>>>>                 res)]
>>>>>       (println "Statistic result : " res)
>>>>>       (println "Top-N User List : " sorted)
>>>>>       (find-write-recur lines sorted n))))
>>>>>
>>>>> (defn parse-recur
>>>>>   ""
>>>>>   [lines]
>>>>>   (loop [ls lines
>>>>>          res {}]
>>>>>     (if ls
>>>>>       (recur (next ls)
>>>>>              (update-res res (first ls)))
>>>>>       res)))
>>>>>
>>>>> (defn update-res
>>>>>   ""
>>>>>   [res line]
>>>>>   (let [params (string/split line #",")
>>>>>         id (if (> (count params) 1) (params 0) "0")]
>>>>>     (if (res id)
>>>>>       (update-in res [id] inc)
>>>>>       (assoc res id 1))))
>>>>>
>>>>> (defn find-write-recur
>>>>>   "Get each user's records and store them in a separate log file"
>>>>>   [lines sorted n]
>>>>>   (loop [x n
>>>>>          sd sorted
>>>>>          id (first (keys sd))]
>>>>>     (if (and (> x 0) sd)
>>>>>       (do (create-write-file id
>>>>>                              (find-recur lines id))
>>>>>           (recur (dec x)
>>>>>                  (rest sd)
>>>>>                  (nth (keys sd) 1))))))
>>>>>
>>>>> (defn find-recur
>>>>>   ""
>>>>>   [lines id]
>>>>>   (loop [ls lines
>>>>>          res []]
>>>>>     (if ls
>>>>>       (recur (next ls)
>>>>>              (update-vec res id (first ls)))
>>>>>       res)))
>>>>>
>>>>> (defn update-vec
>>>>>   ""
>>>>>   [res id line]
>>>>>   (let [params (string/split line #",")
>>>>>         id_ (if (> (count params) 1) (params 0) "0")]
>>>>>     (if (= id id_)
>>>>>       (conj res line)
>>>>>       res)))
>>>>>
>>>>> (defn create-write-file
>>>>>   "Create a new file and write information into the file."
>>>>>   ([file info-lines]
>>>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>>>      (doseq [line info-lines] (.write wr (str line "\n")))))
>>>>>   ([file info-lines append?]
>>>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>>>      (doseq [line info-lines] (.write wr (str line "\n"))))))
>>>>> ;======================================================
>>>>>
>>>>> I tested this .clj file in the REPL with the command
>>>>> (parse-file "./DATA/log.log" 3) and got these results:
>>>>>
>>>>> Records     Size    Time     Result
>>>>> 1,000       42KB    <1s      OK
>>>>> 10,000      420KB   <1s      OK
>>>>> 100,000     4.3MB   3s       OK
>>>>> 1,000,000   43MB    15s      OK
>>>>> 6,000,000   258MB   >20 min  "OutOfMemoryError Java heap space
>>>>>                              java.lang.String.substring (String.java:1913)"
>>>>>
>>>>> ======================================================
>>>>> Here are my questions:
>>>>> 1. How can I fix the error when I parse a big log file (> 200MB)?
>>>>> 2. How can I optimize the function to run faster?
>>>>> 3. Some logs are larger than 1GB; how can the function deal with them?
>>>>>
>>>>> I am still new to Clojure, so any suggestion or solution will be
>>>>> appreciated!
>>>>> Thanks
>>>>>
>>>>> BR
>>>>>
>>>>> ------------------------------------
>>>>>
>>>>> 刘家齐 (Jacky Liu)
>>>>>
>>>>> Mobile: 15201091195  Email: liujiaq...@gmail.com
>>>>>
>>>>> Skype: jacky_liu_1987  QQ: 406229156
>>>>>
>>>>> --
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To post to this group, send email to clojure@googlegroups.com
>>>>> Note that posts from new members are moderated - please be patient
>>>>> with your first post.
>>>>> To unsubscribe from this group, send email to
>>>>> clojure+unsubscr...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/clojure?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to clojure+unsubscr...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
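The original post asks three questions: how to avoid heap errors above roughly 200MB, how to run faster, and how to handle logs over 1GB. One way to combine the advice in this thread is sketched below. It has not been measured on the 258MB log. It assumes records look like "UserID,Latitude,Longitude,Timestamp". The names count-users, top-n-ids and split-top-n are hypothetical, and so is the out-dir argument, which stands in for MAIN-PATH.

The sketch works in two passes:

- Pass 1 counts users with frequencies, so no local holds the head of the line seq.
- Pass 2 opens a fresh reader and routes every line to its user's file in a single scan. This replaces one find-recur scan per top-N user, each of which also collects the user's lines in a vector before writing.

```clojure
(ns log-split.top-n
  (:require [clojure.java.io :as io]
            [clojure.string :as string]))

(defn user-id
  "First field of a \"UserID,Latitude,Longitude,Timestamp\" line, or \"0\"
  when the line has no comma (same rule as update-res in the thread)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-users
  "Pass 1: record count per user id. frequencies consumes the line seq
  eagerly inside with-open, and no local keeps its head."
  [file]
  (with-open [rdr (io/reader file)]
    (frequencies (map user-id (line-seq rdr)))))

(defn top-n-ids
  "Ids by descending count, ties by descending id (the same order as the
  thread's sorted-map-by comparator). Returns fewer than n if there are fewer users."
  [counts n]
  (->> counts
       (sort-by (fn [[id c]] [c id]) #(compare %2 %1))
       (take n)
       (mapv first)))

(defn split-top-n
  "Pass 2 on a fresh reader: write each top-n user's lines to
  <out-dir>/<id>.log in a single scan of the file. out-dir must exist."
  [file ids out-dir]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        (doseq [line (line-seq rdr)]
          (when-let [w (get writers (user-id line))]
            (.write ^java.io.Writer w (str line "\n")))))
      (finally
        (doseq [w (vals writers)] (.close ^java.io.Writer w))))))

(defn parse-file
  "Hypothetical replacement for the thread's parse-file: two passes, two
  readers. Returns the top-n ids."
  [file n out-dir]
  (let [ids (top-n-ids (count-users file) n)]
    (split-top-n file ids out-dir)
    ids))
```

With this layout, memory should grow with the number of distinct user ids (the counts map) rather than with the size of the log.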