Create a fresh reader?
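To spell that out: a reader is a one-shot cursor, so once the first (line-seq rdr) has been consumed, a second (line-seq rdr) on the same rdr finds the stream already at end-of-file and returns nil. Opening a new reader for each pass avoids that, and because no pass binds the line sequence to a name, it should not hold on to the head either. Below is a rough sketch only, not a drop-in replacement for your code: count-ids, top-n-ids, write-id-records and the out-dir argument (standing in for your MAIN-PATH) are names I made up, and I'm assuming every record starts with the UserID followed by a comma.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn line-id
  "Returns the UserID field of a line, or \"0\" when the line has no comma."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-ids
  "First pass: opens its own reader and counts the records per UserID."
  [file]
  (with-open [rdr (io/reader file)]
    ;; The line seq is passed straight to reduce and never given a name.
    (reduce (fn [res line] (update-in res [(line-id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "Returns the n ids with the highest counts (ties broken by id, descending)."
  [counts n]
  (->> counts
       (sort-by (fn [[id cnt]] [cnt id]) #(compare %2 %1))
       (take n)
       (mapv first)))

(defn write-id-records
  "Later pass: opens a fresh reader, so line-seq starts at the first line again."
  [file id out-file]
  (with-open [rdr (io/reader file)
              wr  (io/writer out-file)]
    (doseq [line (line-seq rdr)
            :when (= id (line-id line))]
      (.write wr (str line "\n")))))

(defn parse-file
  "Writes <out-dir>/<UserID>.log for each of the top n users; returns their ids."
  [file n out-dir]
  (let [ids (top-n-ids (count-ids file) n)]
    (doseq [id ids]
      (write-id-records file id (io/file out-dir (str id ".log"))))
    ids))
```

Something like (parse-file "./DATA/log.log" 3 "./DATA") would then write one log per top user. Note that this still rereads the input once per top user, just as find-write-recur scanned the lines once per id; a single second pass that keeps one writer open per id would likely be faster.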

On Wed, Nov 20, 2013 at 1:13 AM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
> Hi Mark, thanks for your assistance.
> I have now modified the function "parse-recur" as you said.
> The new "parse-file" function code is:
> .....
> (with-open [rdr (io/reader file)]
>   (let [res (parse-recur (line-seq rdr) {}) ;lines)
>         sorted
>         (into (sorted-map-by (fn [key1 key2]
>                                (compare [(get res key2) key2]
>                                         [(get res key1) key1])))
>               res)]
>     (find-write-recur (line-seq rdr) sorted n)
>     ))
> ......
> Here is another problem: as you can see, in the "parse-file" function I
> call (line-seq rdr) twice.
> The first call is to get the Top-N users' list; the second tries to split
> out the top-n users' records and store them in separate log files.
> But on the second call, (line-seq rdr) returns nil!
> It seems the read cursor is at the end of the file. How can I fix this
> problem?
>
> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>
>> Yeah, I see now that you're still holding on to the head because a name
>> is given to the line sequence in the functions that you call.
>> One option would be making parse-recur and related functions that take
>> lines as an input into a macro.
>>
>> You could also try:
>>
>> (defn parse-recur
>>   ""
>>   [ls res]
>>   (if ls
>>     (recur (next ls)
>>            (update-res res (first ls)))
>>     res))
>>
>> and calling (parse-recur (line-seq rdr) {})
>>
>> This way, the recur goes back to the main function entry point and ls is
>> overwritten, so nothing is holding on to the head. Make similar changes to
>> the other functions.
>>
>> On Tue, Nov 19, 2013 at 8:07 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>
>>> sorry, i mean "weird"...
>>>
>>> 2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>
>>>
>>>> Hi Mark, thanks for your suggestion.
>>>> I modified the main function to:
>>>> ;================================
>>>> (defn parse-file
>>>>   ""
>>>>   [file n]
>>>>   (with-open [rdr (io/reader file)]
>>>>     (println "001 begin with open " (type rdr))
>>>>     (let [;lines (line-seq rdr)
>>>>           res (parse-recur (line-seq rdr)) ;lines)
>>>>           sorted
>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>                                  (compare [(get res key2) key2]
>>>>                                           [(get res key1) key1])))
>>>>                 res)]
>>>>       (println "Statistic result : " res)
>>>>       (println "Sorted result : " sorted)
>>>>       ;(println "..." (type rdr))
>>>>       ;(find-write-recur lines sorted n)
>>>>       (find-write-recur (line-seq rdr) sorted n)
>>>>       )))
>>>> ;================================
>>>> But it's wired, I got this error:
>>>>
>>>> com.util=> (parse-file "./log600w.log" 3)
>>>>
>>>> 001 begin with open java.io.BufferedReader
>>>>
>>>> com.util=> OutOfMemoryError GC overhead limit exceeded
>>>> java.util.regex.Pattern.matcher (Pattern.java:1088)
>>>>
>>>> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>>>>
>>>>> Looks like you're "holding on to the head" by giving a name (lines) to
>>>>> the result of line-seq. Don't do that. Try:
>>>>> (parse-recur (line-seq rdr))
>>>>>
>>>>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I want to parse big log files using Clojure.
>>>>>> The structure of each line record is
>>>>>> "UserID,Latitude,Longitude,Timestamp".
>>>>>> My implementation steps are:
>>>>>> ----> Read the log file and get the top-n user list
>>>>>> ----> Find each top-n user's records and store them in a separate
>>>>>>       log file (UserID.log).
>>>>>>
>>>>>> The implementation source code:
>>>>>> ;======================================================
>>>>>> (defn parse-file
>>>>>>   ""
>>>>>>   [file n]
>>>>>>   (with-open [rdr (io/reader file)]
>>>>>>     (println "001 begin with open ")
>>>>>>     (let [lines (line-seq rdr)
>>>>>>           res (parse-recur lines)
>>>>>>           sorted
>>>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>>>                                  (compare [(get res key2) key2]
>>>>>>                                           [(get res key1) key1])))
>>>>>>                 res)]
>>>>>>       (println "Statistic result : " res)
>>>>>>       (println "Top-N User List : " sorted)
>>>>>>       (find-write-recur lines sorted n)
>>>>>>       )))
>>>>>>
>>>>>> (defn parse-recur
>>>>>>   ""
>>>>>>   [lines]
>>>>>>   (loop [ls lines
>>>>>>          res {}]
>>>>>>     (if ls
>>>>>>       (recur (next ls)
>>>>>>              (update-res res (first ls)))
>>>>>>       res)))
>>>>>>
>>>>>> (defn update-res
>>>>>>   ""
>>>>>>   [res line]
>>>>>>   (let [params (string/split line #",")
>>>>>>         id (if (> (count params) 1) (params 0) "0")]
>>>>>>     (if (res id)
>>>>>>       (update-in res [id] inc)
>>>>>>       (assoc res id 1))))
>>>>>>
>>>>>> (defn find-write-recur
>>>>>>   "Get each users' records and store into separate log file"
>>>>>>   [lines sorted n]
>>>>>>   (loop [x n
>>>>>>          sd sorted
>>>>>>          id (first (keys sd))]
>>>>>>     (if (and (> x 0) sd)
>>>>>>       (do (create-write-file id
>>>>>>                              (find-recur lines id))
>>>>>>           (recur (dec x)
>>>>>>                  (rest sd)
>>>>>>                  (nth (keys sd) 1))))))
>>>>>>
>>>>>> (defn find-recur
>>>>>>   ""
>>>>>>   [lines id]
>>>>>>   (loop [ls lines
>>>>>>          res []]
>>>>>>     (if ls
>>>>>>       (recur (next ls)
>>>>>>              (update-vec res id (first ls)))
>>>>>>       res)))
>>>>>>
>>>>>> (defn update-vec
>>>>>>   ""
>>>>>>   [res id line]
>>>>>>   (let [params (string/split line #",")
>>>>>>         id_ (if (> (count params) 1) (params 0) "0")]
>>>>>>     (if (= id id_)
>>>>>>       (conj res line)
>>>>>>       res)))
>>>>>>
>>>>>> (defn create-write-file
>>>>>>   "Create a new file and write information into the file."
>>>>>>   ([file info-lines]
>>>>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>>>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>>>>>      ))
>>>>>>   ([file info-lines append?]
>>>>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>>>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>>>>>    ))
>>>>>> ;======================================================
>>>>>>
>>>>>> I tested this clj in the REPL with the command
>>>>>> (parse-file "./DATA/log.log" 3) and got these results:
>>>>>>
>>>>>> Records    Size   Time     Result
>>>>>> 1,000      42KB   <1s      OK
>>>>>> 10,000     420KB  <1s      OK
>>>>>> 100,000    4.3MB  3s       OK
>>>>>> 1,000,000  43MB   15s      OK
>>>>>> 6,000,000  258MB  >20 min  "OutOfMemoryError Java heap space
>>>>>>                            java.lang.String.substring (String.java:1913)"
>>>>>>
>>>>>> ======================================================
>>>>>> Here are my questions:
>>>>>> 1. How can I fix the error when I try to parse a big log file, e.g.
>>>>>>    larger than 200MB?
>>>>>> 2. How can I optimize the function to run faster?
>>>>>> 3. Some logs are larger than 1GB. How can the function deal with
>>>>>>    them?
>>>>>>
>>>>>> I am still new to Clojure; any suggestion or solution will be
>>>>>> appreciated.
>>>>>> Thanks
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> ------------------------------------
>>>>>>
>>>>>> 刘家齐 (Jacky Liu)
>>>>>>
>>>>>> Mobile: 15201091195  Email: liujiaq...@gmail.com
>>>>>>
>>>>>> Skype: jacky_liu_1987  QQ: 406229156
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To post to this group, send email to clojure@googlegroups.com
>>>>>> Note that posts from new members are moderated - please be patient
>>>>>> with your first post.
>>>>>> To unsubscribe from this group, send email to
>>>>>> clojure+unsubscr...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/clojure?hl=en
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to clojure+unsubscr...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.