Hi Mark, thanks for your assistance.
I have now modified the function "parse-recur" as you suggested,
and the new "parse-file" code is:
.....
(with-open [rdr (io/reader file)]
  (let [res (parse-recur (line-seq rdr) {}) ;lines)
        sorted
        (into (sorted-map-by (fn [key1 key2]
                               (compare [(get res key2) key2]
                                        [(get res key1) key1])))
              res)]
    (find-write-recur (line-seq rdr) sorted n)
    ))
......
There is another problem: as you can see, "parse-file" now calls
(line-seq rdr) twice.
The first call gets the top-N users' list; the second tries to pick out the
top-N users' records and store them in separate log files.
But on the second call, (line-seq rdr) returns nil!
It seems the read cursor is already at the end of the file. How can I fix
this problem?
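
One possible way around this, as a minimal sketch rather than a definitive fix: a reader that has been read to the end yields nil from line-seq, so each pass over the file can open its own reader. The names user-id, count-users, top-n-ids and write-user-records, and the out-dir parameter (standing in for MAIN-PATH), are hypothetical and not part of the code above. Nothing here binds the whole line sequence to a name, so the head should not be retained.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "UserID field of a log line, or \"0\" when the line has fewer than
  two comma-separated fields (same fallback as update-res)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-users
  "First pass: opens its own reader and counts records per user."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [res line]
              (update-in res [(user-id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "The n user ids with the most records, highest count first
  (ties broken by id, descending, like the sorted-map-by comparator)."
  [counts n]
  (->> counts
       (sort-by (fn [[id cnt]] [cnt id]) (fn [a b] (compare b a)))
       (take n)
       (map first)))

(defn write-user-records
  "Later pass: opens a fresh reader and copies the lines of one user
  to out-file, one line at a time."
  [file id out-file]
  (with-open [rdr (io/reader file)
              wr  (io/writer out-file)]
    (doseq [line (line-seq rdr)
            :when (= id (user-id line))]
      (.write wr (str line "\n")))))

(defn parse-file
  "Counts users in one pass, then re-opens the file once per top-n user."
  [file n out-dir]
  (doseq [id (top-n-ids (count-users file) n)]
    (write-user-records file id (io/file out-dir (str id ".log")))))
```

It would be called like (parse-file "./log600w.log" 3 "./out"), assuming the ./out directory already exists. The file is still read n+1 times, as in the original design; writing to all n output files in a single second pass would be a further optimization.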
2013/11/20 Mark Engelberg <[email protected]>
> Yeah, I see now that you're still holding on to the head because a name is
> given to the line sequence in the functions that you call.
> One option would be making parse-recur and related functions that take
> lines as an input into a macro.
>
> You could also try:
>
> (defn parse-recur
>   ""
>   [ls res]
>   (if ls
>     (recur (next ls)
>            (update-res res (first ls)))
>     res))
>
> and calling (parse-recur (line-seq rdr) {})
>
> This way, the recur goes back to the main function entry point and ls is
> overwritten, so nothing is holding on to the head. Make similar changes to
> the other functions.
>
>
>
> On Tue, Nov 19, 2013 at 8:07 PM, Jiaqi Liu <[email protected]> wrote:
>
>> sorry, i mean "weird"...
>>
>>
>> 2013/11/20 Jiaqi Liu <[email protected]>
>>
>>> Hi Mark, thanks for your suggestion.
>>> I modified the main function to:
>>> ;================================
>>> (defn parse-file
>>>   ""
>>>   [file n]
>>>   (with-open [rdr (io/reader file)]
>>>     (println "001 begin with open " (type rdr))
>>>     (let [;lines (line-seq rdr)
>>>           res (parse-recur (line-seq rdr)) ;lines)
>>>           sorted
>>>           (into (sorted-map-by (fn [key1 key2]
>>>                                  (compare [(get res key2) key2]
>>>                                           [(get res key1) key1])))
>>>                 res)]
>>>       (println "Statistic result : " res)
>>>       (println "Sorted result : " sorted)
>>>       ;(println "..." (type rdr))
>>>       ;(find-write-recur lines sorted n)
>>>       (find-write-recur (line-seq rdr) sorted n)
>>>       )))
>>> ;================================
>>> But it's wired , i got this error:
>>>
>>> com.util=> (parse-file "./log600w.log" 3)
>>>
>>> 001 begin with open java.io.BufferedReader
>>>
>>>
>>> com.util=> OutOfMemoryError GC overhead limit exceeded
>>> java.util.regex.Pattern.matcher (Pattern.java:1088)
>>>
>>>
>>>
>>>
>>>
>>> 2013/11/20 Mark Engelberg <[email protected]>
>>>
>>>> Looks like you're "holding on to the head" by giving a name (lines) to
>>>> the result of line-seq. Don't do that. Try:
>>>> (parse-recur (line-seq rdr))
>>>>
>>>>
>>>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>> I want to parse big log files using Clojure.
>>>>> The structure of each line record is
>>>>> "UserID,Latitude,Longitude,Timestamp".
>>>>> My implementation steps are:
>>>>> ----> Read the log file and get the top-n user list
>>>>> ----> Find each top-n user's records and store them in a separate log
>>>>> file (UserID.log).
>>>>>
>>>>> The implementation source code:
>>>>> ;======================================================
>>>>> (defn parse-file
>>>>>   ""
>>>>>   [file n]
>>>>>   (with-open [rdr (io/reader file)]
>>>>>     (println "001 begin with open ")
>>>>>     (let [lines (line-seq rdr)
>>>>>           res (parse-recur lines)
>>>>>           sorted
>>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>>                                  (compare [(get res key2) key2]
>>>>>                                           [(get res key1) key1])))
>>>>>                 res)]
>>>>>       (println "Statistic result : " res)
>>>>>       (println "Top-N User List : " sorted)
>>>>>       (find-write-recur lines sorted n)
>>>>>       )))
>>>>>
>>>>> (defn parse-recur
>>>>>   ""
>>>>>   [lines]
>>>>>   (loop [ls lines
>>>>>          res {}]
>>>>>     (if ls
>>>>>       (recur (next ls)
>>>>>              (update-res res (first ls)))
>>>>>       res)))
>>>>>
>>>>> (defn update-res
>>>>>   ""
>>>>>   [res line]
>>>>>   (let [params (string/split line #",")
>>>>>         id (if (> (count params) 1) (params 0) "0")]
>>>>>     (if (res id)
>>>>>       (update-in res [id] inc)
>>>>>       (assoc res id 1))))
>>>>>
>>>>> (defn find-write-recur
>>>>>   "Get each users' records and store into separate log file"
>>>>>   [lines sorted n]
>>>>>   (loop [x n
>>>>>          sd sorted
>>>>>          id (first (keys sd))]
>>>>>     (if (and (> x 0) sd)
>>>>>       (do (create-write-file id
>>>>>                              (find-recur lines id))
>>>>>           (recur (dec x)
>>>>>                  (rest sd)
>>>>>                  (nth (keys sd) 1))))))
>>>>>
>>>>> (defn find-recur
>>>>>   ""
>>>>>   [lines id]
>>>>>   (loop [ls lines
>>>>>          res []]
>>>>>     (if ls
>>>>>       (recur (next ls)
>>>>>              (update-vec res id (first ls)))
>>>>>       res)))
>>>>>
>>>>> (defn update-vec
>>>>>   ""
>>>>>   [res id line]
>>>>>   (let [params (string/split line #",")
>>>>>         id_ (if (> (count params) 1) (params 0) "0")]
>>>>>     (if (= id id_ )
>>>>>       (conj res line)
>>>>>       res)))
>>>>>
>>>>> (defn create-write-file
>>>>>   "Create a new file and write information into the file."
>>>>>   ([file info-lines]
>>>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>>>>      ))
>>>>>   ([file info-lines append?]
>>>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>>>>    ))
>>>>> ;======================================================
>>>>>
>>>>> I tested this code in the REPL with the command
>>>>> (parse-file "./DATA/log.log" 3) and got these results:
>>>>>
>>>>> Records     Size    Time     Result
>>>>> 1,000       42KB    <1s      OK
>>>>> 10,000      420KB   <1s      OK
>>>>> 100,000     4.3MB   3s       OK
>>>>> 1,000,000   43MB    15s      OK
>>>>> 6,000,000   258MB   >20min   "OutOfMemoryError Java heap space
>>>>>                              java.lang.String.substring (String.java:1913)"
>>>>>
>>>>> ======================================================
>>>>> Here are my questions:
>>>>> 1. How can I fix the error when I try to parse a big log file, e.g.
>>>>> larger than 200MB?
>>>>> 2. How can I optimize the functions to run faster?
>>>>> 3. Some logs are larger than 1GB; how can the functions deal with
>>>>> them?
>>>>>
>>>>> I am still new to Clojure, so any suggestion or solution would be
>>>>> appreciated.
>>>>> Thanks
>>>>>
>>>>> BR
>>>>>
>>>>> ------------------------------------
>>>>>
>>>>> 刘家齐 (Jacky Liu)
>>>>>
>>>>>
>>>>>
>>>>> Mobile: 15201091195 Email: [email protected]
>>>>>
>>>>> Skype:jacky_liu_1987 QQ:406229156
>>>>>
>>>>> --
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To post to this group, send email to [email protected]
>>>>> Note that posts from new members are moderated - please be patient
>>>>> with your first post.
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/clojure?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>