Create a fresh reader?
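
A minimal sketch of what that could look like, assuming each pass over the file opens its own reader so the second pass starts at the top again. The helper names user-id, count-users, top-n-ids and lines-for are made up for illustration and are not from the code quoted below; unlike the original update-res, a line without a comma is keyed by the whole line rather than by "0".

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "First comma-separated field of a log line (hypothetical helper)."
  [line]
  (first (string/split line #"," 2)))

(defn count-users
  "Pass 1: count records per user. The line seq is passed straight to
   reduce and never bound to a name, so consumed lines can be collected."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [res line]
              (update-in res [(user-id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "IDs of the n users with the most records, ties broken by ID (descending),
   matching the comparator used with sorted-map-by in the thread."
  [counts n]
  (->> counts
       (sort-by (fn [[id cnt]] [cnt id]) #(compare %2 %1))
       (take n)
       (map first)))

(defn lines-for
  "Pass 2: open a fresh reader, so reading starts at the first line again.
   filterv is eager, so all work finishes before the reader is closed."
  [file id]
  (with-open [rdr (io/reader file)]
    (filterv #(= id (user-id %)) (line-seq rdr))))
```

Calling (top-n-ids (count-users file) n) and then (lines-for file id) for each returned ID reads the file once per call, but no call depends on a reader that has already reached the end of the file.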

On Wed, Nov 20, 2013 at 1:13 AM, Jiaqi Liu <liujiaq...@gmail.com> wrote:

> Hi Mark, thanks for your assistance.
> I have now modified the function "parse-recur" as you said.
> The new "parse-file" function is:
>    .....
>   (with-open [rdr (io/reader file)]
>     (let [res (parse-recur (line-seq rdr) {}) ;lines)
>           sorted
>           (into (sorted-map-by (fn [key1 key2]
>                                  (compare [(get res key2) key2]
>                                           [(get res key1) key1])))
>                 res)]
>       (find-write-recur (line-seq rdr) sorted n)
>       ))
>     ......
> Here is another problem: as you can see, "parse-file" calls
> (line-seq rdr) twice.
> The first call gets the top-N user list; the second tries to split out the
> top-N users' records and store them in separate log files.
> But on the second call, (line-seq rdr) returns nil!
> It seems the read cursor is at the end of the file. How can I fix this
> problem?
>
>
>
> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>
>> Yeah, I see now that you're still holding on to the head because a name
>> is given to the line sequence in the functions that you call.
>> One option would be making parse-recur and related functions that take
>> lines as an input into a macro.
>>
>> You could also try:
>>
>> (defn parse-recur
>>   ""
>>   [ls res]
>>     (if ls
>>       (recur (next ls)
>>                (update-res res (first ls)))
>>       res))
>>
>> and calling (parse-recur (line-seq rdr) {})
>>
>> This way, the recur goes back to the main function entry point and ls is
>> overwritten, so nothing is holding on to the head.  Make similar changes to
>> the other functions.
>>
>>
>>
>> On Tue, Nov 19, 2013 at 8:07 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>
>>> sorry, i mean "weird"...
>>>
>>>
>>> 2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>
>>>
>>>> Hi Mark, thanks for your suggestion.
>>>> I modified the main function to:
>>>> ;================================
>>>> (defn parse-file
>>>>   ""
>>>>   [file n]
>>>>   (with-open [rdr (io/reader file)]
>>>>     (println "001 begin with open " (type rdr))
>>>>     (let [;lines (line-seq rdr)
>>>>           res (parse-recur (line-seq rdr)) ;lines)
>>>>           sorted
>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>                                  (compare [(get res key2) key2]
>>>>                                           [(get res key1) key1])))
>>>>                 res)]
>>>>       (println "Statistic result : " res)
>>>>       (println "Sorted result : " sorted)
>>>>       ;(println "..." (type rdr))
>>>>       ;(find-write-recur lines sorted n)
>>>>       (find-write-recur (line-seq rdr) sorted n)
>>>>       )))
>>>> ;================================
>>>> But it's wired , i got this error:
>>>>
>>>> com.util=> (parse-file "./log600w.log" 3)
>>>>
>>>> 001 begin with open  java.io.BufferedReader
>>>>
>>>>
>>>> com.util=> OutOfMemoryError GC overhead limit exceeded
>>>>  java.util.regex.Pattern.matcher (Pattern.java:1088)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>>>>
>>>>> Looks like you're "holding on to the head" by giving a name (lines) to
>>>>> the result of line-seq.  Don't do that.  Try:
>>>>> (parse-recur (line-seq rdr))
>>>>>
>>>>>
>>>>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I want to parse big log files using Clojure.
>>>>>> Each line is a record with the structure
>>>>>> "UserID,Latitude,Longitude,Timestamp".
>>>>>> The steps I implemented are:
>>>>>> ----> Read the log file and get the top-n user list
>>>>>> ----> Find each top-n user's records and store them in a separate log
>>>>>> file (UserID.log).
>>>>>>
>>>>>> The source code:
>>>>>> ;======================================================
>>>>>> (defn parse-file
>>>>>>   ""
>>>>>>   [file n]
>>>>>>   (with-open [rdr (io/reader file)]
>>>>>>     (println "001 begin with open ")
>>>>>>     (let [lines (line-seq rdr)
>>>>>>           res (parse-recur lines)
>>>>>>           sorted
>>>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>>>                                  (compare [(get res key2) key2]
>>>>>>                                           [(get res key1) key1])))
>>>>>>                 res)]
>>>>>>       (println "Statistic result : " res)
>>>>>>       (println "Top-N User List : " sorted)
>>>>>>       (find-write-recur lines sorted n)
>>>>>>       )))
>>>>>>
>>>>>> (defn parse-recur
>>>>>>   ""
>>>>>>   [lines]
>>>>>>   (loop [ls  lines
>>>>>>          res {}]
>>>>>>     (if ls
>>>>>>       (recur (next ls)
>>>>>>                (update-res res (first ls)))
>>>>>>       res)))
>>>>>>
>>>>>> (defn update-res
>>>>>>   ""
>>>>>>   [res line]
>>>>>>   (let [params (string/split line #",")
>>>>>>         id     (if (> (count params) 1) (params 0) "0")]
>>>>>>     (if (res id)
>>>>>>       (update-in res [id] inc)
>>>>>>       (assoc res id 1))))
>>>>>>
>>>>>> (defn find-write-recur
>>>>>>   "Get each user's records and store them in a separate log file"
>>>>>>   [lines sorted n]
>>>>>>   (loop [x n
>>>>>>          sd (seq sorted)            ; nil when there are no users
>>>>>>          id (first (keys sd))]
>>>>>>     (if (and (> x 0) sd)
>>>>>>       (do (create-write-file id
>>>>>>                              (find-recur lines id))
>>>>>>           (recur (dec x)
>>>>>>                  (next sd)          ; nil at the end, unlike rest
>>>>>>                  (second (keys sd)))))))  ; nil instead of throwing
>>>>>>
>>>>>> (defn find-recur
>>>>>>   ""
>>>>>>   [lines id]
>>>>>>   (loop [ls lines
>>>>>>            res []]
>>>>>>     (if ls
>>>>>>       (recur (next ls)
>>>>>>                (update-vec res id (first ls)))
>>>>>>       res)))
>>>>>>
>>>>>> (defn update-vec
>>>>>>   ""
>>>>>>   [res id line]
>>>>>>   (let [params (string/split line #",")
>>>>>>         id_        (if (> (count params) 1) (params 0) "0")]
>>>>>>         (if (= id id_ )
>>>>>>           (conj res line)
>>>>>>           res)))
>>>>>>
>>>>>> (defn create-write-file
>>>>>>   "Create a new file and write information into the file."
>>>>>>   ([file info-lines]
>>>>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>>>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>>>>>      ))
>>>>>>   ([file info-lines append?]
>>>>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>>>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>>>>>    ))
>>>>>> ;======================================================
>>>>>>
>>>>>> I tested this in the REPL with (parse-file "./DATA/log.log" 3)
>>>>>> and got these results:
>>>>>>
>>>>>> Records      Size     Time       Result
>>>>>> 1,000        42KB     <1s        OK
>>>>>> 10,000       420KB    <1s        OK
>>>>>> 100,000      4.3MB    3s         OK
>>>>>> 1,000,000    43MB     15s        OK
>>>>>> 6,000,000    258MB    >20 min    OutOfMemoryError
>>>>>>
>>>>>> The error for the last run was: "OutOfMemoryError Java heap space
>>>>>> java.lang.String.substring (String.java:1913)"
>>>>>>
>>>>>> ======================================================
>>>>>> Here are my questions:
>>>>>> 1. How can I fix the error when I try to parse a big log file, e.g.
>>>>>> larger than 200MB?
>>>>>> 2. How can I optimize the functions to run faster?
>>>>>> 3. Some logs are larger than 1GB. How can the functions deal with
>>>>>> them?
>>>>>>
>>>>>> I am still new to Clojure; any suggestion or solution would be
>>>>>> appreciated.
>>>>>> Thanks
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> ------------------------------------
>>>>>>
>>>>>> 刘家齐 (Jacky Liu)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mobile: 15201091195        Email: liujiaq...@gmail.com
>>>>>>
>>>>>> Skype: jacky_liu_1987   QQ: 406229156
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To post to this group, send email to clojure@googlegroups.com
>>>>>> Note that posts from new members are moderated - please be patient
>>>>>> with your first post.
>>>>>> To unsubscribe from this group, send email to
>>>>>> clojure+unsubscr...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/clojure?hl=en
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to clojure+unsubscr...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
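
On the original questions about speed and files over 1GB: the quoted find-write-recur rescans every line once per top-n user and builds each user's records in a vector before writing. One possible alternative, sketched here under assumptions, is to open one writer per top-n ID and stream the file a single time, so memory use does not grow with the file size. The names split-by-user and out-dir are hypothetical; the original code builds its paths from MAIN-PATH and writes UserID.log files through create-write-file.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn split-by-user
  "Copy each line whose first field is in ids to <out-dir>/<id>.log,
   reading the input file only once (hypothetical helper)."
  [file ids out-dir]
  (let [writers (into {}
                      (for [id ids]
                        [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        ;; doseq walks the line seq without retaining earlier lines
        (doseq [line (line-seq rdr)]
          (when-let [^java.io.Writer wr
                     (writers (first (string/split line #"," 2)))]
            (.write wr (str line "\n")))))
      (finally
        ;; close every writer even if reading fails part-way
        (doseq [^java.io.Writer wr (vals writers)]
          (.close wr))))))
```

This keeps n files open at once, which is fine for a small n such as 3; for a very large n the per-ID rescan or batching the IDs may be the safer trade-off.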
