I have a few doubts about how to approach updating values inside pmap, and a
general question about whether pmap is the right tool for this problem. I'd
really appreciate a review of the following code.
file-seqC and file-seqB are slightly modified versions of file-seq. The plan
was to process each subfolder of the root folder as a concurrent job, count the
files, and output the final total. In my initial tests (file-seqB) I used an
atom that gets updated as soon as each thread completes. It sort of worked, but
the results were inconsistent: close to the expected total, yet not always
equal to it. It looked to me as if the last thread sometimes had not completed
in time.
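To make the atom variant concrete, here is a minimal sketch on plain numbers
rather than directories. The names total, add-to-total! and sum-with-atom are
made up for illustration and do not appear in the code further down. It relies
on pmap being lazy, so the result is forced with dorun before the atom is read.
Whether laziness is really what caused the inconsistent totals described above
is an assumption.

```clojure
;; Sketch only: `total`, `add-to-total!` and `sum-with-atom` are hypothetical names.
(def total (atom 0))

(defn add-to-total!
  "Adds n to the shared atom and returns n."
  [n]
  (swap! total + n)
  n)

(defn sum-with-atom
  "Resets the atom, runs add-to-total! over xs with pmap, and returns the
  atom's value once every element has been processed."
  [xs]
  (reset! total 0)
  ;; pmap is lazy: dorun walks the whole result, so every swap! has
  ;; happened before the atom is dereferenced.
  (dorun (pmap add-to-total! xs))
  @total)

(println "Sum via atom:" (sum-with-atom (range 1 101)))
```

Reading the atom from inside each job, as file-seqB does in its println, only
shows the running total at that moment, not the final one.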
So I came up with another version (file-seqC). Here pmap returns a list with
the file count for each subfolder, and I then reduce that list to get the final
result. This approach is solid and I'm quite happy to have figured it out, but
I'm curious whether it's possible to implement file-seqB using atoms. I'm still
not sure how I would know that all the entries in the pmap have completed
before the output is passed on. Should this use agents instead, dropping the
idea of pmap completely?
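Stripped of the file handling, the pmap-then-reduce shape looks roughly like
the sketch below. count-items is a hypothetical stand-in for file-seqC, and the
nested vectors stand in for subfolders.

```clojure
;; Sketch only: `count-items` and `total-items` are hypothetical names.
(defn count-items
  "Stand-in for file-seqC: returns the size of one group."
  [group]
  (count group))

(defn total-items
  "Counts each group in parallel and sums the per-group counts."
  [groups]
  ;; reduce consumes the whole pmap result, so every per-group count has
  ;; been computed before the sum is returned.
  (reduce + (pmap count-items groups)))

(println "Total items:" (total-items [[:a :b] [:c] [:d :e :f]]))
```

No shared state is involved here: each job returns its own value and the
combination happens in one place.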
In the example below I've included file-seqB function for reference.
Thanks,
Kuba
(ns pdir.core
  (:gen-class))

(defn burn []
  (dotimes [i 1000] ;; Make it slow
    (reduce * (map float (take 1000 (iterate inc i))))))
; comment out due to problems with uberjar
; Unable to resolve symbol: ttt in this context, compiling:(pdir/core.clj:35:6);
;
;(defn file-seqB
; "A tree seq on java.io.Files"
; {:added "1.0"}
; [dir]
;
; ;(burn) ;; SLOW things down for testing
;
; (def total (tree-seq
; (fn [^java.io.File f] (. f (isDirectory)))
; (fn [^java.io.File d] (seq (. d (listFiles))))
; dir))
;
; ; takes the number of the files in the directory and update the atom
;
; (swap! ttt + (count total)) ; this gives sort of predictable results
; ; but still not accurate
;
;
; (println "Done: " dir (count total) @ttt)
;
;)
(defn file-seqC
  "A tree seq on java.io.Files"
  {:added "1.0"}
  [dir]
  ;; Need to wrap the output of tree-seq in a local variable, otherwise
  ;; the pmap results won't complete before the (reduce) function kicks in
  (let [total (tree-seq
                (fn [^java.io.File f] (. f (isDirectory)))
                (fn [^java.io.File d] (seq (. d (listFiles))))
                dir)]
    ;(burn) ;; SLOW things down for testing
    (println "Done: " dir (count total))
    (count total))) ; returns the count for this directory tree
(defn -main [& args]
  (let [rootPath (first args)
        f (clojure.java.io/file rootPath)
        ;; store the first depth of folders and run them concurrently
        subDirs (seq (. f (listFiles)))]
    (println "Root: " rootPath)
    ;(println (seq (. f (listFiles))))
    ;; pmap returns a list with the total for each folder
    (->> (pmap file-seqC subDirs)
         (reduce +)                 ; sum them up
         (println "Total Files:"))  ; print
    ;(shutdown-agents) ; terminates the JVM, which otherwise lingers on the command line
    ))
(-main "/data/temp/kuba/aaa")