Hi,

2009/8/10 Tom Emerson <tremer...@gmail.com>

>
> Hello Clojurians,
>
> I want to process approximately 74K XML files that are stored on disk
> in a series of nested directories, each of which contains up to 1000
> files. For example,
>
> rootdir
>    0
>        file1.xml
>        file2.xml
>    1
>        file3.xml
>        file4.xml
>
> and so on.
>
> file-seq gives me a convenient way to get a seq of all these files.
> What I would like to do is process elements in this sequence in
> parallel. My first thought was to process the seq with pmap, but this
> is suboptimal because I'm not interested in saving the return value of
> the function called on each file.
>
> Assuming I want bounded parallelism (such as pmap gives you, 2 +
> number of cores) how would you approach this problem in Clojure?
>
> Thanks in advance for your insights,
>

Two ideas:

1./ Why not use pmap anyway, in combination with dorun (which ensures the
whole sequence is consumed, without retaining the head)?
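
A minimal sketch of idea 1./, assuming the per-file function is passed in
as f and that root is a path string. The name process-all! is hypothetical,
and the isFile filter is my assumption, there only to skip the directory
entries that file-seq also returns:

```clojure
(defn process-all!
  "Calls f on every regular file under root, in parallel, for side
  effects only. Hypothetical helper, not part of any library."
  [f root]
  (->> (file-seq (java.io.File. root))
       ;; file-seq yields directories too; keep regular files only
       (filter (fn [^java.io.File file] (.isFile file)))
       ;; pmap bounds the parallelism for us
       (pmap f)
       ;; dorun walks the seq and discards the results, returning nil
       dorun))
```

It would be called as (process-all! process-file "rootdir").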

OK, solution 1./ creates a lot of uninteresting seqs, so maybe
2./ use clojure.parallel/preduce?

If your fn is

(defn process-file [file] ...)

then perhaps:

(preduce (fn [_ file] (process-file file)) nil files-seq)


Signature: (preduce f base coll)



>
>    -tree
>
>
> --
> Tom Emerson
> tremer...@gmail.com
> http://treerex.blogspot.com/
