On 14/10/12 00:17, Ben Mabey wrote:
I switched from pmap to
(r/fold n (r/monoid into vector) conj coll)
and the same thing happened again!
After approximately 50 minutes, CPU utilisation dropped from 4/4 to
1/4...I don't understand!
Jim
Are you holding on to the head of the collection? That is often the
source of memory leaks like this.
-Ben
Thanks for your replies, guys. The problem turned out to be purely my
fault! I had forgotten the merged version of all those files in the same
directory! So there were 383 files of a few kB each and a single one
larger than 9 MB. As you can probably imagine, the chunk that included
that big file took a lot longer to finish, which caused the performance
degradation I was describing. As Tassillo explained yesterday, pmap is
semi-lazy and chunked. Because I am consuming it from within a doseq,
which is serial, there is a serious chance of 'waiting' when the
workload is not uniform. As soon as I make the workload relatively
uniform (delete the merged file), everything works like a charm. This is
indeed the first occasion where pmap is exactly what I need.
Now, regarding the reducers version, it took me a while to realise why I
was seeing the same behaviour, but it is for exactly the same reason as
with pmap. I was reducing using r/map but combining using 'into
vector': the branch of the fork-join tree that had to process that big
file and pour it into a vector was holding up the computation. I'm not
quite sure why work-stealing did not occur; maybe my fold chunk size
(50) was too large.
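A minimal sketch of the reducers situation described above, for anyone
who wants to reproduce it. The names process, sizes and fold-all and the
sleep durations are made up for illustration; only the r/fold call
mirrors the one from the earlier message. A smaller n cannot split the
big item itself, but it should let idle workers pick up the small items
that would otherwise queue behind it in the same leaf task.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for the real per-file work: sleeps for `size`
;; milliseconds so that one item can dominate the workload.
(defn process [^long size]
  (Thread/sleep size)
  size)

;; 383 small items plus one large one, mirroring the skewed data set.
;; r/fold only runs in parallel over vectors and maps, hence the vec.
(def sizes (vec (shuffle (cons 900 (repeat 383 2)))))

;; Same shape as the call in the earlier message; n is the partition size
;; below which fold stops splitting and reduces sequentially.
(defn fold-all [n]
  (r/fold n (r/monoid into vector) conj (r/map process sizes)))

(time (count (fold-all 50)))  ; coarse leaves, as in the original call
(time (count (fold-all 1)))   ; one item per leaf task
```

The timings depend on the machine and core count, so none are quoted
here. In both cases the fold cannot return before the big item is done.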
In any case, I apologise for blaming pmap and for not having checked my
dataset before posting. Thanks again for all your comments :)
Jim
PS: it may sound strange, but there is a bonus in using pmap over
reducers in my case! Even if the user kills the process before it
finishes (perhaps he's bored), with pmap some of the target file will
already have been written, whereas with reducers the entire calculation
has to finish before it can even start spitting to a file!
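A rough sketch of the pmap variant that gives this incremental
behaviour, assuming one output line per input item. The names
process-item, write-all! and target are hypothetical, and upper-casing
stands in for the real per-file work. Each result is written and flushed
as soon as doseq receives it, so a partial file survives if the process
is killed.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical stand-in for the real per-file transformation.
(defn process-item [s]
  (str/upper-case s))

;; pmap computes results in parallel (semi-lazily); doseq consumes them
;; serially, in input order, and each one is written out immediately.
(defn write-all! [target items]
  (with-open [^java.io.Writer w (io/writer target)]
    (doseq [line (pmap process-item items)]
      (.write w (str line "\n"))
      (.flush w))))

;; Usage with a temporary file as the target.
(def target (java.io.File/createTempFile "pmap-demo" ".txt"))
(write-all! target ["a" "b" "c"])
(println (slurp target))
```

With r/fold, by contrast, the combined vector only exists once the whole
fold has returned, so writing can only start at the very end.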