On 14/10/12 00:17, Ben Mabey wrote:

I switched from pmap to

(r/fold n (r/monoid into vector) conj coll)

and the same thing happened again!
After approximately 50 minutes, CPU utilisation dropped from 4/4 to 1/4... I don't understand!

Jim


Are you holding on to the head of the collection? That is often the source of memory leaks like this.

-Ben
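
For readers unfamiliar with the term, here is a minimal sketch of what "holding on to the head" can look like. The names process-item, all-results, consume-retaining and consume-streaming are hypothetical and are not taken from Jim's code; the sizes are arbitrary.

```clojure
;; Hypothetical per-item work that produces a sizeable result.
(defn process-item [i]
  (vec (repeat 1000 i)))

;; Variant 1: the var `all-results` refers to the head of the lazy seq,
;; so every element realised so far stays reachable while the var exists.
(def all-results (map process-item (range 100)))

(defn consume-retaining []
  (reduce (fn [n v] (+ n (count v))) 0 all-results))

;; Variant 2: the lazy seq is only passed straight to reduce, so nothing
;; else refers to its head and consumed elements can become garbage.
(defn consume-streaming []
  (reduce (fn [n v] (+ n (count v))) 0 (map process-item (range 100))))
```

Both variants compute the same total; they differ only in how long the intermediate results stay reachable.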


Thanks for your replies, guys... The problem turned out to be purely my fault! I had forgotten the merged version of all those files in the same directory! So there were 383 files of a few kB each and a single one larger than 9 MB. As you can probably imagine, the chunk that included that big file took a lot longer to finish, which caused the performance degradation I was describing.

As Tassillo explained yesterday, pmap is semi-lazy and chunked. Because I am consuming it from within a doseq, which is serial, there is a serious chance of 'waiting' when the workload is not uniform. As soon as I make the workload relatively uniform (by deleting the merged file), everything works like a charm. This is indeed the first occasion where pmap is exactly what I need.

Now, regarding the reducers version: it took me a while to realise why I was seeing the same behaviour, but it is for the exact same reason as with pmap. I was reducing using r/map but combining using 'into vector'... The branch of the fork-join tree that had to process that big file and pour it into a vector was holding up the computation. I'm not quite sure why work-stealing did not occur... maybe my fold chunk size (50) was too large...
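
One way to test the chunk-size guess is to run the same fold with different partition sizes. The sketch below is only an assumption about the shape of the code: process-file, sizes and fold-all are hypothetical names, and the per-file work is simulated with a sum whose cost grows with the "file size". A smaller n gives fork-join more, smaller tasks to steal, although a single large input still cannot be split and so still bounds the total running time.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for the per-file work: cost grows with `size`.
(defn process-file [size]
  (reduce + (range size)))

;; 383 small inputs plus one much larger one, mirroring the skewed dataset.
;; A vector is used because r/fold only parallelises foldable collections.
(def sizes (conj (vec (repeat 383 1000)) 9000000))

;; Same fold as in the thread, with the partition size `n` as a parameter.
;; `into` concatenates partitions in order, so the result order is stable.
(defn fold-all [n coll]
  (r/fold n (r/monoid into vector) conj (r/map process-file coll)))
```

Calling (fold-all 50 sizes) and (fold-all 1 sizes) should return the same vector; only the way the work is divided among threads changes.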

In any case, I apologise for blaming pmap and for not having checked my dataset before posting...thanks again for all your comments :)

Jim

PS: It may sound strange, but there is a bonus in using pmap over reducers in my case! Even if the user kills the process before it finishes (perhaps they're bored), with pmap some of the target file will already have been written, whereas with reducers the entire calculation has to finish before it can even start spitting to a file!
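
The difference described in the PS can be sketched roughly as follows. This is not Jim's actual code: process-file, write-with-pmap! and write-with-fold! are hypothetical names, and the per-file work is replaced by a trivial computation.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for the real per-file work.
(defn process-file [x]
  (* x x))

;; pmap + doseq: each result is appended to the file as soon as doseq
;; reaches it, so an interrupted run leaves partial output behind.
;; Assumes `out` does not already contain data, since spit appends here.
(defn write-with-pmap! [out inputs]
  (doseq [result (pmap process-file inputs)]
    (spit out (str result "\n") :append true)))

;; r/fold: the whole result vector must exist before anything is written.
(defn write-with-fold! [out inputs]
  (let [results (r/fold (r/monoid into vector) conj
                        (r/map process-file inputs))]
    (spit out (apply str (map #(str % "\n") results)))))
```

For a run that completes, both functions should produce the same file contents; they differ only in when the output becomes visible on disk.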


