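It is hard to say where the root of your problem lies without seeing more of the code. I would look closely at laziness. In my experience lazy evaluation often undermines parallelization: if a future returns an unrealized lazy sequence, the future finishes almost immediately and the real work is done later by whichever thread consumes the sequence.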
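To illustrate the kind of thing I would check, here is a minimal sketch. It is not your code: `move`, the `{:x 0}` particle, `walk-lazy`, `walk-eager` and `run-walks` are all names I made up for the example, and the 1-D walk is only a stand-in for your real step function.

```clojure
;; Hypothetical stand-in for the real step function (assumption, not GEMA code).
(defn move
  "Advance a 1-D particle by one random step of -1 or +1."
  [particle]
  (update-in particle [:x] + (rand-nth [-1 1])))

(defn walk-lazy
  "Returns a lazy seq of positions; nothing is computed until it is consumed."
  [particle n-steps]
  (map :x (take n-steps (iterate move particle))))

(defn walk-eager
  "Runs the whole walk with loop/recur and keeps every keep-every-th position."
  [particle n-steps keep-every]
  (loop [p particle, i 0, kept []]
    (if (= i n-steps)
      kept
      (recur (move p)
             (inc i)
             (if (zero? (mod i keep-every)) (conj kept (:x p)) kept)))))

(defn run-walks
  "Starts one future per particle, then waits for all of them."
  [walk-fn n-particles]
  (let [futs (doall (for [_ (range n-particles)]   ; doall: start every future now
                      (future (walk-fn {:x 0}))))]
    (mapv deref futs)))

;; Lazy: each future hands back an unrealized seq, so the steps are
;; computed later on whichever thread walks the result.
(def lazy-results (run-walks #(walk-lazy % 1000) 4))

;; Forced inside the future: the steps are computed on the worker threads.
(def eager-results (run-walks #(doall (walk-lazy % 1000)) 4))
(def loop-results  (run-walks #(walk-eager % 1000 100) 4))
```

Calling `(realized? (first lazy-results))` right after the lazy run is a quick way to see whether a future actually did the work or just returned a promise of it. If your extraction function already builds its vector eagerly inside the future, laziness may not be the culprit, and I would look elsewhere.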
On Friday, November 8, 2013 4:42:11 PM UTC-5, Jose M. Perez Sanchez wrote:
>
> Hello everyone:
>
> This is my first post here. I'm a researcher writing a numerical
> simulation software in Clojure. Actually, I'm porting an app a coworker and
> I wrote in C/Python (called GEMA) to Clojure: The app has been in use for a
> while at our group, but became very difficult to maintain due to outgrowing
> its initial design and being very monolithic and at the same time I wanted
> to learn Functional Programming, so I've been working in the port for a few
> weeks.
>
> The simulations are embarrassingly parallel Random Walk calculations used
> to study gas diffusion and Helium-3 Magnetic Resonance diffusion
> measurements in the lungs. At the core of the simulations we do there is a
> 3D geometrical model of the pulmonary acinus. The new application is
> designed in a modular fashion, I'm including part of the current README
> file with a description.
>
> I've approached my institution's Technology Transfer Office to request
> authorization to release the software under an Open Source license, and if
> everything goes well the code will be published soon. I'm very happy in my
> Clojure trip so far and all the things I'm learning in the process.
>
> One of the things I've observed is poor scaling with the number of threads
> for more than 4 threads in an 8-core Intel i7 CPU, as follows:
>
> NT    Time    cpu%x8
>  1    101.9   108
>  2     54.9   220
>  4     36.0   430
>  6     33.9   570
>  8     32.5   700
> 10     32.5   720
>
> Computing times reported are just the time spent in the computation of the
> NT futures (not total program execution time). CPU x8 percent is measured
> with "top" in Linux and the % values are approximate, just to give an idea.
> I'm running on Debian Wheezy with the following Java platform:
>
> JRE: OpenJDK Runtime Environment 1.6.0_27-b27 on Linux 3.2.0-4-amd64
> (amd64)
> JVM: OpenJDK 64-Bit Server VM (build 20.0-b12 mixed mode)
>
> I'll try in a 16 core (4-way Opteron) soon and see what happens there. The
> computing happens over an infinite lazy sequence of random walk steps
> generated with "(iterate move particle)", when an "extraction function"
> gets values from zero to the highest number of random walk steps and adds
> (conj) the values to be kept to a vector. The resulting vector for each
> particle is then added (conj) to a global vector for later storage.
>
> I've read the previous post about concurrent performance in AMD
> processors:
> https://groups.google.com/forum/#!topic/clojure/48W2eff3caU%5B1-25-false%5D.
> Have to do it again with more time though, to check whether any of the
> explanations presented there applies to my application.
>
> Best regards,
>
> Jose Manuel.
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en