I'm not saying that I know this will help, but if you're open to trying a different JVM, one that has had a lot of work put into optimizing it for high concurrency, Azul's Zing JVM may be worth a try, to see whether it increases parallelism for a single Clojure instance running many threads in a single JVM.
It costs $$, but I'm guessing they may have steep discounts for educational institutions. They have free trials, too. http://www.azulsystems.com/products/zing/whatisit

Andy

On Dec 13, 2012, at 10:41 AM, Wm. Josiah Erikson wrote:

> OK, I did something a little bit different, but I think it proves the same
> thing we were shooting for.
>
> On a 48-way 4 x Opteron 6168 with 32GB of RAM, this is Tom's "Bowling"
> benchmark:
>
> 1. multithreaded. Average of 10 runs: 14:00.9
> 2. singlethreaded. Average of 10 runs: 23:35.3
> 3. singlethreaded, 8 simultaneous copies. Average run time of the
>    concurrently running copies: 23:31.5
>
> So we see a speedup of less than 2x running multithreaded in a single JVM
> instance. By contrast, running 8 simultaneous copies in 8 separate JVMs
> gives us a perfect 8x speedup over running a single instance of the same
> singlethreaded benchmark. That proves pretty conclusively that it's not a
> hardware limitation, it seems to me... unless the problem is that it's
> trying to spawn 48 threads, and that creates contention.
>
> I don't think so, though, because on an 8-way FX-8120 with 16GB of RAM we
> see a very similar lack of speedup going from singlethreaded to
> multithreaded (and there it will only be trying to use 8 threads, right?),
> and then we see a much better speedup (around 4x: we're doing 8 times the
> work in about twice the time) going to 8 concurrent copies of the same
> thing in separate JVMs (even though I had to decrease RAM usage on the 8
> concurrent copies to avoid swapping, which may have slowed them down a
> bit):
>
> 1. 9:00.6
> 2. 14:15.6
> 3. 27:35.1
>
> We're probably getting a better speedup with the concurrent copies on the
> 48-way node because of higher memory bandwidth, bigger caches (and more of
> them), and more memory.
>
> Does this help? Should I do something else as well? I'm curious to try
> running, say, 16 concurrent copies on the 48-way node...
>
> -Josiah
>
> On Wed, Dec 12, 2012 at 10:03 AM, Andy Fingerhut <andy.finger...@gmail.com> wrote:
> Lee:
>
> I believe you said that your benchmarking code achieved good speedup when
> run as separate JVMs that were each running a single thread, even before
> making the changes to the implementation of reverse found by Marshall. I
> confirmed that on my own machine as well.
>
> Have you tried running your real application in a single thread in a JVM,
> and then running multiple JVMs in parallel, to see if there is any
> speedup? If so, that would again help determine whether it is multiple
> threads in a single JVM causing the slowdown, or something to do with the
> hardware or OS that is the limiting factor.
>
> Andy
>
> On Dec 11, 2012, at 4:37 PM, Lee Spector wrote:
> >
> > On Dec 11, 2012, at 1:06 PM, Marshall Bockrath-Vandegrift wrote:
> >> So I think if you replace your calls to `reverse` and any `conj` loops
> >> you have in your own code, you should see a perfectly reasonable
> >> speedup.
> >
> > Tantalizing, but on investigation I see that our real application
> > actually does very little explicitly with reverse or conj, and I don't
> > actually think that we're getting reasonable speedups (which is what led
> > me to try that benchmark). So while I'm not sure of the source of the
> > problem in our application, I think there can be a problem even if one
> > avoids direct calls to reverse and conj. Andy's recent tests also seem
> > to confirm this.
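
To make the `reverse`/`conj` pattern Marshall mentions concrete, here is a
minimal sketch (hypothetical code, not taken from Clojush): accumulating
onto a list with conj prepends, so you pay a second pass (and a second
list's worth of garbage) to reverse the result, while accumulating into a
vector appends and comes out in order with no reverse at all.

    ;; Hypothetical example of the accumulate-then-reverse pattern and a
    ;; vector-based alternative; names are illustrative only.

    ;; conj on a list prepends, so the accumulator is built backwards and
    ;; reverse allocates a whole second list to fix the order.
    (defn squares-list [xs]
      (reverse (reduce (fn [acc x] (conj acc (* x x))) '() xs)))

    ;; conj! on a transient vector appends, so the result is already in
    ;; order and allocation is much lighter.
    (defn squares-vec [xs]
      (persistent!
        (reduce (fn [acc x] (conj! acc (* x x))) (transient []) xs)))

Both functions yield the same elements in the same order; the vector
version just skips the reverse pass and the throwaway cons cells.
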
> >
> > BTW benchmarking our real application
> > (https://github.com/lspector/Clojush) is a bit tricky because it's
> > riddled with random number generator calls that can have big effects,
> > but we're going to look into working around that. Recent postings re:
> > seedable RNGs may help, although changing all of the RNG code may be a
> > little involved because we use thread-local RNGs (to avoid contention
> > and get good multicore speedups... we thought!).
> >
> > -Lee
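
For reference, here is one way to get seedable, thread-local RNGs of the
kind Lee describes, sketched as a hypothetical approach rather than
Clojush's actual code. It keeps one java.util.Random per thread in a
ThreadLocal, derived from an assumed fixed base-seed. (ThreadLocalRandom
avoids contention too, but it isn't seedable, so it doesn't help with
reproducibility.)

    ;; Hypothetical sketch: per-thread, seedable RNGs via ThreadLocal.
    ;; base-seed is an assumed knob, not a Clojush setting.
    (def base-seed 42)

    (def thread-rng
      (proxy [ThreadLocal] []
        (initialValue []
          ;; Derive each thread's seed from the shared base seed plus the
          ;; thread's id. Caveat: runs are only repeatable insofar as the
          ;; same work lands on the same threads from run to run.
          (java.util.Random. (+ base-seed (.getId (Thread/currentThread)))))))

    (defn rand-int-local
      "Uniform int in [0, n), drawn from this thread's private Random,
      so threads never contend on a shared RNG instance."
      [n]
      (.nextInt ^java.util.Random (.get thread-rng) n))
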