Re: abysmal multicore performance, especially on AMD processors

Andy Fingerhut Sun, 09 Dec 2012 09:52:41 -0800

On Dec 8, 2012, at 9:37 PM, Lee Spector wrote:

> 
> On Dec 8, 2012, at 10:19 PM, meteorfox wrote:
>> 
>> Now if you run vmstat 1 while running your benchmark you'll notice that the 
>> run queue will be most of the time at 8, meaning that 8 "processes" are 
>> waiting for CPU, and this is due to memory accesses (in this case, since 
>> this is not true for all applications).
>> 
>> So, I feel your benchmark may be artificial and does not truly represent 
>> your real application, make sure to profile your real application, and 
>> optimize according to the bottlenecks. There are really useful tools out 
>> there for profiling, such as VisualVM and perf.
> 
> Thanks Carlos.
> 
> I don't actually think that the benchmark is particularly artificial. It's 
> very difficult to profile my actual application because it uses random 
> numbers all over the place and is highly and nonlinearly variable in lots of 
> ways. But I think that the benchmark I'm running really is pretty 
> representative.
> 
> In any event, WHY would all of that waiting be happening? Logically, nothing 
> should have to be waiting for anything else. We know from the fact that we 
> get good speedups from multiple simultaneous JVM runs, each just running one 
> call to my burn function, that the hardware is capable of performing well 
> with multiple instances of this benchmark running concurrently. It MUST be 
> able to handle all of the memory allocation and everything else. It's just 
> when we try to launch them all in parallel from the same Clojure process, 
> using pmap OR agents OR reducers, that we fail to get the concurrency 
> speedups. Why should this be? And what can be done about it?
> 
> I know nearly nothing about the internals of the JVM, but is there perhaps a 
> bottleneck on memory allocation because there's a single serial allocator? 
> Perhaps when we run multiple JVMs each has its own allocator so we don't have 
> the bottleneck? If all of this makes sense and is true, then perhaps (wishful 
> thinking) there's a JVM option like "use parallel memory allocators" that 
> will fix it?!


Lee:

I don't yet know how to get good speedups with this workload in the Oracle 
JVM, although it might be possible with options I'm unaware of.
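
One way to separate "the JVM doesn't scale" from "something in Clojure doesn't 
scale" might be a small pure-Java test along the lines below. It is only a 
sketch under assumptions: the class name BurnScaling, the burn method, and the 
loop sizes are all made up for illustration, and burn is merely a stand-in 
that allocates many short-lived objects, not your actual benchmark. It runs 
the same work once per core serially, then once per core on a fixed thread 
pool inside the same JVM, and prints both wall-clock times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BurnScaling {
    // Hypothetical stand-in for the real burn function: allocates many
    // short-lived boxed Integers and lists, then sums them.
    static long burn(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            List<Integer> xs = new ArrayList<>();
            for (int j = 0; j < 100; j++) {
                xs.add(i + j);
            }
            for (int x : xs) {
                sum += x;
            }
        }
        return sum;
    }

    // Runs one burn(n) call per thread on a fixed-size pool and returns
    // the combined result once every task has finished.
    static long runParallel(int threads, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> burn(n));
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int n = 200_000; // arbitrary size, chosen only for illustration
        int threads = Runtime.getRuntime().availableProcessors();

        long t0 = System.nanoTime();
        long serial = 0;
        for (int t = 0; t < threads; t++) {
            serial += burn(n);
        }
        long t1 = System.nanoTime();
        long parallel = runParallel(threads, n);
        long t2 = System.nanoTime();

        System.out.println("results match: " + (serial == parallel));
        System.out.println("serial ms:     " + (t1 - t0) / 1_000_000);
        System.out.println("parallel ms:   " + (t2 - t1) / 1_000_000);
    }
}
```

If the parallel time here is close to the serial time divided by the number 
of cores, that would suggest the JVM itself can scale on allocation-heavy 
work, and the place to look is whatever differs in the Clojure version. The 
timings are rough (no warm-up runs), so treat them as a first indication 
only.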

Azul's Zing JVM has a different memory allocator and GC implementation that 
might be better tuned for parallel workloads.  I haven't used it myself yet -- 
this is just from hearing about it in the past and a brief scan of their web 
site.  They have free trials available.  Maybe you could try that to see if it 
gives you better results out of the box, or with minor tweaking of parameters?

I don't know the cost, but like many companies they might have significant 
discounts for educational customers.

Andy

