Re: [ccp4bb] phaser openmp

Pascal Wed, 09 Nov 2011 03:26:34 -0800

On 11/09/2011 11:53 AM, Francois Berenger wrote:

On 11/09/2011 07:21 PM, Pascal wrote:

I have more problems with L2 cache miss events and memory bandwidth. A
quad core needs 4 times the memory bandwidth that a single process needs...
If your code is already a bit greedy, the scale up is not good.


I never went down to this level of optimization.
Are you using valgrind to detect cache miss events?


No, I am not sure valgrind can cope with multithreaded applications correctly.

In this particular case my code runs faster on an Intel Q9400 @ 2.67GHz
with 800MHz DDR2 than on an Intel Q9505 @ 2.83GHz with 667MHz DDR2. Also
I have a nice scale up on a 4*12 Opteron machine (each CPU has 2
dual-channel memory buses) but not on my standard quad core. If I get my
hands on an i7-920 equipped with triple-channel DDR3, the program should
run much faster despite the same CPU clock.
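To illustrate why memory speed can matter more than CPU clock, here is a
minimal sketch of a bandwidth-bound kernel. It is not my actual code: the
STREAM-style triad and the helper names triad_bytes and triad_flops are
only assumptions chosen for the example. Each element costs 2 floating
point operations but moves 24 bytes (ignoring write-allocate traffic), so
four cores running it at once ask the memory bus for four times as much.

```c
#include <stddef.h>

/* STREAM-style triad: a[i] = b[i] + s * c[i].
 * Illustrative kernel only; it does very little arithmetic per byte moved. */
void triad(size_t n, double *a, const double *b, const double *c, double s)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Bytes moved per call, assuming no cache reuse:
 * two loads (b, c) and one store (a) per element. */
size_t triad_bytes(size_t n)
{
    return 3 * n * sizeof(double);
}

/* Floating point operations per call: one multiply and one add per element. */
size_t triad_flops(size_t n)
{
    return 2 * n;
}
```

Timing triad() on arrays much larger than the L2 cache, with one thread
and then with four, gives a rough idea of whether a machine runs out of
memory bandwidth before it runs out of cores.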


Then I used perf[1] and oprofile[2] on linux.
Have a look here for the whole story:
<http://blog.debroglie.net/2011/10/25/cpu-starvation/>


After gprof, usually I am done with optimization.
I would prefer to change my algorithm and would be afraid
of introducing optimizations that are architecture-dependent
into my software.


When I spot a bottleneck, my first reaction is to change the algorithm:
caching calculations, more efficient algorithms...

But once I had to do some manual loop tiling. It's kind of a change of
algorithm, as the size of a temporary variable changes as well, but the
number of operations remains the same. The code with the loop tiling is
~20% faster, only due to a better use of the CPU cache.
<http://blog.debroglie.net/2011/10/28/loop-tiling/>
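
For readers who have not seen loop tiling, here is a minimal sketch on a
matrix product. It is not the code from the blog post: the functions
matmul_naive and matmul_tiled and the TILE value of 32 are assumptions
made up for the illustration. Both versions do the same n^3
multiply-adds; the tiled one only reorders them so that small blocks of
the matrices are reused while they are still in cache.

```c
#include <stddef.h>

/* Hypothetical tile edge; tune it so a few TILE x TILE blocks fit in cache. */
#define TILE 32

/* Plain triple loop: c += a * b for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Return the end of the tile starting at 'start', clipped to n. */
static size_t tile_end(size_t start, size_t n)
{
    return (start + TILE < n) ? start + TILE : n;
}

/* Same arithmetic, walked tile by tile. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = tile_end(ii, n);
                size_t k_end = tile_end(kk, n);
                size_t j_end = tile_end(jj, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Whether tiling pays off, and which tile size is best, depends on the
cache sizes of the machine, so it is worth measuring with perf or
oprofile before and after.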

[1] http://kernel.org/ (package name should be perf-util or similar)
[2] http://oprofile.sourceforge.net/

Pascal
