Re: [perfmon] How to measure all of the L2 DATA MISSES

Stéphane Zuckerman Wed, 09 May 2007 07:03:43 -0700

Stephane Eranian wrote:

Stephane,
On Wed, May 09, 2007 at 11:34:59AM +0200, Stéphane Zuckerman wrote:
We're trying to measure L2D MISSES on a Xeon Woodcrest (a dual-CPU, dual-core machine).
We've tried different hardware counters, namely:
- LAST_LEVEL_CACHE_MISSES, which, the documentation says, is equivalent to L2_RQSTS:I_STATE (invalid cache lines), but doesn't count the hardware prefetches
- L2_RQSTS:MESI, and various combinations of the M/E/S/I options, as well as PREFETCH combined with the mask SELF.
We're trying to measure the L2 data misses accurately, but we're getting inconsistent results. How should we proceed?
If you are using pfmon, please provide the exact command line.

We haven't inserted probes directly in the code yet. We are indeed using pfmon.


Here's what we've run:
pfmon -e LAST_LEVEL_CACHE_MISSES ./program
pfmon -e L2_RQSTS:I_STATE:ANY
pfmon -e L2_RQSTS:PREFETCH (which returns 0)

We also tried L2_RQSTS:MESI, and each flag separately.

Also, you
need to define exactly what you mean by inconsistent. There are always
some fluctuations from one run to the next.

Of course, we didn't expect the results to always be the same. We're trying to correlate cache accesses/misses with TLB accesses/misses on this machine.


Here's the code we're trying to monitor:

double cache_loop(double * tmp, unsigned long size){
  double acc = 0.0;  /* start from zero; acc was previously read uninitialized */
  unsigned long i;
  for(i = 0; i < size; i++){
    acc += tmp[i];
  }
  return acc;
}

Here, the array tmp is allocated with either mmap or malloc (depending on the tests we wanted to perform). Before calling this function, we initialize the array so as to force page allocation, then we run a first loop around this function to warm up the cache.
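For reference, the two allocation paths could look roughly like the sketch below. This is not our exact code: the helper names alloc_array and free_array and the use_mmap flag are made up for illustration, and the mmap flags (an anonymous private mapping) are an assumption. Writing to every element is what forces the pages to be allocated.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate n doubles with mmap or malloc, then
 * write to every element so the kernel really allocates the pages. */
double *alloc_array(unsigned long n, int use_mmap)
{
    size_t bytes = (size_t)n * sizeof(double);
    double *p;
    unsigned long i;

    if (use_mmap) {
        /* Assumed flags: anonymous, private, read/write mapping. */
        void *m = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        p = (m == MAP_FAILED) ? NULL : (double *)m;
    } else {
        p = (double *)malloc(bytes);
    }
    if (p == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        p[i] = 1.0;          /* first touch of each page */
    return p;
}

/* Hypothetical helper: release memory obtained from alloc_array. */
void free_array(double *p, unsigned long n, int use_mmap)
{
    if (p == NULL)
        return;
    if (use_mmap)
        munmap(p, (size_t)n * sizeof(double));
    else
        free(p);
}
```

The same use_mmap value has to be passed to both helpers, since memory obtained from mmap cannot be released with free.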

The real test comes after, where we wrap a loop around the function call.

Basically, we have :

<code>
/* size is a variable parameter we set when calling our program
 * its value varies from "in-cache" to "out-of-cache"
 */
for (i=0; i < size; i++)
        tmp[i] = 1;
for (i=0; i < 10; i++)
        res = cache_loop(tmp,size);

/* some more code which isn't relevant here */

for (i=0; i < ITERATIONS; i++)
        res = cache_loop(tmp,size);
</code>

We would like to monitor the L2 cache misses (hence the call to LAST_LEVEL_CACHE_MISSES with pfmon).


As for what I deem "inconsistent":

We have 4 MB of L2 cache on this computer for each processor, and when monitoring the whole program with pfmon (no --trigger-code-{start|stop}-address used) we get a few orders of magnitude fewer cache misses than we predicted (according to the size of the data we allocated).
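To make "predicted" concrete, the kind of back-of-the-envelope estimate we have in mind can be sketched as follows. The function name expected_l2_line_fills is made up for illustration, the 64-byte line size is an assumption about this processor, and the model is deliberately naive: it ignores the hardware prefetcher and associativity, and it assumes an LRU-like replacement policy, so that a sequential sweep over an array larger than the cache misses on every line at every pass.

```c
#include <stddef.h>

/* Assumption: 64-byte L2 cache lines. */
#define CACHE_LINE_BYTES 64ULL
/* The 4 MB of L2 per processor mentioned above. */
#define L2_BYTES (4ULL * 1024ULL * 1024ULL)

/* Hypothetical helper: naive estimate of the number of L2 line fills
 * for `passes` sequential sweeps over an array of `n_elems` doubles. */
unsigned long long expected_l2_line_fills(unsigned long n_elems,
                                          unsigned long passes)
{
    unsigned long long bytes = (unsigned long long)n_elems * sizeof(double);
    /* Number of cache lines covered by the array, rounded up. */
    unsigned long long lines =
        (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;

    if (bytes <= L2_BYTES)
        return lines;        /* fits in L2: only the first sweep misses */
    return lines * passes;   /* streaming: every sweep misses every line */
}
```

For instance, an 8 MB array (1048576 doubles) swept 5 times gives 131072 lines per pass and 655360 line fills in this model. Since the whole program is monitored, the initialization and warm-up loops count as extra sweeps.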

We know that we have to monitor the hardware prefetches in a separate run, but even then, the figures don't match our estimates.

We suspect that out-of-order execution can tamper with our expectations, but it seems highly unlikely that it could change our results that much.

We ran out of ideas to explain this, hence the question about how to measure cache misses accurately (with pfmon for a start, but even in-code probes would do, of course).


--
Stéphane Zuckerman
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
