Hi all,

Greetings from me!

I want to use the "perf stat" command to count the
"MEM_UOP_RETIRED.ALL_LOADS" and "MEM_UOP_RETIRED.ALL_STORES" events for
the following program:

#include <stdio.h>
#include <sys/types.h>  /* ssize_t */

#define MAX_ARRAY_SIZE  (10000000)

double a[MAX_ARRAY_SIZE];
double b[MAX_ARRAY_SIZE];
double c[MAX_ARRAY_SIZE];
int main(void)
{
    ssize_t i = 0;

    for (i = 0; i < MAX_ARRAY_SIZE; i++)
    {
        c[i] = a[i] + b[i];
    }
    return 0;
}

The disassembly of the program (compiled without optimization):

Dump of assembler code for function main:
  0x00000000004004f0 <+0>:     push   %rbp
  0x00000000004004f1 <+1>:     mov    %rsp,%rbp
  0x00000000004004f4 <+4>:     movq   $0x0,-0x8(%rbp)
  0x00000000004004fc <+12>:    movq   $0x0,-0x8(%rbp)
  0x0000000000400504 <+20>:    jmp    0x400536 <main+70>
  0x0000000000400506 <+22>:    mov    -0x8(%rbp),%rax
  0x000000000040050a <+26>:    movsd  0x9e97860(,%rax,8),%xmm1
  0x0000000000400513 <+35>:    mov    -0x8(%rbp),%rax
  0x0000000000400517 <+39>:    movsd  0x601060(,%rax,8),%xmm0
  0x0000000000400520 <+48>:    addsd  %xmm1,%xmm0
  0x0000000000400524 <+52>:    mov    -0x8(%rbp),%rax
  0x0000000000400528 <+56>:    movsd  %xmm0,0x524c460(,%rax,8)
  0x0000000000400531 <+65>:    addq   $0x1,-0x8(%rbp)
  0x0000000000400536 <+70>:    cmpq   $0x98967f,-0x8(%rbp)
  0x000000000040053e <+78>:    jle    0x400506 <main+22>
  0x0000000000400540 <+80>:    mov    $0x0,%eax
  0x0000000000400545 <+85>:    pop    %rbp
  0x0000000000400546 <+86>:    retq
End of assembler dump.

The code for the loop is:
  0x0000000000400506 <+22>:    mov    -0x8(%rbp),%rax
  0x000000000040050a <+26>:    movsd  0x9e97860(,%rax,8),%xmm1
  0x0000000000400513 <+35>:    mov    -0x8(%rbp),%rax
  0x0000000000400517 <+39>:    movsd  0x601060(,%rax,8),%xmm0
  0x0000000000400520 <+48>:    addsd  %xmm1,%xmm0
  0x0000000000400524 <+52>:    mov    -0x8(%rbp),%rax
  0x0000000000400528 <+56>:    movsd  %xmm0,0x524c460(,%rax,8)
  0x0000000000400531 <+65>:    addq   $0x1,-0x8(%rbp)
  0x0000000000400536 <+70>:    cmpq   $0x98967f,-0x8(%rbp)

The number of load operations per iteration is 7:
  0x0000000000400506 <+22>:    mov    -0x8(%rbp),%rax
  0x000000000040050a <+26>:    movsd  0x9e97860(,%rax,8),%xmm1
  0x0000000000400513 <+35>:    mov    -0x8(%rbp),%rax
  0x0000000000400517 <+39>:    movsd  0x601060(,%rax,8),%xmm0
  0x0000000000400524 <+52>:    mov    -0x8(%rbp),%rax
  0x0000000000400531 <+65>:    addq   $0x1,-0x8(%rbp)
  0x0000000000400536 <+70>:    cmpq   $0x98967f,-0x8(%rbp)

The number of store operations per iteration is 2:
  0x0000000000400528 <+56>:    movsd  %xmm0,0x524c460(,%rax,8)
  0x0000000000400531 <+65>:    addq   $0x1,-0x8(%rbp)

Since the loop runs 10,000,000 times, the total number of load
operations should be about 70,000,000, and the total number of store
operations should be about 20,000,000.

After running the "perf stat" command, the output is:

[root@Linux test]# perf stat -a -e "r81d0","r82d0" ./a

Performance counter stats for './a':

        71,779,954 r81d0                    [100.00%]
        26,601,675 r82d0

       0.036929715 seconds time elapsed

That is about 1,779,954 (71,779,954 - 70,000,000) more load operations
and 6,601,675 (26,601,675 - 20,000,000) more store operations than I
expected.

I don't understand why "perf stat" reports so many more load and store
operations than my count. Could anyone explain this? Thanks very much
in advance!

-- 
Best Regards
Nan Xiao