On Thu, 30 Apr 2015, Patrick wrote:
> I think I may have figured out what is going on by looking at the assembly
> (shown below). It looks like it is doing two stores per loop iteration - one
> to set the array value and one to update the index into the array.
Yes, that's because you are compiling without optimization.
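
As a rough illustration (the test program's source is not shown in this
thread, so the function name and loop body here are hypothetical), a fill
loop like the following would show the pattern described above. At -O0 the
compiler keeps the loop index in a stack slot, so every iteration writes
both the array element and the updated index back to memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop in the test program; the real
 * source is not part of this thread. Returns the number of elements
 * written. */
size_t fill_array(int *dst, size_t n)
{
    size_t i;

    assert(dst != NULL || n == 0);

    /* Unoptimized (-O0): i lives on the stack, so each iteration does
     * one store to dst[i] and one store to write the incremented i
     * back to its stack slot.
     * Optimized (e.g. -O2): i is typically held in a register, which
     * leaves only the store to dst[i]. */
    for (i = 0; i < n; i++)
        dst[i] = (int)i;

    return i;
}
```

Calling this from a small main(), building once with gcc -O0 and once with
gcc -O2, and comparing the loop bodies in the "objdump -d" output should
make the extra store visible. Note that an optimizing build may also
vectorize the loop, which changes the store count again.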
> I am new to perf and seeing a lot of results from counters of which I'm not
> sure how to make sense. So I'm still trying to get comfortable and make sure I
> know what's going on. I have a follow-on question regarding possible ways to
> count memory writes on a processor that's not a Xeon.
Good luck sorting things out. Perf counter results have many issues that
make them hard to interpret, especially on advanced processors and
especially when caches are involved.
Just be glad you aren't running your test on an AMD model 14h machine as
seen below.
  perf stat -e L1-dcache-stores:u ./copy 1048576

  Performance counter stats for './copy_test 1048576':

                 35      L1-dcache-stores

        0.014573254 seconds time elapsed

Vince