On 03 Jan 2007, at 12:05, Stephane Eranian wrote:
> Did you try other events? Also, PIN only counts user-level
> instructions.
> Did you make sure your measurements were set up to count only at
> user level?
I haven't tried other events yet, because those are a lot harder to
check. I've set the counters to count only user-level instructions
(setting it to kernel yields a much higher count).
I've been experimenting with this, and it seems the P4 counters just
aren't reporting the correct number of instructions. Some
instructions seem to be counted twice, or even more often.
The highest difference for the SPEC CPU2000 benchmarks is observed
for mesa:
PIN: 291,680,398,079
perfex @ Pentium4: 298,575,250,612
That's a difference of about 6.9 * 10^9 instructions, or 2.36% of the
PIN count, which is _huge_. My needs, as I've explained before, only
allow a difference of 100,000 instructions, so this is a real
showstopper for me. Clearly, perfmon is not to blame here, nor is
perfctr. Something is just wrong with the hardware implementation the
way I see it...
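
For anyone who wants to redo the arithmetic on the mesa numbers, here is a
rough sketch in Python. The two counts and the 100,000 tolerance are copied
from above; the function name overcount is made up for illustration, and
the percentage is taken relative to the PIN count.

```python
# Counts for mesa, copied from the message above.
pin_count = 291_680_398_079   # PIN (user-level instructions)
p4_count = 298_575_250_612    # perfex on the Pentium 4

# Maximum acceptable difference, as stated in the message.
tolerance = 100_000


def overcount(reference, measured):
    """Return (absolute difference, difference as a percentage of reference).

    Hypothetical helper, named here only for illustration.
    """
    diff = measured - reference
    return diff, 100.0 * diff / reference


diff, pct = overcount(pin_count, p4_count)
print(f"difference: {diff:,} instructions ({pct:.2f}% of the PIN count)")
print(f"within tolerance: {diff <= tolerance}")
```

The same helper can be pointed at any other benchmark's pair of counts to
see how far the hardware counter drifts from the PIN count.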
As a note for future users of instr_retired on Intel Pentium 4 with
any tool (perfctr, perfex, PAPI, perfmon, ...): be very careful with
the results you're getting, because it appears some instructions
cause multiple increments of the counters, which leads to misleading
results. The settings I'm using for instr_retired (with perfex) are:
perfex -e 0x00039000/[EMAIL PROTECTED] <benchmark>
for instr_completed (which yields a zero count for me):
perfex -e 0x00039000/[EMAIL PROTECTED] <benchmark>
Any additional comments on this are welcome, but I won't be losing
any more time over this. The counts on AMD machines are looking a lot
better, so I'll just go with AMD.
> Also, have you tried system-wide mode, just to verify that there is
> nothing wrong with the PMU context switch code?
Yep, that yields higher counts. Kernel-only yields the difference
between system-wide and user-level.
> As for instr_completed, I have never been able to measure it
> correctly. There may be unpublished constraints on this event which
> libpfm does not know about.
I've tried using perfex, using the correct settings, but I'm getting
a zero count every time I try (on two different machines). No idea
what's causing this. There are some undocumented issues for sure...
greetings,
Kenneth
--
Statistics are like a bikini. What they reveal is suggestive, but
what they conceal is vital (Aaron Levenstein)
Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste