Hi,

On 04/01/2014 07:58 PM, [email protected] wrote:
> Hi!
>>>> Make performance_counter01 run by default. This performance_counter01 case
>>>> uses perf_event_open() to test hardware CPU events.
>>>>
>>>> Do not let performance_counter02 run by default. This test needs to know
>>>> an upper bound on the number of hardware counters, which varies across
>>>> platforms. If the user knows this value, they can execute:
>>>> ./performance_counter02 -C (upper bound) to run this test.
>>> The test seems to default to 8. Can we just make it return TCONF if
>>> there are less than 8 hw counters? Having tests that are not executed by
>>> default is pointless.
>> Here I actually do not know how to get the number of hardware counters.
>> I had googled this issue, but do not find any method to get the number.
> Hmm. I've looked at the interface and I wonder if we can do something like
> setting the exclusive bit and creating HW perf counters until
> perf_event_open() starts to fail, but that is just a guess; the perf
> interface looks quite complicated.
Following your suggestion, I re-read the perf_event_open() manpage carefully.
It seems that the exclusive bit does not work for this purpose, but I found
another, similar method, thanks.

According to the perf_event_open() manpage, once a perf_event_open() fd has
been opened, the values of the events can be read from the file descriptor.
The layout of those values is specified by the read_format field in the
perf_event_attr structure at open time.

Here is the layout of the data returned by a read:
If PERF_FORMAT_GROUP was not specified:
    struct read_format {
        u64 value;         /* The value of the event */
        u64 time_enabled;  /* if PERF_FORMAT_TOTAL_TIME_ENABLED */
        u64 time_running;  /* if PERF_FORMAT_TOTAL_TIME_RUNNING */
        u64 id;            /* if PERF_FORMAT_ID */
    };
 
Normally time_enabled and time_running have the same value. But if more events
are started than there are counter slots available on the PMU, multiplexing
happens and each event runs only part of the time; time_enabled and
time_running then differ, and their values can be used to scale the raw count
to an estimated value. So the first time time_enabled and time_running are not
equal, we can conclude that the PMU hardware counters have started
multiplexing, and the number of events opened just before that point is the
number of hardware counters.

I have made new versions of the patches using the above method, and they work
correctly on the CPUs I have at hand.

In Fedora 19, cpu: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
[root@localhost perf_event_open]# ./perf_event_open02 -v
at iteration:0 value:3001149816 time_enabled:290650642 time_running:290650642
at iteration:1 value:3001085257 time_enabled:289461694 time_running:289461694
at iteration:2 value:3001120364 time_enabled:290046564 time_running:290046564
at iteration:3 value:3001079677 time_enabled:289030967 time_running:289030967
at iteration:4 value:3001110238 time_enabled:290997669 time_running:290997669
at iteration:5 value:3001108683 time_enabled:290591956 time_running:290591956
at iteration:6 value:3001095100 time_enabled:290233135 time_running:290233135
at iteration:7 value:3001159732 time_enabled:290614169 time_running:290614169
at iteration:8 value:3001238794 time_enabled:289356537 time_running:289356537
at iteration:9 value:2702293692 time_enabled:291629388 time_running:262500119
performance_counter02    0  TINFO  :  overall task clock: 291034492
performance_counter02    0  TINFO  :  hw sum: 27011319099, task clock sum: 2614888583
hw counters: 2050683984 2051786132 2063378052 2073442320 2085860676 2096033727 
2095478983 2096008741 2094299936 2092623511 2081426503 2069528271 2060768263
task clock counters: 198657095 198765474 199761126 200755015 201770571 
202653016 202681469 202685565 202683611 202649562 201596718 200622002 199607359
performance_counter02    0  TINFO  :  ratio: 8.98
performance_counter02    1  TPASS  :  test passed

At iteration:9, time_enabled starts to differ from time_running, so the
maximum number of available hardware counters is 9.

In RHEL7.0Beta, cpu:Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
[root@RHEL7U0Beta_Intel64 perf_event_open]# ./perf_event_open02 -v
at iteration:0 value:3002867150 time_enabled:720328375 time_running:720328375
at iteration:1 value:3002867170 time_enabled:720385634 time_running:720385634
at iteration:2 value:3002878188 time_enabled:720613777 time_running:720613777
at iteration:3 value:3002900180 time_enabled:720716067 time_running:720716067
at iteration:4 value:3002870030 time_enabled:720483705 time_running:720483705
at iteration:5 value:2500716298 time_enabled:726902712 time_running:605068696
performance_counter02    0  TINFO  :  overall task clock: 727979051
performance_counter02    0  TINFO  :  hw sum: 15021298449, task clock sum: 3634933571
hw counters: 1666785329 1667201792 1667756298 1667230060 1667869031 1671386808 
1671002443 1671412588 1670654100
task clock counters: 403395982 403445275 403480182 403432969 403462208 
404429300 404433289 404494620 404359746
performance_counter02    0  TINFO  :  ratio: 4.99
performance_counter02    1  TPASS  :  test passed

At iteration:5, time_enabled starts to differ from time_running, so the
maximum number of available hardware counters is 5.


In RHEL6.5GA, cpu: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
[root@RHEL6U5GA_Intel64 perf_event_open]# ./perf_event_open02 -v
at iteration:0 value:3003591514 time_enabled:346688452 time_running:346688452
at iteration:1 value:3003732715 time_enabled:343180347 time_running:343180347
at iteration:2 value:3003590550 time_enabled:343026521 time_running:343026521
at iteration:3 value:2251480185 time_enabled:344135674 time_running:257996318
performance_counter02    0  TINFO  :  overall task clock: 344705415
performance_counter02    0  TINFO  :  hw sum: 9011081784, task clock sum: 1032592768
hw counters: 1283274474 1291871336 1296978362 1293770018 1285155487 1279981744 
1280050363
task clock counters: 147043808 148035072 148619536 148262287 147269876 
146680943 146681246
performance_counter02    0  TINFO  :  ratio: 3.00
performance_counter02    1  TPASS  :  test passed

At iteration:3, time_enabled starts to differ from time_running, so the
maximum number of available hardware counters is 3.

I use the term "maximum available hardware counters" here because this number
is less than the output of papi_avail. Please see this URL:
http://stackoverflow.com/questions/21179813/find-out-how-many-hardware-performance-counters-a-cpu-has

In RHEL7.0 Beta, the output of papi_avail is below:
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 5.3.0.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6  Model: 30  Stepping: 5
CPU Max Megahertz        : 2792
CPU Min Megahertz        : 2792
Hdw Threads per core     : 1
Cores per Socket         : 4
Sockets                  : 1
NUMA Nodes               : 1
CPUs per Node            : 4
Total CPUs               : 4
Running in a VM          : no
Number Hardware Counters : 7
Max Multiplex Counters   : 64

The Number Hardware Counters reported is 7, but according to the
./perf_event_open02 run above on RHEL7.0 Beta, the result is 5. Either two
slots in the PMU are already in use, or the output of papi_avail is not
correct :-) .
I tried to figure out how papi_avail gets the number of hardware counters; it
seems that it just predefines some constant values for specific CPUs, but I am
not very sure, as I have not had time to read the PAPI library source code.

And I am afraid that this library does not support enough kinds of CPUs; I
also hit some compilation errors with it on Fedora 19.

So can we use the method I described above? I think it is more reasonable, at
least according to the perf_event_open() manpage and the intent of the
perf_event_open02 case.


Regards,
Xiaoguang Wang

_______________________________________________
Ltp-list mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ltp-list
