Hello, I have just pushed to GIT an important bug fix affecting all architectures when doing timeout-based set switching on a per-thread basis.
You could easily reproduce the problem using the multiplex2 example of libpfm:

$ ./multiplex2 --us-c --freq=1000 ../../pfmon/tests/noploop 10
clock_res=1000000ns(1000.00Hz)
ask period=1000000ns(1000.00Hz)
get period=1000000ns(1000.00Hz)
noploop for 10 seconds
# 1000.00Hz period = 1000000nsecs
# 1666000 cycles @ 1666 MHz
# using time-based multiplexing
# 1000000 nsecs effective switch timeout
# 2 sets
# 1173.50 average run per set
# 5000547811.50 average ns per set
# set  measured total   #runs  scaled total     event name
# ------------------------------------------------------------------
  000  14,602,410,677    1174  16,602,727,292   CPU_OP_CYCLES_ALL
  001   1,498,323,507    1173  12,436,159,551   IA64_INST_RETIRED

In this test, the program runs for 10s at a switch frequency of 1000Hz with
2 event sets. Thus, we expect each set to have run about 5000 times. That is
not what happens here, yet the scaled counts are correct. This is because
multiplex2 does duration-based scaling: each set ran fewer times than
expected, but when it did run, it ran for longer. Be careful with multiplex2
on your underlying hardware, because you may also be limited by the clock
resolution. In the example above, for instance, we are operating at the
limit of the timer granularity.
With the fix applied, the same test behaves as expected:

$ ./multiplex2 --us-c --freq=1000 ../../pfmon/tests/noploop 10
clock_res=1000000ns(1000.00Hz)
ask period=1000000ns(1000.00Hz)
get period=1000000ns(1000.00Hz)
noploop for 10 seconds
# 1000.00Hz period = 1000000nsecs
# 1666000 cycles @ 1666 MHz
# using time-based multiplexing
# 1000000 nsecs effective switch timeout
# 2 sets
# 5000.50 average run per set
# 5000255597.00 average ns per set
# set  measured total   #runs  scaled total     event name
# ------------------------------------------------------------------
  000   8,293,212,752    5001  16,585,052,664   CPU_OP_CYCLES_ALL
  001  24,876,546,492    5000  49,757,211,675   IA64_INST_RETIRED

On recent X86 hardware compiled with the right timer options, you should get
5000 iterations for each set using the multiplex2 command line options above.

Here are the technical details of what happens. In certain situations the
timeout expires while interrupts are masked, so the timeout is kept pending.
During context switch out, we save the remaining value of the timeout, and we
reinstall that value during context switch in. The problem is that if the
timeout has already expired, the remaining timeout value is negative. The bug
was that when passed a negative value, hrtimer_start() will basically set the
timeout to the largest value possible, i.e., we will never get a new timeout
and set switching is stopped.

During context switch in, the fix now checks whether the remaining timeout is
negative and, if so, triggers set switching, which installs a new timeout
value. Because this type of timeout checking is also needed when unmasking
monitoring, it is implemented by pfm_restart_timer(). Expired timeouts are
accounted for by the set_switch_exp statistic under debugfs.

Note that for tools using duration-based scaling, the scaled values looked
fine, but the number of activations of each set was clearly wrong.
_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
