I haven't seen any updates on the "lost events" thread in a while. Losing a thread about lost events would be especially ironic.
I did have a conversation with John Mellor-Crummey around 2008-03-10 about this. I reminded him that several years ago (when the 2.6.9 kernel was brand new) we first saw this behavior at Rice on an Opteron box (2 sockets, single core) running 2.6.7 with the corresponding versions of perfctr, PAPI, and hpcrun. It turned out that there was a window of vulnerability during context switches from a monitored process to an unmonitored one. The driver got an event in the new process context, found no virtualized counters, dropped the event on the floor, and returned without reinitializing the counter for the next event.

Diagnosing this was easy once I looked in the kernel message log, because perfctr was civilized enough to leave descriptive messages there. It was otherwise silent about quitting, so I had to go snooping as root on this box to find them. Fixing it was easy (on me): I just sent an email to Mikael Pettersson, gave him an account (with root) on the machine, and sat back. Within 2 days he'd upgraded the box to 2.6.9, fixed perfctr, and propagated the patch to the world.

I'm speculating that there's a similar condition in Perfmon 2 in which the handler returns without enabling the next event. Note that the original Perfmon on IA64 from that era was absolutely rock solid, even when we abused it by having it deliver events at very high rates.

[EMAIL PROTECTED] wrote:
> I have a customer who has an application that when run under pfmon reports
> 154 billion CPU_CYCLES used (appears to be a reasonable value). When this
> same application is run under Hpcrun (from HPCToolkit using PAPI) it only
> reports about 2 billion CPU_CYCLES used. These tests are run on an Intel
> IA64 platform.
> <stuff elided here>
>
> I have also browsed this mailing list and found a thread called
> "papi on compute node linux" which was last updated 2008-03-10. The
> discussion in this thread sounds to me like it could easily explain what
> I am seeing.
>
> Is there a way I can determine if this discussion (ie: losing interrupts)
> is what I am seeing?
>
> Thanks for any help you can provide.
>
> Gary
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> Don't miss this year's exciting event. There's still time to save $100.
> Use priority code J8TL2D2.
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> _______________________________________________
> perfmon2-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel

-- 
Robert J. Fowler
Chief Domain Scientist, HPC
Renaissance Computing Institute
The University of North Carolina at Chapel Hill
100 Europa Dr, Suite 540
Chapel Hill, NC 27517
V: 919.445.9670
F: 919 445.9669
[EMAIL PROTECTED]
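As a rough illustration of the perfctr failure mode described above (an overflow that arrives after a switch to an unmonitored process, gets dropped, and never reloads the counter), here is a toy Python model. It is not perfctr, Perfmon, or PAPI code. `Counter`, `handle_overflow`, and `simulate` are hypothetical names, and the assumption that exactly one overflow lands in an unmonitored context is made only for illustration. The point it shows is that a single dropped-and-not-rearmed interrupt does not just cost one sample: sampling stops for the rest of the run.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical model of a sampling counter (not the perfctr/perfmon API)."""
    period: int
    remaining: int = 0
    armed: bool = True

    def __post_init__(self) -> None:
        # Start counting down a full sampling period.
        self.remaining = self.period


def handle_overflow(c: Counter, monitored: bool, buggy: bool) -> bool:
    """Return True if this overflow produces a recorded sample.

    With buggy=True, an overflow that lands in an unmonitored context is
    dropped and the counter is never rearmed, so no further overflows occur.
    """
    if not monitored and buggy:
        c.armed = False      # sample dropped on the floor, counter left dead
        return False
    c.remaining = c.period   # reload the counter for the next sample
    return monitored         # only the monitored process gets the sample


def simulate(total_events: int, period: int, bad_overflow_at: int, buggy: bool) -> int:
    """Count recorded samples. The overflow at event number bad_overflow_at is
    assumed to arrive during a switch to an unmonitored process."""
    c = Counter(period)
    samples = 0
    for event in range(1, total_events + 1):
        if not c.armed:
            break            # counting has silently stopped
        c.remaining -= 1
        if c.remaining == 0:
            if handle_overflow(c, monitored=(event != bad_overflow_at), buggy=buggy):
                samples += 1
    return samples


if __name__ == "__main__":
    for buggy in (False, True):
        n = simulate(total_events=1000, period=10, bad_overflow_at=200, buggy=buggy)
        print(f"buggy={buggy}: {n} samples, estimated events = {n * 10}")
```

With a period of 10 and 1000 events, the rearming handler records 99 samples (one dropped, counter reloaded), while the buggy handler records only 19, because counting stops at the 20th overflow. A large gap between a counting tool (pfmon) and an event total reconstructed from samples (hpcrun) is consistent with, but not proof of, a handler that stops rearming; checking the kernel message log, as in the perfctr case, is a more direct test.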
