Stephane,

Thanks very much for the explanation.
Just the other day I was wondering how perfmon suspended the monitored process while monitoring was suspended (masked). Now that I know it lets the monitored process keep executing, what is happening becomes much clearer.

In the first workaround you suggested, when you say "make the sampling buffer much bigger", I assume you are referring to the sampling period, i.e., the counter value that controls how often overflows occur. If that is the case, I have already experimented with this value (I tried 1 million and 1 billion), but it does not seem to change the results very much.

I will run pfmon with sampling, and with sampling plus process blocking, to verify the results. I have looked at the hpcrun help, and it does not seem to provide a way to block the process when an overflow occurs. I may need to implement an option like this in hpcrun to resolve this problem for our customer. If pfmon with sampling behaves as you suggest (and my bet is that it will), then I think we have two choices: either wait for your double-buffered version of perfmon, or enhance hpcrun so it can block the process while monitoring is suspended.

I know I hate to estimate when I will have something done, so I will understand if you do not want to go there, but do you have any idea when the double-buffered perfmon may be available?

Gary

"stephane eranian" <[EMAIL PROTECTED]> wrote on 04/15/2008 09:59:10 AM:

> Gary,
>
> On Tue, Apr 15, 2008 at 4:26 PM, <[EMAIL PROTECTED]> wrote:
> > Stephane
> >
> > Well, you guessed right on both counts. /proc/perfmon shows that the
> > version is 2.0.
> >
> > The pfmon command looks like this:
> > pfmon --debug -v -e CPU_CYCLES ./code.exe >pfmon 2>pfmon.debug
> >
> > The hpcrun command looks like this:
> > hpcrun -e CPU_CYCLES:32767 -o hpcrun.data -- ./code.exe >hpcrun 2>hpcrun.debug
> >
> > So in both cases I run it as a tool that monitors another process (it
> > does not use self-monitoring).
> >
> > I fail to see how this can account for the differences, but I am very
> > interested in the explanation.
>
> I suspect that if you use pfmon in sampling mode as well, you will
> see the same discrepancy:
>
> pfmon -ecpu_cycles --long-smpl-periods=32767 --smpl-outfile=pfmon.data ./code.exe
>
> The reason for the big difference is that there is a blind spot with
> sampling. When the sampling buffer fills up, monitoring is stopped, BUT the
> monitored process keeps on running by default. So you are actually missing
> parts of the execution. This is a well-known issue with sampling buffers.
> The current default sampling buffer format used by perfmon is very simple,
> too simple actually. What you need is a format that implements a double
> buffer. I have released a simple implementation of this as a
> proof-of-concept. It is not in the main GIT tree yet.
>
> In the meantime, there are two possible workarounds:
> - Make the sampling buffer much bigger. You'd have to look at the hpcrun
>   options; maybe they offer a way for you to grow the buffer. In pfmon you
>   can experiment with this using the --smpl-entries option. If you get an
>   error message, check your resource limits with ulimit and try increasing
>   the locked memory limit (ulimit -l unlimited).
>
> - Have perfmon block the monitored process when the sampling buffer fills
>   up. This can be accomplished with pfmon using the --overflow-block
>   option. I don't know if hpcrun has an option for this. Careful, though:
>   this option is known to have issues with processes that use signals
>   internally.
>
> Hope this clarifies the issue you are seeing.
>
> >
> > As far as I know, this particular customer application is the only one
> > we have found that produces inconsistent results. All other executables
> > that I have run these tools against seem to produce CPU_CYCLES counts
> > that are very close.
> >
> > Please tell me more.
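The blind spot Stephane describes can be illustrated with a toy model. This is not perfmon, pfmon or hpcrun code: `simulate` and all of its parameters are hypothetical, and the sketch assumes that emptying one full buffer always takes a fixed number of the monitored process's cycles (real drain times will differ). It compares the simple single buffer, where monitoring is masked while the process keeps running, with blocking the process during the drain (the idea behind pfmon's --overflow-block) and with a double buffer.

```python
def simulate(total: int, period: int, entries: int, drain: int,
             buffers: int = 1, block: bool = False) -> int:
    """Toy model: number of samples recorded over `total` process cycles.

    All parameters are hypothetical illustration values, not perfmon settings:
      total   -- cycles executed by the monitored process
      period  -- one sample is taken every `period` cycles while monitoring
      entries -- capacity of one sampling buffer, in samples
      drain   -- process cycles that elapse while the tool empties one full
                 buffer (assumed constant per buffer)
      buffers -- 1 models a simple single buffer, 2 a double buffer
      block   -- if True the process is stopped while a buffer is drained,
                 so draining costs no process cycles
    """
    fill = entries * period           # cycles needed to fill one buffer
    cost = 0 if block else drain      # process cycles that elapse per drain
    ready = [0] * buffers             # time at which each buffer is empty
    drainer_free = 0                  # the tool drains one buffer at a time
    t = samples = 0
    while t < total:
        # Record into whichever buffer becomes empty first.
        i = min(range(buffers), key=ready.__getitem__)
        if ready[i] > t:
            # Blind spot: no empty buffer, so monitoring stays masked
            # while the process keeps running.
            t = ready[i]
            if t >= total:
                break
        if total - t < fill:
            # The run ends before this buffer fills up.
            samples += (total - t) // period
            break
        samples += entries
        t += fill
        # Buffer i is now full; it is emptied once the tool is free.
        start = max(t, drainer_free)
        drainer_free = start + cost
        ready[i] = drainer_free
    return samples


if __name__ == "__main__":
    base = dict(total=10_000_000, period=1_000, entries=100, drain=50_000)
    print("single buffer:", simulate(**base))
    print("blocking:     ", simulate(**base, block=True))
    print("double buffer:", simulate(**base, buffers=2))
    print("bigger buffer:", simulate(**{**base, "entries": 1_000}))
```

With these made-up numbers the single-buffer run records 6,700 samples, the blocking and double-buffer runs record all 10,000 that an uninterrupted run would produce, and growing the single buffer to 1,000 entries brings it to 9,550. In this model a bigger buffer helps only because the assumed per-buffer drain cost is paid less often.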
> >
> >
> > [EMAIL PROTECTED] wrote on 04/12/2008 12:05:36 AM:
> >
> > >
> > > Gary,
> > >
> > > I suspect you are running the stock perfmon as shipped with 2.6.18,
> > > i.e., v2.0. You can find out in /proc/perfmon.
> > >
> > > I would need the command-line options used for pfmon.
> > >
> > > As for HPCRUN, I would need to know how it is run. In particular,
> > > whether this is a self-monitoring run or, just like pfmon, a tool
> > > monitoring another thread. I suspect the latter, which could explain
> > > the differences you are seeing.
> > >
> > > On Fri, Apr 11, 2008 at 8:04 PM, <[EMAIL PROTECTED]> wrote:
> > > > Stephane
> > > >
> > > > Our system is running:
> > > >
> > > > MODEL ia64 [type=ia64]
> > > > CPU 8 x Itanium 2, 64 bits 1600.000442 Mhz
> > > > MEM 8219456 kB real memory
> > > > OS Bull Linux Advanced Server release 4 (V5) - kernel 2.6.18-B64k.1.7
> > > >
> > > > This kernel is based on the 2.6.18 kernel but includes Bull-specific
> > > > patches.
> > > >
> > > > Since perfmon is included in the kernel, I do not know how to find its
> > > > version. I would expect that we are running the one that comes with
> > > > the 2.6.18 kernel. If you can tell me how to find the perfmon version,
> > > > I will get it for you. In addition, if you can provide me with a list
> > > > of the modules that make up perfmon, I can check to see if Bull has
> > > > made any patches to those modules. I know that we have not yet
> > > > installed the perfmon2 kernel patches. This is on our list to try but
> > > > has not been done yet.
> > > >
> > > > The value of 154 billion CPU_CYCLES is the approximate value reported
> > > > by PFMON in its stdout.
> > > >
> > > > The value of 2 billion is the approximate result when I multiply the
> > > > total number of samples reported by HPCPROF (about 68000) by the
> > > > sampling period used in the HPCRUN (32767).
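The arithmetic in the paragraph above can be checked in a few lines of Python. The inputs are the approximate values quoted in this thread; the last line divides the pfmon total by the sampling period to estimate how many samples an uninterrupted sampling run would have produced.

```python
# Approximate figures quoted in this thread.
pfmon_cycles = 154e9       # CPU_CYCLES counted by pfmon
hpcrun_samples = 68_000    # total samples reported by hpcprof
period = 32_767            # hpcrun sampling period (-e CPU_CYCLES:32767)

hpcrun_cycles = hpcrun_samples * period
print(hpcrun_cycles)                          # 2228156000
print(f"{hpcrun_cycles / pfmon_cycles:.2%}")  # 1.45%
print(round(pfmon_cycles / period))           # 4699850
```

So hpcrun accounts for about 2.2 billion cycles, roughly 1.45% of what pfmon counted, and a run that never lost samples would have produced on the order of 4.7 million samples at this period rather than about 68,000.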
> > > > As a point of interest, the contents of /proc/interrupts also show
> > > > about 68000 perfmon interrupts occurring during the HPCRUN run.
> > > >
> > > > I will send the kernel debug data for both the PFMON and HPCRUN tests
> > > > to your googlemail account in a separate email.
> > > >
> > > > At this point, if you can just point me in the right direction and
> > > > suggest some things to look for, I will be a happy camper.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > Gary
> > > >
> > > >
> > > > "stephane eranian" <[EMAIL PROTECTED]> wrote on 04/10/2008 12:23:22 PM:
> > > >
> > > > >
> > > > > Gary,
> > > > >
> > > > > On Wed, Apr 9, 2008 at 1:18 AM, <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > I have a customer with an application that, when run under pfmon,
> > > > > > reports 154 billion CPU_CYCLES used (which appears to be a
> > > > > > reasonable value). When the same application is run under Hpcrun
> > > > > > (from HPCToolkit, using PAPI), it reports only about 2 billion
> > > > > > CPU_CYCLES used. These tests are run on an Intel IA64 platform.
> > > > > >
> > > > > You need to tell me which kernel version and which perfmon version
> > > > > you are using.
> > > > >
> > > > > Also, how did you calculate those 2 numbers? Was this simple
> > > > > counting, or derived from the samples you are getting?
> > > > >
> > > > > The 'losing interrupts' issue should not affect you, because it is
> > > > > related to the handling of signals in multi-threaded programs.
> > > > >
> > > > >
> > > > > As for the logs, mail them to me directly.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > > This application runs as a single thread and does not set a signal
> > > > > > handler or mask the SIGIO signal. Hpcrun produces 8 data output
> > > > > > files when run on this application.
> > > > > > One for the application itself, 4 for bash scripts the
> > > > > > application runs, 2 for 'rm' commands the application executes
> > > > > > and 1 for a gzip command it runs.
> > > > > >
> > > > > > The customer wants to know why Hpcrun reports only a little over
> > > > > > 1% of the CPU cycles used. I have been trying to compare what
> > > > > > pfmon does to what hpcrun does, and it seems that the only debug
> > > > > > data available for both runs is the kernel debug data written by
> > > > > > perfmon. This data clearly shows that Hpcrun/PAPI uses the perfmon
> > > > > > services differently than pfmon does. I tried to attach the debug
> > > > > > output for these two runs to this mail, but that exceeded the
> > > > > > allowed message size for the list.
> > > > > >
> > > > > > I tried adding code (as a test case) to the PAPI signal handler to
> > > > > > count and print the number of signals received during the run. The
> > > > > > values printed pretty much matched the number of samples reported
> > > > > > when hpcprof is run on the hpcrun data files. This was an attempt
> > > > > > to detect whether my problem was in handling signals or in getting
> > > > > > them, and I think this test showed the problem is in getting them.
> > > > > >
> > > > > > I have also browsed this mailing list and found a thread called
> > > > > > "papi on compute node linux", which was last updated 2008-03-10.
> > > > > > The discussion in this thread sounds to me like it could easily
> > > > > > explain what I am seeing.
> > > > > >
> > > > > > Is there a way I can determine whether this discussion (i.e.,
> > > > > > losing interrupts) is what I am seeing?
> > > > > >
> > > > > > Thanks for any help you can provide.
> > > > > >
> > > > > > Gary
> > > > > >
> > > > > >
> > > > > > -------------------------------------------------------------------------
> > > > > > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> > > > > > Don't miss this year's exciting event. There's still time to save $100.
> > > > > > Use priority code J8TL2D2.
> > > > > > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> > > > > > _______________________________________________
> > > > > > perfmon2-devel mailing list
> > > > > > [email protected]
> > > > > > https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
