Re: [perfmon2] Pfmon and Hpcrun give inconsistent results.

Gary . Mohr Tue, 15 Apr 2008 08:36:35 -0700

Stephane

Well, you guessed right on both counts.  /proc/perfmon shows that the
version is 2.0.
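
For anyone else who needs to make the same check, here is a minimal sketch of
pulling the version out of the /proc/perfmon text.  It assumes the file
consists of "key : value" lines and that one key contains the word
"version"; that layout, the sample text, and the helper name
perfmon_version are assumptions for illustration, not a description of the
actual file on this system.

```python
def perfmon_version(text):
    """Return the value of the first 'key : value' line whose key mentions 'version'.

    Assumes a 'key : value' layout; returns None if no such line is found.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and "version" in key.lower():
            return value.strip()
    return None


# Hypothetical sample text; the real /proc/perfmon layout may differ.
sample = "perfmon version : 2.0\nsome other field : value\n"
print(perfmon_version(sample))  # prints 2.0
```

On a live system the same helper could be fed the real file, for example
perfmon_version(open("/proc/perfmon").read()).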


The pfmon command looks like this:
pfmon --debug -v -e CPU_CYCLES ./code.exe >pfmon 2>pfmon.debug

The hpcrun command looks like this:
hpcrun -e CPU_CYCLES:32767 -o hpcrun.data -- ./code.exe >hpcrun 2>hpcrun.debug

So in both cases the tool monitors another process (neither run uses
self-monitoring).

I fail to see how this can account for the differences but am very
interested in the explanation.

As far as I know, this particular customer application is the only one we
have found that produces inconsistent results.  All other executables that
I have run these tools against seem to produce counts for CPU_CYCLES that
are very close.
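
To put the gap in numbers, here is a small sketch of the arithmetic behind
the two figures given in the quoted messages below: the pfmon total (about
154 billion) against the hpcrun estimate (sample count times sampling
period).  The inputs are the approximate values from this thread; the
variable and function names are only illustrative.

```python
PFMON_CYCLES = 154_000_000_000  # approximate CPU_CYCLES printed by pfmon
SAMPLES = 68_000                # approximate sample count reported by hpcprof
PERIOD = 32_767                 # sampling period passed to hpcrun


def estimated_cycles(samples, period):
    """Each sample stands for one counter overflow, i.e. `period` events."""
    return samples * period


hpcrun_cycles = estimated_cycles(SAMPLES, PERIOD)
ratio = hpcrun_cycles / PFMON_CYCLES

print(f"hpcrun estimate: {hpcrun_cycles:,} cycles")  # 2,228,156,000
print(f"fraction of pfmon count: {ratio:.2%}")       # 1.45%
```

Read the other way round, if the pfmon total is right, a period of 32767
would correspond to roughly 4.7 million overflows, against the roughly 68000
samples actually recorded.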

Please tell me more.


[EMAIL PROTECTED] wrote on 04/12/2008 12:05:36 AM:

> Gary,
>
> I suspect you are running the stock perfmon as shipped with 2.6.18,
> i.e., v2.0.
> You can find out in /proc/perfmon.
>
> I would need the cmdline options used for pfmon.
>
> As for HPCRUN, I would need to know how this is run. In particular
> whether this is a self-monitoring run or just like pfmon, a tool
> monitoring another thread.
> I suspect the latter which could explain the differences you are seeing.
>
> On Fri, Apr 11, 2008 at 8:04 PM,  <[EMAIL PROTECTED]> wrote:
> > Stephane
> >
> >  Our system is running:
> >
> >  MODEL ia64   [type=ia64]
> >  CPU   8 x Itanium 2, 64 bits  1600.000442 Mhz
> >  MEM   8219456 kB  real memory
> >  OS    Bull Linux Advanced Server release 4 (V5) - kernel 2.6.18-B64k.1.7
> >
> >  This kernel is based on the 2.6.18 kernel but has Bull specific patches
> >  included in it.
> >
> >  Since perfmon is included in the kernel I do not know how to find its
> >  version.  I would expect that we are running the one that comes with the
> >  2.6.18 kernel.  If you can tell me how to find a version for perfmon I
> >  will get it for you.  In addition if you can provide me with a list of
> >  the modules that make up perfmon, I can check to see if Bull has made any
> >  patches to those modules.  I know that we have not yet installed the
> >  perfmon2 kernel patches.  This is on our list to try but has not been
> >  done yet.
> >
> >  The value of 154 billion CPU_CYCLES is the approximate value reported by
> >  PFMON in its stdout.
> >
> >  The value of 2 billion is the approximate result when I multiply the total
> >  number of samples reported by HPCPROF (about 68000) by the sampling
> >  period used in the HPCRUN (32767).  As a point of interest, the contents
> >  of /proc/interrupts also show that about 68000 perfmon interrupts
> >  occurred during the HPCRUN.
> >
> >  I will send the kernel debug data for both the PFMON and HPCRUN tests to
> >  your googlemail account in a separate email.
> >
> >  At this point if you can just point me in the right direction and suggest
> >  some things to look for I will be a happy camper.
> >
> >  Thanks
> >
> >
> >  Gary
> >
> >
> >  "stephane eranian" <[EMAIL PROTECTED]> wrote on 04/10/2008 12:23:22 PM:
> >
> >  > Gary,
> >  >
> >  > On Wed, Apr 9, 2008 at 1:18 AM,  <[EMAIL PROTECTED]> wrote:
> >  > >
> >  > >  I have a customer who has an application that when run under pfmon
> >  > >  reports 154 billion CPU_CYCLES used (appears to be a reasonable value).
> >  > >  When this same application is run under Hpcrun (from HPCToolkit using
> >  > >  PAPI) it only reports about 2 billion CPU_CYCLES used.  These tests are
> >  > >  run on an Intel IA64 platform.
> >  > >
> >  > You need to tell me which kernel version, which perfmon version.
> >  >
> >  > Also, how did you calculate those 2 numbers? Was this simple counting, or
> >  > were they derived from the samples you are getting?
> >  >
> >  > The 'losing interrupts' should not affect you because it is related to
> >  > handling of signals in multi-threaded programs.
> >  >
> >  >
> >  > As for the logs, mail them to me directly.
> >  >
> >  > Thanks.
> >  >
> >  > >  This application runs as a single thread and does not set a signal
> >  > >  handler or mask the SIGIO signal. Hpcrun produces 8 data output files
> >  > >  when run on this application.  One for the application itself, 4 for
> >  > >  bash scripts the application runs, 2 for 'rm' commands the application
> >  > >  executes and 1 for a gzip command it runs.
> >  > >
> >  > >  The customer wants to know why Hpcrun only reports a little over 1% of
> >  > >  the cpu cycles used.  I have been trying to compare what pfmon does to
> >  > >  what hpcrun does and it seems that the only debug data available for
> >  > >  both runs is the kernel debug data written by perfmon.  This data
> >  > >  clearly shows that Hpcrun/Papi is using the perfmon services differently
> >  > >  than pfmon does.  I tried to attach the debug output for these two runs
> >  > >  to this mail but that exceeded the allowed message size for the list.
> >  > >
> >  > >  I tried adding code (as a test case) to the Papi signal handler to count
> >  > >  and print the number of signals paid during the run.  The values printed
> >  > >  seemed to pretty much match the values reported as number of samples
> >  > >  when hpcprof is run on the hpcrun data files.  This was an attempt to
> >  > >  detect if my problem was handling signals or getting them and I think
> >  > >  this test showed the problem is in getting them.
> >  > >
> >  > >  I have also browsed this mailing list and found a thread called
> >  > >  "papi on compute node linux" which was last updated 2008-03-10.  The
> >  > >  discussion in this thread sounds to me like it could easily explain what
> >  > >  I am seeing.
> >  > >
> >  > >  Is there a way I can determine if this discussion (i.e., losing
> >  > >  interrupts) is what I am seeing?
> >  > >
> >  > >  Thanks for any help you can provide.
> >  > >
> >  > >  Gary
> >  > >
> >  > >
> >  > >
> >  > >  -------------------------------------------------------------------------
> >  > >  This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> >  > >  Don't miss this year's exciting event. There's still time to save $100.
> >  > >  Use priority code J8TL2D2.
> >  > >  http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> >  > >  _______________________________________________
> >  > >  perfmon2-devel mailing list
> >  > >  [email protected]
> >  > >  https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
> >  > >
>


