Stephane

"stephane eranian" <[EMAIL PROTECTED]> wrote on 04/18/2008 02:22:34 PM:

> Gary,
>
> On Fri, Apr 18, 2008 at 7:03 PM,  <[EMAIL PROTECTED]> wrote:
> >
> >  I had downloaded the perfmon2 package but have not installed them yet.
> >  My understanding
>
> Which packages are you talking about here?

The actual file I downloaded was named "perfmon-new-base-060926.tar.tar".
I do not remember right now where I found it on the web, but my
understanding is that it is a bunch of patch files that will add perfmon2
to a 2.6.18 kernel.

Its README file contains this:

License fluff  ...

This is a perfmon2 patch for the kernel version: linux-2.6.18

The patch is broken down in several pieces. There are
two major parts:
        - arch-specific: divided into modified and new files
        - common code: divided by functionality

Quick install:
        - cd to your kernel source tree
        - cat ../perfmon-new-base-060926/*.diff | patch -p1

The MIPS support is provided by Philip Mucci ([EMAIL PROTECTED]). Please
send any questions to him as well as Stephane Eranian ([EMAIL PROTECTED]).

>
> >
> >  Unfortunately our customer is in a secure environment and therefore
> >  all I have is a copy of the binary executable for the application.
> >  It took us 3 months and a non-disclosure agreement with them to even
> >  get that much.  Since I do not have source I cannot modify the
> >  application to skip doing the fork/exec's.
>
> That's all right. We should not need to know about what the
> application is doing. We can figure out whether it is forking or not
> and how many children. That should be all we need to know.

I agree and I think the hpcrun data files already have shown us all this
information.

>
> >
> >  As Stephane pointed out in another email on this chain, pfmon (by
> >  default) does not count events for the forked processes.  The values
> >  reported by pfmon when used in sampling mode seem to account for the
> >  CPU time used by the application.  When pfmon is run in sampling mode
> >  on this application it reports 4788780 samples.  When hpcrun is used
> >  it only reports 41602 samples (in the code.exe data file).  Both of
> >  these runs were done with a sample period of 32767.
> >
> For the pfmon samples, you mean with --follow-all and --overflow-block?
> What you should do is correlate the number of samples with the duration
> of each process. If I recall, pfmon can give the real time of a process
> if you pass the --show-time option. Don't remember if this works with
> --follow-all, though.
> Keep in mind that there will be blind spots with non-self sampling.
>

No, the pfmon samples were run with --long-smpl-periods and --smpl-outfile
but did not use either of the options you mentioned.  This means that it
sampled only the execution of code.exe and not any of the forked shell
scripts.  I ran the whole test under time so I had its output also.  I
understand that the blind spots occur in this environment, and I think I
mentioned when reporting the results of that test that I saw about 3
seconds (out of about 100 seconds of CPU time used) which were not
accounted for (i.e., blind spots).
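
The figures above can be put side by side with a little arithmetic. This is only a sketch: the sampled event is not named in this message, so the totals are "events of whatever was being counted", and the helper names are made up for illustration.

```python
# Figures quoted in this thread.
PERIOD = 32767            # sample period used for both runs
PFMON_SAMPLES = 4788780   # samples reported by pfmon
HPCRUN_SAMPLES = 41602    # samples reported by hpcrun (code.exe data file)


def events_covered(samples: int, period: int) -> int:
    """Each sample stands for one counter overflow, i.e. `period` events."""
    return samples * period


def fraction(part: float, whole: float) -> float:
    """Plain ratio, kept as a helper so the intent is readable."""
    return part / whole


pfmon_events = events_covered(PFMON_SAMPLES, PERIOD)
hpcrun_events = events_covered(HPCRUN_SAMPLES, PERIOD)

print(f"pfmon  covers {pfmon_events} events")
print(f"hpcrun covers {hpcrun_events} events")
print(f"hpcrun/pfmon sample ratio: {fraction(HPCRUN_SAMPLES, PFMON_SAMPLES):.2%}")
# Blind spot seen with pfmon: about 3 s out of about 100 s of CPU time.
print(f"pfmon blind spot: {fraction(3, 100):.0%}")
```

The point of the comparison is scale: a blind spot of a few percent cannot by itself account for hpcrun seeing under one percent of the samples pfmon saw.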


> Knowing that PAPI is all self-monitoring (self-sampling) I am
> surprised to see such a low count with hpcrun. As Phil mentioned, it
> is likely a problem with PAPI during fork when sampling.
>

Me too.  I now think that Mark's description (and I paraphrase):

when a child process goes away, its papi_stop call closes the shared
file descriptor used by it and the parent process to access the counters

(Hope I did not misrepresent what Mark said)

explains the problem that we are seeing.  In fact I have asked our customer
to run an additional test to try to confirm this condition.  The test is
to either eliminate the fork of the bash scripts from their application or
to modify the bash scripts to stay around (add a sleep) until after the
parent is done running.  I do not know their application, so I do not know
if this is even possible, but if it is, it may prove we have the correct
explanation, and it may even give them a workaround until we can get PAPI
and hpcrun to handle this condition correctly.
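
The kind of sharing involved can be illustrated with ordinary POSIX fork semantics. This is a sketch by analogy only: it uses a plain temporary file, not a perfmon context, and it makes no claim about what PAPI or perfmon actually do. The function name is hypothetical. After a fork, parent and child descriptors refer to the same open file description, so a state change made through the child's copy is seen by the parent.

```python
import os
import tempfile


def offset_after_child_seek(child_offset: int) -> int:
    """Fork a child that seeks on an inherited descriptor.

    Returns the file offset the parent observes afterwards, even though
    the parent itself never moved it.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 100)
        os.lseek(fd, 0, os.SEEK_SET)      # parent leaves the offset at 0

        pid = os.fork()
        if pid == 0:
            # Child: alter shared state through the inherited descriptor,
            # then exit (which also closes the child's copy of fd).
            os.lseek(fd, child_offset, os.SEEK_SET)
            os._exit(0)

        os.waitpid(pid, 0)
        # Parent: the descriptor is still open here, but its offset moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("offset seen by parent:", offset_after_child_seek(42))
```

In this analogy the child's exit does not invalidate the parent's descriptor; it is the operation performed on the shared object before exiting that the parent sees. Whether a stop issued by a child on an inherited perfmon descriptor behaves the same way is exactly what the customer test is meant to confirm.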

>
> >  Excerpts from this grep output follow:
> >
> >  Apr  8 15:33:22 molson kernel: pfm_mask_monitoring.928: CPU0 [18168]
> >  pmc[4]=0x801220
> >  Apr  8 15:33:22 molson kernel: pfm_restore_monitoring.1013: CPU0 [18168]
> >  [18168] pmc[4]=0x801228
> >  Apr  8 15:33:22 molson kernel: pfm_mask_monitoring.928: CPU0 [18168]
> >  pmc[4]=0x801220
> >  Apr  8 15:33:22 molson kernel: pfm_mask_monitoring.928: CPU0 [18168]
> >  pmc[4]=0x801220
> >
> Don't recall if you should be alarmed by the double mask-monitoring. I
> have not looked at the v2.0 codebase in a very long time. I'll check
> quickly.

If double masks are not a good thing then we may also have something else
going on (in perfmon?) that should be looked into.

Actually the double restart looked more interesting to me.  It seems to me
that I read somewhere that when the restart is done, perfmon reloads the
pmd register with the negative value to set up when the overflow will
occur.  If this is done and we unmask monitoring, and then we do it again
before an overflow occurs, we have just thrown away all the events that
were counted between the two restarts.  When I look in the kernel debug
output, the double restart probably happens 50 to 100 times as often as
the double mask.
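
To make the concern concrete, here is a toy model of the behaviour as I recalled it above. It is not perfmon code: the class, its method names, and the 10,000/30,000 event bursts are all invented for illustration, and it assumes the counter is simply reloaded with -period on every restart.

```python
PERIOD = 32767  # sample period used in the runs discussed in this thread


class SamplingCounter:
    """Toy sampling counter: loaded with -period, overflows on reaching zero."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.value = -period   # the "negative value" loaded into the pmd
        self.samples = 0       # overflows recorded
        self.discarded = 0     # events lost to reloads before an overflow

    def count(self, events: int) -> None:
        """Count events, recording one sample per overflow."""
        self.value += events
        while self.value >= 0:
            self.samples += 1
            self.value -= self.period

    def restart(self) -> None:
        """Reload with -period, dropping events counted since the last reload."""
        self.discarded += self.value + self.period
        self.value = -self.period


def run(extra_restart: bool, rounds: int = 100) -> SamplingCounter:
    """Feed the same event stream with or without a restart in mid-period."""
    counter = SamplingCounter(PERIOD)
    for _ in range(rounds):
        counter.count(10_000)
        if extra_restart:
            counter.restart()   # second restart before any overflow
        counter.count(30_000)
    return counter


if __name__ == "__main__":
    for flag in (False, True):
        c = run(flag)
        print(f"extra restart={flag}: samples={c.samples} discarded={c.discarded}")
```

With the same event stream, the run that restarts in mid-period records fewer samples, and the shortfall equals the events that had accumulated when each extra reload happened.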

While I am at it, I spent a day or so recently reading the perfmon overview
and browsing the Perfmon2 Interface Specification.  I saw your name on both
of these documents, and I just wanted you to know that I found both of them
easy reading and very informative.  The Interface Specification is very
well written and probably one of the best documents I have tried to read in
quite some time.  It probably even saved me from asking a few stupid
questions on the mailing list.

Thanks
Gary


_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel