Pranav,

If you were able to count the number of cycles elapsed while idle, then
you could compensate. However, I am not sure you can get that on Opteron.
Maybe by inverting + thresholding some event.
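
To make "inverting + thresholding" concrete, here is a minimal sketch of
how the PERFSEL value from the verbose listing quoted below breaks down
into fields, and how the inv and cnt_mask fields could be set on top of
it. The bit positions follow the usual AMD64 PerfEvtSel layout; the
function names are made up for this illustration and are not part of
pfmon or libpfm. With inv=1 and cnt_mask=N, the counter is meant to
advance on cycles where fewer than N events occur. Whether a counter
still advances at all while the core is halted is exactly the open
question on Opteron, so this only shows the encoding, not a working
idle-cycle counter.

```python
# Field positions in an AMD64 PERFSEL (PerfEvtSel) register, low 32 bits.
USR_BIT, OS_BIT, EDGE_BIT, INT_BIT, EN_BIT, INV_BIT = 16, 17, 18, 20, 22, 23
UMASK_SHIFT = 8
CNT_MASK_SHIFT = 24


def decode_perfsel(value):
    """Split a raw PERFSEL value into the fields pfmon prints with --verbose."""
    return {
        "emask": value & 0xFF,
        "umask": (value >> UMASK_SHIFT) & 0xFF,
        "usr": (value >> USR_BIT) & 1,
        "os": (value >> OS_BIT) & 1,
        "edge": (value >> EDGE_BIT) & 1,
        "int": (value >> INT_BIT) & 1,
        "en": (value >> EN_BIT) & 1,
        "inv": (value >> INV_BIT) & 1,
        "cnt_mask": (value >> CNT_MASK_SHIFT) & 0xFF,
    }


def with_inverted_threshold(value, cnt_mask):
    """Return value with inv=1 and the given cnt_mask (hypothetical helper).

    The resulting encoding asks the PMU to count cycles in which fewer
    than cnt_mask occurrences of the selected event happen.
    """
    cleared = value & ~(0xFF << CNT_MASK_SHIFT) & 0xFFFFFFFF
    return cleared | (1 << INV_BIT) | ((cnt_mask & 0xFF) << CNT_MASK_SHIFT)


if __name__ == "__main__":
    base = 0x530076  # CPU_CLK_UNHALTED as programmed in the listing
    print(decode_perfsel(base))
    print(hex(with_inverted_threshold(base, 1)))
```

Decoding 0x530076 this way reproduces the fields shown in the listing
(emask=0x76, os=1, usr=1, en=1, int=1, everything else 0).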

As for your request, it has been on my list of things to do for a while.
It would be a flag you pass when you create your perfmon session for
system-wide. At some point, I even had that implemented. But it was very
expensive, because at each interrupt coming out of idle you had to
rewrite all the counters. There may be ways to optimize this.

For now, though, I am trying not to add code, so we can increase our
chances of getting into mainline. I'll keep this option in mind.

Thanks.


On Wed, Mar 26, 2008 at 4:01 PM, Pranav <[EMAIL PROTECTED]> wrote:
> Hello,
>
>  Thanks for the reply. I understand that people might be interested in
>  measuring bus events and cache events when the CPU is in idle mode.
>
>  Unfortunately, I don't have the option of simply measuring user-level
>  events. My workload spends a lot of time in the kernel, and kernel-level
>  characterization of performance is crucial to my analysis. So I think I
>  will have to figure it out by way of per-thread monitoring, although
>  that means I won't be able to study interference effects of other
>  programs on my workload. That is ok with me right now. But as a feature
>  request: is there a way (right now or in the future) to control, from
>  the command line itself, that events are counted only when the CPU is
>  in non-idle mode?
>
>  Thanks and Regards,
>  Pranav.
>
>
>
>  On Wed, 2008-03-26 at 19:34 +0100, stephane eranian wrote:
>  > Hello,
>  >
>  >
>  > The reason why one may see higher counts for events different from
>  > CPU_CLK_UNHALTED while running in system-wide comes from what
>  > actually happens when the system is idle. In per-thread mode, you are
>  > only measuring while the thread is running. In system-wide, you are
>  > measuring even when the CPU goes into idle and eventually into lower
>  > power mode, i.e., halted state. By definition, CPU_CLK_UNHALTED
>  > measures when you are NOT halted. However, for other events, it
>  > depends on what they measure. It would not surprise me if
>  > DISPATCH_STALLS keeps on counting while in lower-power state, simply
>  > because it measures something that is still active.
>  >
>  > Perfmon does not explicitly stop monitoring when idle. There is a good
>  > reason for that and you just witnessed it. Some events keep on
>  > counting and some people may want to see what is going on on the
>  > buses or caches when the CPU is in idle.
>  >
>  > I am sure that if you restrict your system-wide measurements to
>  > user-level (-u), then you won't see the discrepancy.
>  >
>  > Hope this helps.
>  >
>  >
>  >
>  > On Wed, Mar 26, 2008 at 3:19 PM, Pranav <[EMAIL PROTECTED]> wrote:
>  > > Hello All,
>  > >
>  > >  I have been playing a lot with Per-Thread monitoring of perfmon for
>  > >  characterization of database server workloads. The results that I have
>  > >  gotten in per-thread mode are quite accurate.
>  > >
>  > >  However, I did the same analysis in a system-wide measurement mode and I
>  > >  am getting values which seem wrong to me. I would like any of your
>  > >  inputs on this.
>  > >
>  > >  The following is the detailed listing for system-wide mode.
>  > >
>  > >  [EMAIL PROTECTED] pfmon  --aggregate-results
>  > >  --system-wide -uk --verbose -e CPU_CLK_UNHALTED,DISPATCH_STALLS --
>  > >  <command to start the sql client>
>  > >
>  > >  selected CPUs (2 CPU in set, 2 CPUs online): CPU0 CPU1
>  > >  <startup information>
>  > >  using hardware breakpoints
>  > >  unavailable_pmcs=0xfffffffffffffff0
>  > >  [PERFSEL0(pmc0)=0x530076 emask=0x76 umask=0x0 os=1 usr=1 inv=0 en=1
>  > >  int=1 edge=0 cnt_mask=0] CPU_CLK_UNHALTED
>  > >  [PERFCTR0(pmd0)]
>  > >  [PERFSEL1(pmc1)=0x5300d1 emask=0xd1 umask=0x0 os=1 usr=1 inv=0 en=1
>  > >  int=1 edge=0 cnt_mask=0] DISPATCH_STALLS
>  > >  [PERFCTR1(pmd1)]
>  > >  <other unrelated info>
>  > >
>  > >  system wide session on 2 processor(s)
>  > >  vCPU0 -> pCPU0
>  > >  vCPU1 -> pCPU1
>  > >
>  > >  results are on terminal
>  > >
>  > >  starting process [3230]: <command to start the mysql client>
>  > >  waiting for [3230] to exec
>  > >  results are on terminal
>  > >  CPU1 started monitoring
>  > >  CPU0 started monitoring
>  > >
>  > >  <output here>
>  > >
>  > >  CPU0   stopped monitoring
>  > >  set0 runs=1 duration=181498010
>  > >  CPU1   stopped monitoring
>  > >  set0 runs=1 duration=181500829
>  > >  results are on terminal
>  > >  CPU0                     341291164 CPU_CLK_UNHALTED
>  > >  CPU0                     507284317 DISPATCH_STALLS
>  > >
>  > >  As can be seen from the above verbose listing.
>  > >  [PERFSEL0(pmc0)=0x530076 emask=0x76 umask=0x0 os=1 usr=1 inv=0 en=1
>  > >  int=1 edge=0 cnt_mask=0] CPU_CLK_UNHALTED
>  > >  [PERFCTR0(pmd0)]
>  > >  [PERFSEL1(pmc1)=0x5300d1 emask=0xd1 umask=0x0 os=1 usr=1 inv=0 en=1
>  > >
>  > >  I am counting the events for both OS and user level. However, if you
>  > >  look at the aggregated output, the total number of cycles is less than
>  > >  the number of dispatch stalls. How can a processor be stalled for
>  > >  longer than it is executing?
>  > >
>  > >  I checked in detail by capturing user-level and kernel-level events
>  > >  individually. For user-level events, DISPATCH_STALLS is always less
>  > >  than CPU_CLK_UNHALTED. But this is not the case for kernel-level
>  > >  events. I am wondering whether I am doing something wrong or whether
>  > >  it is a bug in perfmon.
>  > >
>  > >  Also note that DISPATCH_STALLS are very accurate when counting in
>  > >  per-thread mode. In fact when I captured events which can contribute to
>  > >  DISPATCH_STALLS in per thread mode, I got an error percentage of less
>  > >  than 2-3 % (which was quite good). But this is not the case for
>  > >  system-wide mode (where error percentage is as much as 40%).
>  > >
>  > >  Following are my machine specs (pfmon -I)
>  > >
>  > >  detected host CPUs:  2-way 1000MHz/0.5MB -- AMD Athlon(tm) 64 X2 Dual
>  > >  Core Processor 4200+ (stepping 2)
>  > >  detected PMU model: AMD64
>  > >  max counters/set: 4
>  > >  supported PMU models: [AMD64] [Pentium 4] [Intel Core] [Intel
>  > >  architectural PMU]
>  > >  supported sampling modules: [inst-hist] [detailed] [compact] [raw]
>  > >  pfmlib version: 3.2
>  > >  kernel perfmon version: 2.7
>  > >
>  > >  Thanks and Regards,
>  > >  Pranav
>  > >
>  > >
>  > >
>  > >  -------------------------------------------------------------------------
>  > >  Check out the new SourceForge.net Marketplace.
>  > >  It's the best place to buy or sell services for
>  > >  just about anything Open Source.
>  > >  http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
>  > >  _______________________________________________
>  > >  perfmon2-devel mailing list
>  > >  [email protected]
>  > >  https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>  > >
>
>
