Hi Stephane,

Thanks for your prompt answer. We will allocate some time for these
features in our plans for next year.

Looking forward to contributing to perfmon2,

Milena



Stephane Eranian <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
10/18/2007 11:31 AM
Please respond to: [EMAIL PROTECTED]
To: Milena Milenkovic/Austin/[EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: [perfmon] New features proposal

Hello Milena,

On Wed, Oct 17, 2007 at 09:34:05AM -0500, Milena Milenkovic wrote:
> Is there any interest in having support for more accurate and efficient 
> counter virtualization added to perfmon2?
> 
> By more accurate, we mean providing an option to exclude time spent in 
> interrupts from per-thread time.

I assume you mean turning on/off monitoring around interrupt handlers.

Several months ago, I looked into how to turn monitoring on/off around the
idle loop (i.e., the actual mwait()). It turned out to be quite expensive,
especially on x86 where clearing MSRs is a very slow operation (several
hundred cycles). Just like for interrupt handlers, the idea was to
exclude useless execution from being monitored, because some counters
actually count during mwait().

I am not against the idea. In fact on Itanium, the hardware can do this
automatically, so there is no penalty. On this architecture, perfmon
supports this for system-wide contexts only. You simply pass a flag when
you create the perfmon session. I think this can be implemented in the
same way on other platforms.

There is simply a question of cost compared to the execution time of the
interrupt handler. I think it would be worth investigating. If it turns
out to be both useful and efficient, then I would have no problem adding
it, although I still think hardware support is much better.
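
To make the cost question concrete, here is a rough back-of-the-envelope
sketch in C. It is not perfmon code: the function name and the model (one
stop on handler entry plus one restart on exit) are assumptions made for
illustration, and the cycle counts passed in would have to be measured.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical cost model: monitoring is stopped once on interrupt entry
 * and restarted once on exit, so the handler pays the toggle cost twice.
 * Returns the added cost as a fraction of the handler's own run time.
 */
static double toggle_overhead(uint64_t toggle_cycles, uint64_t handler_cycles)
{
    assert(handler_cycles > 0);   /* avoid dividing by zero */
    return (2.0 * (double)toggle_cycles) / (double)handler_cycles;
}
```

Under this model, the shorter the handler, the larger the relative penalty,
which is why the comparison against handler execution time matters.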


> By more efficient, we mean providing a way for user-space tools to read a
> mapped data area where perfmon would write the values of performance
> monitoring counters at the last significant event (interrupt exit/dispatch)
> for each thread.
> 
> This is the approach we use for our Performance Inspector toolset (
> http://sourceforge.net/projects/perfinsp/):
> the Performance Inspector kernel driver virtualizes counters by thread by
> dynamically patching the dispatcher and interrupt entries/exits.

Perfmon does provide per-thread monitoring ("counter virtualization") by
saving and restoring counters on context switches and via hooks on fork
and exit.
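
As a toy illustration of the save/restore idea (not the actual perfmon
implementation), the sketch below models a single counter in plain C. The
types hw_counter and thread_ctx and all function names are hypothetical; a
real PMU would be accessed through architecture-specific registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware counter. */
struct hw_counter {
    uint64_t value;
};

/* Hypothetical per-thread state: the count accumulated by this thread. */
struct thread_ctx {
    uint64_t saved;
};

/* Switch-out: save the live count into the outgoing thread. */
static void ctx_save(const struct hw_counter *hw, struct thread_ctx *t)
{
    assert(hw != NULL && t != NULL);
    t->saved = hw->value;
}

/* Switch-in: reload the incoming thread's count into the counter. */
static void ctx_restore(struct hw_counter *hw, const struct thread_ctx *t)
{
    assert(hw != NULL && t != NULL);
    hw->value = t->saved;
}

/* A context switch is a save for prev followed by a restore for next. */
static void context_switch(struct hw_counter *hw,
                           struct thread_ctx *prev,
                           const struct thread_ctx *next)
{
    ctx_save(hw, prev);
    ctx_restore(hw, next);
}

/* Simulate the counter observing n events while some thread runs. */
static void run_events(struct hw_counter *hw, uint64_t n)
{
    hw->value += n;
}
```

With this scheme each thread only ever sees events counted while it was
running, regardless of how the threads are interleaved.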

> The Java profiler, jprof, gets the virtualized counter values on every 
> method entry and method exit using JVMPI or JVMTI support,
> so it can produce per-method reports. 
> It can also collect these values for C-code that has been recompiled to 
> issue function entry/exit notifications. 
> The current algorithm for per thread metrics keeps the 64-bit values 
> accumulated by the device driver code in a mapped thread area 
> that allows for the reads of the performance counters to be done 
> efficiently in application mode 
> as opposed to requiring a transition to kernel mode using system calls. 
> 

That makes sense. It seems that you may not need to read the data right
when you exit the function. You may be able to read from the buffer at a
later time (as long as you can correlate with the function name, using the
instruction pointer).
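
A minimal sketch of such deferred processing, in C, might look like the
following. The record layout, the enum, and inclusive_count() are all
hypothetical and are not part of perfmon; the sketch also assumes that the
entry/exit records for a given function are not recursive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum rec_kind { REC_ENTRY, REC_EXIT };

/* Hypothetical buffer record written at function entry or exit. */
struct record {
    uintptr_t     ip;     /* address identifying the function */
    enum rec_kind kind;   /* entry or exit */
    uint64_t      count;  /* virtualized counter value at that point */
};

/*
 * Post-process the buffer after the fact: sum (exit - entry) counter
 * deltas for the function identified by ip. Assumes no recursion, so
 * each entry record for ip is followed by its matching exit record.
 */
static uint64_t inclusive_count(const struct record *buf, size_t n,
                                uintptr_t ip)
{
    uint64_t total = 0, start = 0;
    int open = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i].ip != ip)
            continue;
        if (buf[i].kind == REC_ENTRY) {
            start = buf[i].count;
            open = 1;
        } else if (open) {
            total += buf[i].count - start;
            open = 0;
        }
    }
    return total;
}
```

Mapping ip back to a function name would be done separately, for example
with the symbol table of the profiled binary.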

> Since there is a fairly high probability of perfmon2 being accepted into
> the mainline kernel,
> we would like to use the interfaces it provides.

> However, we believe a couple of features may be added to perfmon2
> to provide the same functionality of our tools.

> We would like to provide the support for these features if there is 
> interest for them in the community.
> 
Note that perfmon does support an in-kernel sampling buffer. In your case,
I believe what you would need is a way to trigger recording of a sample
at specific locations, as opposed to when a counter overflows (or a
timeout expires).

Currently perfmon records samples in the buffer only when a PMU register
generates an interrupt. This happens when a counter overflows, for
instance. Supposing you had a way to trigger recording of a sample
on function entry and exit, then you would get what you want. I think
the trigger could be implemented as a trap. For instance, on x86 we could
possibly use a software interrupt (int 0x..), then catch this and force
perfmon to think there was a PMU interrupt. I am sure there are equivalent
mechanisms on other architectures.
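
To illustrate the shape of this idea, here is a small user-space model in
C. It is only a sketch: sample_buf, record_sample(), on_overflow() and
on_trigger() are hypothetical names and do not correspond to the perfmon
interface, and the trap mechanism itself is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sample_src { SRC_OVERFLOW, SRC_TRIGGER };

/* Hypothetical sample layout. */
struct sample {
    uintptr_t       ip;     /* where the sample was taken */
    uint64_t        count;  /* counter value at that point */
    enum sample_src src;    /* what caused the sample */
};

#define BUF_CAP 8

/* Hypothetical fixed-size sampling buffer. */
struct sample_buf {
    struct sample s[BUF_CAP];
    size_t        n;
};

/* Common recording path; returns -1 when the buffer is full. */
static int record_sample(struct sample_buf *b, uintptr_t ip,
                         uint64_t count, enum sample_src src)
{
    assert(b != NULL);
    if (b->n == BUF_CAP)
        return -1;
    b->s[b->n++] = (struct sample){ ip, count, src };
    return 0;
}

/* Existing style of path: a counter overflow produces a sample. */
static int on_overflow(struct sample_buf *b, uintptr_t ip, uint64_t count)
{
    return record_sample(b, ip, count, SRC_OVERFLOW);
}

/* Proposed path: an explicit trigger (e.g. a trap issued at function
 * entry or exit) reuses the same recording code. */
static int on_trigger(struct sample_buf *b, uintptr_t ip, uint64_t count)
{
    return record_sample(b, ip, count, SRC_TRIGGER);
}
```

The point of the design is that the trigger only adds a new entry point;
the buffer format and the recording logic stay shared with the overflow
path.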

I think this is an interesting idea worth pursuing.

-- 
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
