On Friday 19 September 2014 11:47:47 Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 19, 2014 at 10:11:21AM +0200, Milian Wolff escreveu:
> > On Thursday 18 September 2014 17:36:25 Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Sep 18, 2014 at 02:17:49PM -0600, David Ahern escreveu:
> > > > On 9/18/14, 1:17 PM, Arnaldo Carvalho de Melo wrote:
> > > > >> This was also why I asked my initial question, which I want to
> > > > >> repeat once more: Is there a technical reason to not offer a
> > > > >> "timer" software event to perf? I'm a complete layman when it
> > > > >> comes to kernel internals, but from a user point of view this
> > > > >> would be awesome:
> > > > >>
> > > > >> perf record --call-graph dwarf -e sw-timer -F 100 someapplication
> > > > >>
> > > > >> This command would then create a timer in the kernel with a 100Hz
> > > > >> frequency. Whenever it fires, the callgraphs of all threads in
> > > > >> $someapplication are sampled and written to perf.data. Is this
> > > > >> technically not feasible? Or is it simply not implemented?
> > > > >> I'm experimenting with a libunwind based profiler, and with some
> > > > >> ugly signal hackery I can now grab backtraces by sending my
> > > > >> application SIGUSR1. Based on
> > > > >
> > > > > Humm, can't you do the same thing with perf? I.e. you send SIGUSR1
> > > > > to your app with the frequency you want, and then hook a 'perf
> > > > > probe' into your signal... /me tries some stuff, will get back with
> > > > > results...
> > 
> > That is actually a very good idea. With the more powerful scripting
> > abilities in perf now, that could/should do the job indeed. I'll also try
> > this out.
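
For the record, here is roughly what I will try out based on that
suggestion. The handler name sigusr1_handler and the 10ms kill loop are
just placeholders on my side, not something perf provides:

  # add a uprobe on the application's SIGUSR1 handler
  perf probe -x ./someapplication sigusr1_handler
  # record that probe with DWARF call graphs while the app runs
  perf record --call-graph dwarf \
      -e probe_someapplication:sigusr1_handler \
      -p $(pidof someapplication) &
  # poor man's timer: fire the handler roughly 100 times per second
  while kill -USR1 $(pidof someapplication) 2>/dev/null; do sleep 0.01; done

That would approximate the sw-timer event I asked about, at the cost of
intruding into the application itself.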
>
> Now that the need for getting a backtrace from existing threads is clear,
> a-la using ptrace via gdb to attach to a thread and then traverse its
> stack to provide that backtrace, I think we need to do something in the
> perf infrastructure in the kernel to support that, i.e. somehow signal
> the perf kernel part that we want a backtrace for some specific thread.
> 
> Not at event time, but at some arbitrary time, be it by creating an event
> that, as you suggested, will set up a timer and then, when that timer
> fires, will use (parts of) the mechanism used by ptrace.
> 
> But in the end we need a mechanism to ask for backtraces for existing,
> sleeping, threads.

Yes, such a capability would be tremendously helpful for writing profiling
tools for userspace applications. It should also work for threads that are
not sleeping, though, or are all threads of a process frozen anyway when a
perf event fires?

> For waits in the future, we just need to ask for the tracepoints where
> waits take place and with the current infrastructure we can get most/all
> of what we need, no?

See David's email; this is possible even now with the sched:* events.
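
For completeness, this is the kind of invocation I have in mind for the
wait-time part, following the sleep-time profiling example from the perf
documentation (the exact event list and the 'perf inject -s' merge step
are from memory, so take this as a sketch):

  # record scheduler events with call graphs for the target process
  perf record -e sched:sched_stat_sleep -e sched:sched_switch \
      -e sched:sched_process_exit -g -o perf.data.raw \
      -p $(pidof someapplication)
  # fold the sched_stat and sched_switch events into sleep-time samples
  perf inject -v -s -i perf.data.raw -o perf.data
  perf report --stdio -i perf.data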
 
> > > > Current profiling options with perf require the process to be running.
> > > > What
> > > 
> > > Ok, so you want to see what the wait channel is and unwind the stack
> > > from there? Is that the case? I.e. again, a sysrq-t equivalent?
> > 
> > Hey again :)
> > 
> > If I'm not mistaken, I have not yet managed to bring my point across.
> > The final goal is
> 
> Right, but the discussion so far was to see if the existing kernel
> infrastructure would allow us to write such a tool.
> 
> > to profile both wait time _and_ CPU time, combined! By sampling a
> > userspace program at a constant frequency and applying some statistics,
> > one can get extremely useful information out of the data. I want to use
> > perf to automate the process that Mike Dunlavey explains here:
> > http://stackoverflow.com/a/378024/35250
> Will read.
> 
> > So yes, I want to see the wait channel and unwind from there, if the
> > process is waiting. Otherwise just do the normal unwinding you'd do when
> > any other perf event occurs.
> 
> Ok, we need two mechanisms to get that, one for existing, sleeping
> threads at the time we start monitoring (we don't have that now other
> than freezing the process, looking for threads that were waiting, then
> using ptrace and asking for that backtrace), and another for threads
> that will sleep/wait _after_ we start monitoring.
> 
> > > > Milian wants is to grab samples at every timer expiration, even if
> > > > the process is not running.
> > > 
> > > What for? And by "grab samples" you want to know where it is waiting for
> > > something, together with its callchain?
> > 
> > See above. If I can sample the callchain every N ms, preferably in a per-
> 
> Do you need to take periodic samples of the callchain? Or only when you
> wait for something?
> 
> For things that happen at such a high freq, yeah, just sampling would be
> better, but you're thinking about slow things, so multiple samples for
> something waiting and waiting is not needed, just when it starts
> waiting, right?

I need periodic samples. I'm not looking exclusively for wait time (that can
be done already, see above). I'm looking for a generic overview of the
userspace program: where does it spend its time? Without periodic samples, I
cannot do any statistics. I think this should become clear when you read Mike
Dunlavey's text that explains the GDB-based poor man's profiler (which is
actually pretty helpful, despite the name).

> > thread manner, I can create tools which find the slowest userspace
> > functions. This is based on "inclusive" cost, which is made up of CPU
> > time _and_ wait time combined. With it one will find all of the
> > following:
> >   - CPU hotspots
> >   - IO wait time
> >   - lock contention in a multi-threaded application
> >   
> > > > Any limitations that would prevent doing this with a sw event? e.g,
> > > > mimic
> > > > task-clock just don't disable the timer when the task is scheduled
> > > > out.
> > > 
> > > I'm just trying to figure out if what people want is a complete
> > > backtrace of all threads in a process, no matter what they are doing, at
> > > some given time, i.e. at "sysrq-t" time, is that the case?
> > 
> > Yes, that sounds correct to me. I tried it out on my system, but the
> > output in dmesg is far too large and I could not find the call stack
> > information of my test application while it slept. Looking at the other
> > backtraces I see there, I'm not sure whether sysrq-t only outputs a
> > kernel backtrace? Or maybe it's
> 
> sysrq-t is just for the kernel, that is why I said that you seemed to
> want it to cross into userspace.

Ah, ok. I misunderstood you.

<snip>

> Right, adding callchains to 'perf trace' and using it with --duration
> may provide a first approximation for threads that will start waiting
> after 'perf trace' starts, I guess.

Yes, see also the other mail on that. This would be very helpful, but only 
partially related to the goal I have in mind here :)
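
To illustrate, something along these lines is what I would hope for
eventually. Note that --call-graph does not exist for 'perf trace' yet, so
that part of the command is hypothetical:

  # show syscalls taking longer than 100ms, ideally with userspace callchains
  perf trace --duration 100 --call-graph dwarf -p $(pidof someapplication)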

Bye
-- 
Milian Wolff
[email protected]
http://milianw.de