On Friday 19 September 2014 11:33:40 Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 19, 2014 at 02:59:55PM +0900, Namhyung Kim escreveu:
> > Hi Arnaldo and Milian,
> >
> > On Thu, 18 Sep 2014 16:17:13 -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Sep 18, 2014 at 06:37:47PM +0200, Milian Wolff escreveu:
> > >> That would indeed be very welcome. There are multiple "defaults" in
> > >> perf which I find highly confusing. The --no-children above e.g.
> > >> could/should probably be the default, no? Similarly, I find it extremely
> > >> irritating that `perf report -g`
> > >
> > > It was, this is something we've actually been discussing recently: the
> > > change that made --children be the default mode. That is why I added
> > > Namhyung and Ingo to the CC list, so that they become aware of more
> > > reaction to this change.
> >
> > Yeah, we should rethink about changing the default now. Actually I'm
> > okay with the change, Ingo what do you think?
> >
> > >> defaults to `-g fractal` and not `-g graph`.
> > >>
> > >> 100% foo
> > >>   70% bar
> > >>     70% asdf
> > >>     30% lalala
> > >>   30% baz
> > >>
> > >> is much harder to interpret than
> > >>
> > >> 100% foo
> > >>   70% bar
> > >>     49% asdf
> > >>     21% lalala
> > >>   30% baz
> >
> > I also agree with you. :)
> >
> > > But the question then is if this is configurable, if not that would be a
> > > first step, i.e. making this possible via some ~/.perfconfig change.
> >
> > Yes, we have record.call-graph and top.call-graph config options now so
> > adding a new report.call-graph option should not be difficult. However
> > I think it'd be better being call-graph.XXX as it can be applied to all
>
> No problem, with sourcing being supported in ~/.perfconfig, we can have
> as many #include call-graph.XXX as needed, multiple levels of includes
> and all.
>
> > other subcommands transparently.
> >
> > What about like below?
> >
> > [call-graph]
> > 	mode = dwarf
> > 	dump-size = 8192
> > 	print-type = fractal
> > 	order = callee
> > 	threshold = 0.5
> > 	print-limit = 128
> > 	sort-key = function
>
> Milian, does this provide what you expect? How would we call this
> specific call-graph profile?
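
For reference, most of those keys correspond to options that already exist on
the command line, so a rough sketch of the equivalent invocations (assuming
the current --call-graph/-g option syntax, which may differ slightly between
perf versions; ./my-app is only a placeholder workload) would be:

  $ perf record --call-graph dwarf,8192 -- ./my-app   # mode + dump-size
  $ perf report -g graph,0.5,callee                   # print-type, threshold, order
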
print-type should be graph, not fractal. Otherwise it sounds good to me.

But how would one use it? I tried putting it into ~/.perfconfig, but
apparently my 3.16.2-1-ARCH perf does not support this feature yet? How/when
would that config be used? As soon as one does "perf record -g" (for mode and
dump-size) or "perf report" (for a perf.data with call graphs)? That would be
very useful!

> Ideas on where to put this below tools/perf/?

Nope.

> > > Later we could advocate changing the default. Or perhaps provide some
> > > "skins", i.e. config files that could be sourced into ~/.perfconfig so
> > > that perf mimics the decisions of other profilers which people are
> > > used to.
> > >
> > > Kinda like making mutt behave like pine (as I did a long time ago), even
> > > if just for a while, till one gets used to the "superior" default way of
> > > doing things of the new tool :-)
> > >
> > >> > > I did that already, but Brendan and the other available perf
> > >> > > documentation mostly concentrates on performance issues in the
> > >> > > kernel. I'm interested purely in the user space. perf record with
> > >> > > one of the hardware PMU events works nicely in that case, but one
> > >> > > cannot use it to find locks&waits similar to what VTune offers.
> > >> >
> > >> > Humm, yeah, you need to figure out how to solve your issue, what I tried
> > >> > was to show what kinds of building blocks you could use to build what
> > >> > you need, but no, there is no ready to use tool for this, that I am
> > >> > aware of.
> >
> > I'm also *very* interested in collecting idle/wait info using perf. Looks
> > like we can somehow use sched:* tracepoints but it requires root
> > privilege though (unless /proc/sys/kernel/perf_event_paranoid being -1).
> >
> > With that restriction however, we might improve perf sched (or even
> > plain perf record/report) to provide such info.. David may have an
> > idea. :)
>
> So this is like, thinking just on userspace here for a change, using
> syscall entry as the wait time entry, then later, when we are at syscall
> exit time (another tracepoint) we subtract the time from syscall entry,
> and that is our waiting-for-the-kernel time, put a call-chain there and
> we can sort by that syscall time, i.e. a bit like 'perf trace' +
> callchains, no?
>
> I.e. if I want to see what syscalls are taking more than 50ms on my
> system:
>
> [root@zoo ProgramasRFB]# trace --duration 50 | head -10
<snip>
>
> But then want to filter this a bit more and exclude poll calls:
>
> [root@zoo ProgramasRFB]# trace --duration 50 -e \!poll | head -10
<snip>

This sounds useful. If it would now also print backtraces... :)

But note how this would not find a userspace function that does 5 syscalls,
each taking 10ms. And again, just tracing syscalls is not what I'm looking
for. It is definitely a good tool to have at hand, but very specific already.
If you want to get an overview of where your app is spending its time, a
wall-time overview is much more helpful, as it will also show CPU hotspots
and other things unrelated to syscalls.

Bye
--
Milian Wolff
[email protected]
http://milianw.de
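
PS: for anyone who wants to poke at the sched:* route Namhyung mentions, a
rough sketch (needs root or kernel.perf_event_paranoid set to -1; ./my-app is
again only a placeholder):

  # perf sched record -- ./my-app      # records sched_switch and related events system-wide
  # perf sched latency -s max          # per-task wait/run latency summary
  # perf record -e sched:sched_switch -g -p $(pidof my-app)   # callchains at each context switch of the target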
