On Thu, Sep 12, 2013 at 10:36:58PM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweis...@gmail.com> wrote:
>
> > The way we handle hists sorted by comm is to first gather them by tid
> > then in the end merge/collapse hists that end up with the same comm.
> >
> > But merging hists has shown some performance issues, especially with
> > callchains, where the operation can be very heavy.
> >
> > So this new comm infrastructure aims at removing comm collapses. It
> > brings two features:
> >
> > 1) Keep track of comms lifecycle by storing timestamps when the comms
> > are set. This way we can map the precise comm to any thread:time couple.
> > This only works if the PERF_SAMPLE_ID comes along with comm and fork
> > events, otherwise we only track the latest comm set for a thread.
> >
> > This can provide us more precise comm sorted hists by distinguishing pre
> > and post exec timeframes into separate hists for a single thread.
> >
> > Note that although the comm infrastructure is ready to do this, I
> > haven't yet made the perf tools support that. It's a TODO entry.
> >
> > 2) Allocate comms only once instead of duplicating them for all threads
> > sharing the same one. Two threads having the same comm should now point
> > to the same string. As a result we can compare hists thread comm by
> > address.
> >
> > The big upside is that we can now live sort comm hists instead of
> > collapsing them at the end of the processing.
> >
> > I've seen very nice performance results on perf report. Roughly a 1.5x
> > to 2x speedup on perf report default stdio output with callchains.
> >
> > You can try this branch:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > perf/comm
> >
> > Maybe merging that with Namhyung's callchain patches could provide some
> > nice cumulative results.
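To make the two features above concrete, here is a minimal Python sketch of the idea. It is not the perf code: the CommTimeline class and its method names are hypothetical, and sys.intern() stands in for the "allocate each comm only once" part so that equal comms can be compared by identity rather than by content.

```python
import bisect
import sys


class CommTimeline:
    """Hypothetical per-thread history of comms, ordered by the time each was set."""

    def __init__(self):
        self._times = []   # timestamps at which a comm was set, kept sorted
        self._comms = []   # interned comm strings, parallel to _times

    def set_comm(self, timestamp, comm):
        # Intern the string so every thread with this comm shares one object.
        comm = sys.intern(comm)
        idx = bisect.bisect_right(self._times, timestamp)
        self._times.insert(idx, timestamp)
        self._comms.insert(idx, comm)

    def comm_at(self, timestamp):
        # Latest comm set at or before the given time, or None if none yet.
        idx = bisect.bisect_right(self._times, timestamp) - 1
        return self._comms[idx] if idx >= 0 else None


# Thread 1 starts as "bash" and execs "make" at t=100.
t1 = CommTimeline()
t1.set_comm(0, "bash")
t1.set_comm(100, "".join(["ma", "ke"]))

# Thread 2 is a separate thread that also runs "make".
t2 = CommTimeline()
t2.set_comm(5, "".join(["m", "ake"]))

pre_exec = t1.comm_at(50)
post_exec = t1.comm_at(150)
shared = post_exec is t2.comm_at(10)   # identity check, no string compare
```

With a timeline like this, a sample taken before the exec lands in the "bash" hist and one taken after lands in the "make" hist, and two threads with the same comm can be sorted together by comparing references.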
>
> It would be nice to try Linus's testcase, which is, in essence, a kernel
> build profile:
>
>   make defconfig
>   perf record -g make -j64 bzImage
>
> and to make sure that it can analyze the data in sane, non-annoying
> runtimes. What I saw was 30 minutes of runtime - a 2x improvement is not
> nearly enough, 15 minutes is still an eternity.

I doubt we can reach anything near non-annoying runtimes after recording
all the callchains of a whole kernel build. My patches and Namhyung's
should improve the comm situation a lot, but we can't work miracles. The
only way would perhaps be to limit the depth of the callchain branches.

Now maybe we can find other big contention points in perf. It's possible
we also have some endless loop somewhere.
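
As a rough illustration of what limiting the callchain depth would buy, here is a small Python sketch. It is only a toy model, not how perf stores callchains: insert_chain(), count_nodes() and the sample chains are hypothetical, and each chain is assumed to be listed from the outermost caller inward.

```python
def insert_chain(root, chain, max_depth=None):
    """Merge one callchain into a nested-dict tree, keeping at most max_depth frames."""
    frames = chain if max_depth is None else chain[:max_depth]
    node = root
    for frame in frames:
        child = node.setdefault(frame, {"hits": 0, "children": {}})
        child["hits"] += 1
        node = child["children"]


def count_nodes(tree):
    """Total number of nodes in the callchain tree."""
    return sum(1 + count_nodes(n["children"]) for n in tree.values())


# Hypothetical samples, outermost caller first.
chains = [
    ["main", "parse", "lex", "read"],
    ["main", "parse", "lex", "peek"],
    ["main", "parse", "emit", "write"],
]

full, limited = {}, {}
for chain in chains:
    insert_chain(full, chain)
    insert_chain(limited, chain, max_depth=2)

full_nodes = count_nodes(full)
limited_nodes = count_nodes(limited)
```

The truncated tree keeps the same hit counts on the frames it retains but has far fewer branches to merge and sort, at the price of losing the deeper frames.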