On Tue, Oct 08, 2013 at 11:03:16AM +0900, Namhyung Kim wrote:
> On Wed, 2 Oct 2013 12:18:28 +0200, Frederic Weisbecker wrote:
> > On Thu, Sep 26, 2013 at 05:58:03PM +0900, Namhyung Kim wrote:
> >> From: Namhyung Kim <namhyung....@lge.com>
> >>
> >> The current collapse stage has a scalability problem which can be
> >> reproduced easily with a parallel kernel build. This is because it
> >> needs to traverse every child of a callchain node linearly during the
> >> collapse/merge stage. Converting the children to an rbtree reduced the
> >> overhead significantly.
> >>
> >> On my 400MB perf.data file, which was recorded with a make -j32 kernel
> >> build:
> >>
> >>   $ time perf --no-pager report --stdio > /dev/null
> >>
> >>   before:
> >>   real   6m22.073s
> >>   user   6m18.683s
> >>   sys    0m0.706s
> >>
> >>   after:
> >>   real   0m20.780s
> >>   user   0m19.962s
> >>   sys    0m0.689s
> >>
> >> During perf report, the overhead of append_chain_children went down
> >> from 96.69% to 18.16%:
> >>
> >>   -  18.16%  perf  perf          [.] append_chain_children
> >>      - append_chain_children
> >>         - 77.48% append_chain_children
> >>            + 69.79% merge_chain_branch
> >>            - 22.96% append_chain_children
> >>               + 67.44% merge_chain_branch
> >>               + 30.15% append_chain_children
> >>               + 2.41% callchain_append
> >>            + 7.25% callchain_append
> >>         + 12.26% callchain_append
> >>         + 10.22% merge_chain_branch
> >>   +  11.58%  perf  perf          [.] dso__find_symbol
> >>   +   8.02%  perf  perf          [.] sort__comm_cmp
> >>   +   5.48%  perf  libc-2.17.so  [.] malloc_consolidate
> >>
> >> Reported-by: Linus Torvalds <torva...@linux-foundation.org>
> >> Cc: Jiri Olsa <jo...@redhat.com>
> >> Cc: Frederic Weisbecker <fweis...@gmail.com>
> >> Link: http://lkml.kernel.org/n/tip-d9tcfow6stbrp4btvgs51...@git.kernel.org
> >> Signed-off-by: Namhyung Kim <namhy...@kernel.org>
> >
> > Have you tested this patchset when collapsing is not used?
> > There's a fair chance that this patchset improves not only collapsing
> > but also callchain insertion in general, so it's probably a win in any
> > case. But it would still be nice to make sure, because we are getting
> > rid of collapsing anyway.
> >
> > The test that could tell us is to run "perf report -s sym" and compare
> > the time it takes to complete before and after this patch, because
> > "-s sym" shouldn't involve collapses.
> >
> > In fact, sorting by any key other than comm should do the trick.
>
> Yes, I get similar results when collapsing is not used. Actually, with
> "perf report -s sym" the improvement is even larger, since more
> callchains get inserted into each hist entry.

Great! I'll take a closer look at the callchain patches and review them then.
Please resend them along with the comm batch.

Thanks again!