On Thu, May 2, 2013 at 2:58 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Thu, May 2, 2013 at 9:25 AM, David Cournapeau <courn...@gmail.com> wrote:
>>> * Re: the profiling, I wrote a full oprofile->callgrind format script
>>> years ago: http://vorpus.org/~njs/op2calltree.py
>>> Haven't used it in years either, but neither oprofile nor kcachegrind
>>> is a terribly fast-moving project, so it's probably still working, or
>>> could be made so without much work.
>>> Or easier is to use the gperftools CPU profiler:
>>> https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
>>
>> I don't have experience with gperftools, but on recent Linux kernels
>> you can also use perf, which could not be easier to use (no runtime
>> support needed), though you do need a 'recent' kernel:
>>
>> http://indico.cern.ch/getFile.py/access?contribId=20&sessionId=4&resId=0&materialId=slides&confId=141309
>>
>> I am hoping to talk a bit about these for our diving-into-numpy-C-code
>> tutorial in June. What's the advantage of gperftools, in your opinion?
>
> For what I've used profiling for, THE key feature is proper call-graph
> support ("show me the *total* time spent in each function, including
> callees"). Otherwise, silly example, suppose you have a bug where you
> wrote:
>
>     func1() {
>         for (i = 0; i < 10000000; i++)
>             foo = add(foo, bar[0])
>     }
>
> Obviously this is a waste of time, since you're actually performing
> the same operation over and over. Many profilers, given this, will
> tell you that all the time is spent in 'add', which is useless,
> because you don't want to speed up 'add', you want to speed up 'func1'
> (probably by not calling 'add' so many times!). If you have relatively
> flat code, like most kernel code, this isn't an issue, but I generally
> don't...
>
> perf is a fabulous framework, but it doesn't have any way to get full
> call-graph information out, so IME it's been useless. It has reporting
> modes that claim to (like some "fractal" thing?), but as far as I've
> been able to tell from docs/googling/mailing lists, nobody understands
> how to interpret this output except the people who wrote it. Really a
> shame that it falls down in the last mile like that; hopefully they
> will fix this soon.
Perf's documentation is written for Vulcans, but it does what I think
you want. Say:

    void work(int n) {
        volatile int i = 0; /* don't optimize away */
        while (i++ < n)
            ;
    }

    void easy() { work(1000 * 1000 * 50); }

    void hard() { work(1000 * 1000 * 1000); }

    int main() {
        easy();
        hard();
    }

Compile with gcc -g -O0, and then:

    perf record -g -a -- ./a.out
    perf report -g -a --stdio

gives me

    95.22%  a.out  a.out  [.] work
            |
            --- work
               |
               |--89.84%-- hard
               |           main
               |           __libc_start_main
               |
                --5.38%-- easy
                          main
                          __libc_start_main

or maybe even better with the -G option:

    95.22%  a.out  a.out  [.] work
            |
            --- __libc_start_main
                main
               |
               |--94.35%-- hard
               |           work
               |
                --5.65%-- easy
                          work

I agree it would be better with a kcachegrind-like GUI, but it looks
like one is coming. I don't really understand the ncurses frontend to
the call graph, though.

(Example taken/adapted from the great Yosefk blog post:
http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html)

cheers,
David

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion