On Thu, May 2, 2013 at 10:51 AM, Francesc Alted <franc...@continuum.io> wrote:
> On 5/2/13 3:58 PM, Nathaniel Smith wrote:
>> callgrind has the *fabulous* kcachegrind front-end, but it only
>> measures memory access performance on a simulated machine, which is
>> very useful sometimes (if you're trying to optimize cache locality),
>> but there's no guarantee that the bottlenecks on its simulated machine
>> are the same as the bottlenecks on your real machine.
>
> Agreed, there is no guarantee, but my experience is that kcachegrind
> normally gives you a pretty decent view of cache misses, and hence it can
> make pretty good predictions about how they affect your computations. I
> have used this feature extensively for optimizing parts of the Blosc
> compressor, and I could not be happier with it (to the point that, if it
> were not for Valgrind, I could not have figured out many interesting
> memory access optimizations).
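[For context, the workflow being discussed looks roughly like the following sketch; the flags are the standard Valgrind/callgrind ones, and `your_script.py` is a placeholder for whatever program you want to profile:]

```shell
# Run under callgrind with cachegrind-style cache simulation enabled,
# so that kcachegrind can show simulated cache-miss counts per function.
valgrind --tool=callgrind --cache-sim=yes python your_script.py

# callgrind writes its results to callgrind.out.<pid>; open them in the GUI:
kcachegrind callgrind.out.*
```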
Right -- if you have code where you know that memory is the bottleneck
(so especially integer-heavy code), then callgrind is perfect. In fact,
it was originally written to make it easier to optimize the bzip2
compressor :-). My point isn't that it's not useful, just that it's a
little more of a specialist tool, so I hesitate to recommend it as the
first profiler for people to reach for.

An extreme example: the last time I played with this, I found that for a
numpy scalar float64 * float64 multiplication, 50% of the total time was
spent fiddling with floating point control registers. But that time
would be invisible to callgrind's measurements...

-n
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
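[As a sketch of the kind of general-purpose profiler the thread suggests reaching for first, here is Python's stdlib cProfile measured on a toy workload; the function names are illustrative, not from the thread:]

```python
import cProfile
import io
import pstats

def inner(n):
    # Simulated hot loop; stands in for whatever the real workload does.
    return sum(i * i for i in range(n))

def workload():
    return [inner(10_000) for _ in range(50)]

# Profile the workload on the real machine (wall-clock based, unlike
# callgrind's simulated-machine instruction/cache counts).
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Render the top entries sorted by cumulative time.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The report's hot entries point at `inner`, which is the usual first step before reaching for a specialist tool like callgrind.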