On 23 November 2014 at 19:05, Thierry Dumont <tdum...@math.univ-lyon1.fr> wrote:
>
> Is gprof enough powerful with modern architectures on such programs? from
> my point of view, no.
> There are non free, commercial, tools like vtune which can do fantastic
> measurement job. Vtune shows, for example, that a call to std::copy is not
> as fast as a for loop, which is turned by the compiler in a memcopy
> (probably std::copy is not!). I do not think we can see this with gprof.
> But ok, you are not supposed to buy vtune...
>
I would be surprised if any modern C++ library implementation does not have
specialisations of std::copy for POD types that use memcpy() or some other
trick.

> What about likwid https://code.google.com/p/likwid ? It is free. Did
> somebody used it to measure cython code performances?
>
> Likwid (and Vtune) have in common to use performance counters on Intel
> and AMD processors (not sure for AMD with Vtune...).
>
> What is the size of what you are sorting ? If it is small enough to fit in
> the caches, and better in the L1 cache, you can possibly improve something
> with your modification, but otherwise it is certainly memory bounded and
> you cannot do much...
> You have to measure the bandwidth of your program. Vtune does this,
> possibly likwid too.
>

I used callgrind in the past with some success... I would like to try the
Google CPU profiler to see how it fares, but I haven't had the chance yet.
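
A rough way to check the std::copy point without VTune is to time the three
variants side by side and turn the timings into an effective bandwidth. The
sketch below is only an illustration: every function name in it is made up
for this example, it assumes the data are plain doubles, and it says nothing
about what any particular compiler or standard library will actually do.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// memcpy is only valid for trivially copyable types; double qualifies.
static_assert(std::is_trivially_copyable<double>::value,
              "double must be trivially copyable");

// Three ways to copy n doubles (hypothetical helper names).
void copy_with_std_copy(const double* src, double* dst, std::size_t n) {
    std::copy(src, src + n, dst);
}

void copy_with_loop(const double* src, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

void copy_with_memcpy(const double* src, double* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(double));
}

using CopyFn = void (*)(const double*, double*, std::size_t);

// Average wall-clock seconds per call over `repeats` calls.
double seconds_per_copy(CopyFn fn, const std::vector<double>& src,
                        std::vector<double>& dst, int repeats) {
    assert(src.size() == dst.size() && repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        fn(src.data(), dst.data(), src.size());
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}

// Effective bandwidth in GB/s, counting one read and one write per element.
double copy_bandwidth_gb_per_s(std::size_t n, double seconds) {
    return 2.0 * n * sizeof(double) / seconds / 1e9;
}
```

To use it, call seconds_per_copy() once per variant with the same source and
destination vectors, first with a size that fits in L1 and then with one much
larger than the last-level cache. Wall-clock timing like this is crude and an
optimiser may rearrange the loop, so looking at the generated assembly
(for example with g++ -O2 -S) is a more direct way to see whether the loop
and std::copy end up as the same memcpy/memmove call.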