On 23 November 2014 at 19:05, Thierry Dumont <tdum...@math.univ-lyon1.fr>
wrote:
>
> Is gprof powerful enough for such programs on modern architectures? From
> my point of view, no.
> There are non-free, commercial tools like VTune which can do a fantastic
> measurement job. VTune shows, for example, that a call to std::copy is not
> as fast as a for loop, which the compiler turns into a memcpy
> (probably std::copy is not!). I do not think we can see this with gprof.
> But OK, you are not supposed to buy VTune...
>

I would be surprised if any modern C++ standard library implementation did
not have specialisations of std::copy for POD types that use memcpy() or
some other trick.
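
To make the comparison concrete, here is a minimal C++ sketch of the three
ways of copying being discussed. The helper names (copy_with_loop,
copy_with_std_copy, copy_with_memcpy, copies_agree) are made up for
illustration and do not come from any library. It only checks that the
three variants produce identical contents for a trivially copyable type;
what std::copy or the loop actually compiles to is an implementation
detail of the toolchain and has to be checked in the generated assembly or
with a profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper names, for illustration only.

// Plain element-by-element loop; an optimizer may or may not turn it into memcpy.
std::vector<double> copy_with_loop(const std::vector<double>& src) {
    std::vector<double> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i];
    }
    return dst;
}

// Generic algorithm; how it is lowered depends on the standard library.
std::vector<double> copy_with_std_copy(const std::vector<double>& src) {
    std::vector<double> dst(src.size());
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}

// Explicit raw byte copy; only valid for trivially copyable element types.
std::vector<double> copy_with_memcpy(const std::vector<double>& src) {
    static_assert(std::is_trivially_copyable<double>::value,
                  "memcpy needs a trivially copyable type");
    std::vector<double> dst(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(double));
    }
    return dst;
}

// Returns true when all three strategies reproduce the source exactly.
bool copies_agree(std::size_t n) {
    std::vector<double> src(n);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = 0.5 * static_cast<double>(i);
    }
    const std::vector<double> a = copy_with_loop(src);
    assert(a.size() == n);
    return a == src && a == copy_with_std_copy(src) && a == copy_with_memcpy(src);
}
```

Calling copies_agree(1 << 20) from a small main() and building with
optimisation enabled (for example -O2) gives something to look at in a
profiler or in the assembly output for each variant.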


> What about likwid https://code.google.com/p/likwid ? It is free. Has
> anybody used it to measure the performance of Cython code?
>
> Likwid and VTune both use hardware performance counters on Intel
> and AMD processors (not sure about AMD with VTune...).
>
> What is the size of what you are sorting? If it is small enough to fit in
> the caches, ideally in the L1 cache, you can possibly improve something
> with your modification, but otherwise it is certainly memory bound and
> you cannot do much...
> You have to measure the memory bandwidth of your program. VTune does this,
> possibly likwid too.
>
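
Short of those tools, a very crude wall-clock estimate can already show
whether throughput drops once the working set no longer fits in cache.
The sketch below is an assumption-laden stand-in, not what VTune or likwid
do (they read hardware counters): BandwidthSample and rough_copy_bandwidth
are hypothetical names, and the number it reports also includes loop and
timer overhead.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical result type, for illustration only.
struct BandwidthSample {
    double bytes_per_second;  // 0.0 if the run was too short to time
    double checksum;          // consumed so the copies stay observable
};

// Copies n doubles `repeats` times and reports a rough bytes-per-second figure.
// This is a wall-clock estimate, not a hardware-counter measurement.
BandwidthSample rough_copy_bandwidth(std::size_t n, int repeats) {
    assert(repeats > 0);
    BandwidthSample sample{0.0, 0.0};
    if (n == 0) {
        return sample;
    }
    std::vector<double> src(n, 1.0);
    std::vector<double> dst(n, 0.0);

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        src[0] = static_cast<double>(r);  // change the input on every pass
        std::memcpy(dst.data(), src.data(), n * sizeof(double));
        sample.checksum += dst[0] + dst[n - 1];  // read the result back
    }
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    // Each pass reads n doubles and writes n doubles.
    const double bytes_moved = 2.0 * static_cast<double>(n) * sizeof(double) * repeats;
    if (seconds > 0.0) {
        sample.bytes_per_second = bytes_moved / seconds;
    }
    return sample;
}
```

Running it for a range of n, from a few kilobytes up to well beyond the
last-level cache, and comparing bytes_per_second gives a first hint; the
absolute numbers should not be trusted much.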

I have used callgrind in the past with some success... I would like to try
the Google CPU profiler to see how it fares, but I haven't had the chance yet.
