[sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Did you profile your code on the C-level? e.g. using gprof? As a rule of thumb, guesses about where the bottleneck is are wrong :-) Its entirely conceivable that branch prediction and speculative execution solve this already for you. C++ std::sort will be able to inline the comparator. Link-time optimization (e.g. gcc -flto) can in principle also inline on the level of object code, after the compilation did not inline (because of different compilation units, say).Though the libc sort is in a shared library. On Sunday, November 23, 2014 4:33:46 PM UTC, Nathann Cohen wrote: Hello everybody, I wrote a bruteforce Cython code recently (#17309) which spends most of its time on calls to qsort. This is normal, sorting is sort of the most expensive thing I do, but to call qsort you need to provide a comparison function. Now, as qsort is compiled in a library, the comparison function cannot be inlined inside of qsort and I suspect that it has a nontrivial cost (given how simple the comparison is). Thus, if I copy/paste the original code of qsort into my Cython file the code should be faster, only that is ugly. Sooo... Do you know if there ais way to re-compile my code along with the code of qsort without having to copy/paste it ? Thanks, Nathann -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Yo ! Did you profile your code on the C-level? e.g. using gprof? As a rule of thumb, guesses about where the bottleneck is are wrong :-) Its entirely conceivable that branch prediction and speculative execution solve this already for you. Yeees daddy :-P I did that like a grown man with perf record :-P C++ std::sort will be able to inline the comparator. Oh cool ! Thanks I will try to call that from Cython. Link-time optimization (e.g. gcc -flto) can in principle also inline on the level of object code, after the compilation did not inline (because of different compilation units, say).Though the libc sort is in a shared library. Oh... So when does that work ? I mean, if you compile everything from source already there is nothing to optimize at link time, is there ? Nathann -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Le 23/11/2014 18:07, Volker Braun a écrit : Did you profile your code on the C-level? e.g. using gprof? As a rule of thumb, guesses about where the bottleneck is are wrong :-) Its entirely conceivable that branch prediction and speculative execution solve this already for you. Is gprof enough powerful with modern architectures on such programs? from my point of view, no. There are non free, commercial, tools like vtune which can do fantastic measurement job. Vtune shows, for example, that a call to std::copy is not as fast as a for loop, which is turned by the compiler in a memcopy (probably std::copy is not!). I do not think we can see this with gprof. But ok, you are not supposed to buy vtune... What about likwid https://code.google.com/p/likwid ? It is free. Did somebody used it to measure cython code performances? Likwid (and Vtune) have in common to use performance counters on Intel and AMD processors (not sure for AMD with Vtune...). What is the size of what you are sorting ? If it is small enough to fit in the caches, and better in the L1 cache, you can possibly improve something with your modification, but otherwise it is certainly memory bounded and you cannot do much... You have to measure the bandwidth of your program. Vtune does this, possibly likwid too. t.d. C++ std::sort will be able to inline the comparator. Link-time optimization (e.g. gcc -flto) can in principle also inline on the level of object code, after the compilation did not inline (because of different compilation units, say).Though the libc sort is in a shared library. On Sunday, November 23, 2014 4:33:46 PM UTC, Nathann Cohen wrote: Hello everybody, I wrote a bruteforce Cython code recently (#17309) which spends most of its time on calls to qsort. This is normal, sorting is sort of the most expensive thing I do, but to call qsort you need to provide a comparison function. Now, as qsort is compiled in a library, the comparison function cannot be inlined inside of qsort and I suspect that it has a nontrivial cost (given how simple the comparison is). Thus, if I copy/paste the original code of qsort into my Cython file the code should be faster, only that is ugly. Sooo... Do you know if there ais way to re-compile my code along with the code of qsort without having to copy/paste it ? Thanks, Nathann -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com mailto:sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com mailto:sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. attachment: tdumont.vcf
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
On 23 November 2014 at 18:07, Volker Braun vbraun.n...@gmail.com wrote: C++ std::sort will be able to inline the comparator. +1 std::sort() will do exactly what you describe, only in a type-safe and compiler-checked automatic way. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Hello ! What about likwid https://code.google.com/p/likwid ? It is free. Did somebody used it to measure cython code performances? Never tried vtune, nor likwid. What is the size of what you are sorting ? If it is small enough to fit in the caches, and better in the L1 cache, you can possibly improve something with your modification, but otherwise it is certainly memory bounded and you cannot do much... I sort many small arrays. Several kB, not more. You have to measure the bandwidth of your program. Vtune does this, possibly likwid too. Oh. Do you think that it can be used from outside the program, i.e. on a running Sage session ? Nathann -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
On 23 November 2014 at 19:05, Thierry Dumont tdum...@math.univ-lyon1.fr wrote: Is gprof enough powerful with modern architectures on such programs? from my point of view, no. There are non free, commercial, tools like vtune which can do fantastic measurement job. Vtune shows, for example, that a call to std::copy is not as fast as a for loop, which is turned by the compiler in a memcopy (probably std::copy is not!). I do not think we can see this with gprof. But ok, you are not supposed to buy vtune... I would be surprised if any modern c++ library implementation does not have specialisations of std::copy for POD types that use memcpy() or some other trick. What about likwid https://code.google.com/p/likwid ? It is free. Did somebody used it to measure cython code performances? Likwid (and Vtune) have in common to use performance counters on Intel and AMD processors (not sure for AMD with Vtune...). What is the size of what you are sorting ? If it is small enough to fit in the caches, and better in the L1 cache, you can possibly improve something with your modification, but otherwise it is certainly memory bounded and you cannot do much... You have to measure the bandwidth of your program. Vtune does this, possibly likwid too. I used callgrind() in the past with some success... I would like to try the google cpu profiler to see how it fares, but I haven't had the chance yet. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
There's a sort.h library you should be able to include that will have quick sort + many others, and will allow the compiler to properly inline functions. https://github.com/swenson/sort (disclaimer: I wrote it). I get about a 10x speed improvement over qsort just for the ability to inline functions. I've never used it with Cython, but I don't believe there should be any problems doing so. On Nov 23, 2014 10:11 AM, Francesco Biscani bluesca...@gmail.com wrote: On 23 November 2014 at 19:05, Thierry Dumont tdum...@math.univ-lyon1.fr wrote: Is gprof enough powerful with modern architectures on such programs? from my point of view, no. There are non free, commercial, tools like vtune which can do fantastic measurement job. Vtune shows, for example, that a call to std::copy is not as fast as a for loop, which is turned by the compiler in a memcopy (probably std::copy is not!). I do not think we can see this with gprof. But ok, you are not supposed to buy vtune... I would be surprised if any modern c++ library implementation does not have specialisations of std::copy for POD types that use memcpy() or some other trick. What about likwid https://code.google.com/p/likwid ? It is free. Did somebody used it to measure cython code performances? Likwid (and Vtune) have in common to use performance counters on Intel and AMD processors (not sure for AMD with Vtune...). What is the size of what you are sorting ? If it is small enough to fit in the caches, and better in the L1 cache, you can possibly improve something with your modification, but otherwise it is certainly memory bounded and you cannot do much... You have to measure the bandwidth of your program. Vtune does this, possibly likwid too. I used callgrind() in the past with some success... I would like to try the google cpu profiler to see how it fares, but I haven't had the chance yet. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Le 23/11/2014 19:07, Francesco Biscani a écrit : On 23 November 2014 at 18:07, Volker Braun vbraun.n...@gmail.com mailto:vbraun.n...@gmail.com wrote: C++ std::sort will be able to inline the comparator. Just look at the assembly code:-) +1 std::sort() will do exactly what you describe, only in a type-safe and compiler-checked automatic way. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com mailto:sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com mailto:sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. attachment: tdumont.vcf
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Le 23/11/2014 19:09, Nathann Cohen a écrit : Hello ! What about likwid https://code.google.com/p/likwid ? It is free. Did somebody used it to measure cython code performances? Never tried vtune, nor likwid. What is the size of what you are sorting ? If it is small enough to fit in the caches, and better in the L1 cache, you can possibly improve something with your modification, but otherwise it is certainly memory bounded and you cannot do much... I sort many small arrays. Several kB, not more. You have to measure the bandwidth of your program. Vtune does this, possibly likwid too. Oh. Do you think that it can be used from outside the program, i.e. on a running Sage session ? With Vtune it is possible (you start and stop collection with a call to the library (do not imagine I am paid by Intel :-) ,I am not.). With likwid, I don t know. But, ok, it seems your objects can stay in the L1 cache. In that case, there is a possibility to improve performances by doing what you propose to do. I had recently a problem like yours: an ode solver needs to solve linear systems, and I got a slight improvement by replacing the call to lapack by my own routine, probably because the lapack routine I called (which is called at many places in the program) imposes a far branch at each call... and for small systems (size 5), it is penalizing. But what about the quick sort? is it sure that the implementation cannot degenerate? it is well known all the efficiency can be lost if the key used for partition is not chosen as it should be... What about replacing the quick sort by an other method ? (the tree based one?). t.d. Nathann -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. attachment: tdumont.vcf
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
On 2014-11-23 19:05, Thierry Dumont wrote: Vtune shows, for example, that a call to std::copy is not as fast as a for loop, which is turned by the compiler in a memcopy (probably std::copy is not!). If that's the case, get a better C++ compiler. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
On 23 November 2014 at 20:41, Thierry Dumont tdum...@math.univ-lyon1.fr wrote: But what about the quick sort? is it sure that the implementation cannot degenerate? it is well known all the efficiency can be lost if the key used for partition is not chosen as it should be... What about replacing the quick sort by an other method ? (the tree based one?). qsort is not necessarily quick sort. AFAIK, the C/C++ libraries only guarantee that the sorting has logarithmic complexity in the number of comparisons. std::sort is often implemented as a mix of introsort and other sorting algorithms, with special casing for small/big objects, small/large number of items to be sorted, etc. F. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
Le 23/11/2014 20:53, Jeroen Demeyer a écrit : On 2014-11-23 19:05, Thierry Dumont wrote: Vtune shows, for example, that a call to std::copy is not as fast as a for loop, which is turned by the compiler in a memcopy (probably std::copy is not!). If that's the case, get a better C++ compiler. heum... it is icc :-) A c++ compiler as some unpredicable aspects... -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. attachment: tdumont.vcf
Re: [sage-devel] Re: Call qsort from cython code .... with an inlined comparison function ?!
icc is a pretty garbage C++ compiler, unless you are doing exactly the type of linear algebra operations that they optimise to death (on Intel processors at least :) for benchmarketing. On 23 November 2014 at 21:53, Thierry Dumont tdum...@math.univ-lyon1.fr wrote: Le 23/11/2014 20:53, Jeroen Demeyer a écrit : On 2014-11-23 19:05, Thierry Dumont wrote: Vtune shows, for example, that a call to std::copy is not as fast as a for loop, which is turned by the compiler in a memcopy (probably std::copy is not!). If that's the case, get a better C++ compiler. heum... it is icc :-) A c++ compiler as some unpredicable aspects... -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups sage-devel group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.