I am aware of those. I tried the following:

1) Calling gsl_set_error_handler_off() to turn off the error handler.
2) Compiling with -DHAVE_INLINE and -DGSL_RANGE_CHECK_OFF.
3) Linking to a different cblas (I actually tried openblas, mkl and gslcblas).
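For reference, this is roughly what 1) and 2) look like in a test driver (just a sketch, not my actual benchmark; the link line for 3) depends on which BLAS you pick):

    #include <gsl/gsl_errno.h>
    #include <gsl/gsl_vector.h>

    int main()
    {
        /* 1) disable the default handler, which prints a message and
           aborts the program; after this, GSL routines just return
           error codes that you check yourself. */
        gsl_set_error_handler_off();

        /* 2) with -DHAVE_INLINE -DGSL_RANGE_CHECK_OFF on the compile
           line, gsl_vector_get/set become inline accessors with no
           bounds checks. */
        gsl_vector *v = gsl_vector_alloc(5);
        gsl_vector_set(v, 0, 1.0);
        double x = gsl_vector_get(v, 0);
        gsl_vector_free(v);
        return (x > 0.0) ? 0 : 1;
    }

    /* 3) compile/link, e.g.:
       g++ -O2 -DHAVE_INLINE -DGSL_RANGE_CHECK_OFF test.cpp -lgsl -lgslcblas
       (or swap -lgslcblas for -lopenblas, or for the MKL link line) */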
However, my most interesting experiment was to modify my minimization algorithms to accept dynamic arrays. Originally, my function used std::array to store the values of the independent variables when minimizing a multidimensional function. My experiment mimicked my use case: solve the same problem millions of times (to be precise, a million times). When I used dynamic arrays, my algorithm became much slower because of all the calls to malloc/new and free/delete. So I modified my algorithm to accept a gsl_vector. Guess what happened? My library became slower than gsl! I am not sure if this is because my Nelder-Mead implementation is different from gsl's (which I did not look at) or if it's just the extra overhead of the wrapper.

My point is: the huge differences that I observed were because of gsl_vector, which is not efficient for small arrays (stack memory is faster and you avoid the cost of malloc/free). So I think that the gsl minimization algorithms could have a version that uses stack arrays. I should not have to pay the cost of dynamic allocation if I am minimizing a function of 5 variables (a typical Nelder-Mead implementation needs 6 points, and even including the function evaluations that is an array of 36 elements, which is fine for the stack). I think that gsl could have alternative implementations that use stack memory (with warnings not to use them if the problem is big).

The other issue is that the implementation of gsl_vector just seems inefficient to me. Looking at the code, a single vector requires 3 calls to malloc and free (one for the data, one for the gsl_block, and one for the gsl_vector itself). The manual states that the block is there for "consistency", and I can see how memory management becomes easier with it, but it seems to be a case of generality at the expense of performance. Also, the stride and the owner flag are part of the gsl_vector object to make it work with views (gsl_vector_view), but then people who never need views pay the performance price anyway. None of this is likely to matter for big vectors, but it makes a large difference when you are dealing with small objects. (There is a small sketch of what I mean at the bottom of this message, below the quoted thread.)

Am I the only one who thinks like this?

Gilberto

PS: As a side note, I prefer Eigen over Armadillo. In my experience, the former is almost always faster.

On Sun, Oct 6, 2013 at 5:09 AM, Simon Zehnder <[email protected]> wrote:

> Hi Gilberto,
>
> congrats on your performance results!
>
> A first guess at the cause of the worse performance of the gsl library would be exception throwing. The GSL is made for 'users', and this means that a user has to be informed about the kind of exception he encounters. This can be left out if you have your own code and you know your own code: if something goes wrong, you know where to look for the error and what the reasons could be. A 'user' just using a library could be helpless when he encounters an error and does not know where it occurred. So checking and error catching could be a reason that makes the gsl less performant but more appropriate for 'easy' usage.
>
> I use the C++ linear algebra library Armadillo a lot. This library has, for example, two different element accessors: '()', which does range checking, and 'at()', which does not. Performance sometimes increases tremendously when using the 'at()' method.
>
> Best
>
> Simon
>
>
> On Oct 6, 2013, at 12:44 AM, onefire <[email protected]> wrote:
>
> > Hi all,
> >
> > I am creating a personal library for my C++ projects and to evaluate its performance I decided to compare it to gsl and, to my surprise, my library is much faster than gsl in many cases. For example, my Nelder-Mead implementation can be 5 or 10 times faster for some functions.
> >
> > This has been bothering me a lot because:
> > 1) My tests are correct, and I checked the results against output from Scipy and Octave.
> > 2) I am certainly not smarter than the many people who worked on gsl over the years.
> > 3) I think I know how to use gsl "properly".
> >
> > Sorry I do not have an example, but my library is not polished enough for me to publish it yet.
> >
> > What I am really intrigued about is that I googled around and it seems that many people noticed (before me) that many gsl implementations are inefficient. I also found some posts on the mailing lists with things like "gsl is much slower than Matlab" and responses like "gsl is not meant to be the fastest".
> >
> > These things lead me to believe that gsl is "slow" by design. If this is the case, can someone explain to me why this is so?
> >
> > Gilberto
> >
> > PS: Much of the gain with C++ comes from better inlining, which can be especially important with things like function optimization (it is hard, if not impossible, to inline when there is a function pointer passed around). This is why std::sort can be much faster than qsort. However, I am confident that it is possible to write faster implementations (in C) than some of the ones in gsl.
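Here is the small sketch I mentioned above, to make the gsl_vector point concrete (not my actual Nelder-Mead code, just the allocation pattern; the size 5 and the million iterations mirror my experiment):

    #include <array>
    #include <gsl/gsl_vector.h>

    int main()
    {
        double total = 0.0;

        /* heap version: every iteration pays for gsl_vector_alloc/free,
           which internally is three malloc/free pairs (the vector struct,
           the gsl_block struct, and the data block). */
        for (int iter = 0; iter < 1000000; ++iter) {
            gsl_vector *v = gsl_vector_alloc(5);
            gsl_vector_set_all(v, 1.0);
            for (size_t i = 0; i < v->size; ++i)
                total += gsl_vector_get(v, i);
            gsl_vector_free(v);
        }

        /* stack version: the 5 doubles (or the ~36 doubles a small
           simplex needs) live on the stack, so there is no allocation
           at all. */
        for (int iter = 0; iter < 1000000; ++iter) {
            std::array<double, 5> a;
            a.fill(1.0);
            for (double x : a)
                total += x;
        }

        return total > 0.0 ? 0 : 1;
    }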
