I don't know what the policy is regarding optimizations which
(slightly) decrease code readability, but there's a simple change to
the downward recurrence in gsl_sf_bessel_Jn_e which doubles the speed
of the function (almost all the remaining time is spent in
gsl_sf_bessel_J_CF1) for n of order 10-40.
Thanks for your email. That is interesting. Can you give a few more
details about the compilation options you used, the compiler version
and the platform?
My gsl install compiles with:
gcc -DHAVE_CONFIG_H -I. -I.. -I.. -g -O2 -c bessel_Jn.c -o bessel_Jn.o
which are the out-of-the-box options.
Issue observed on PPC G5, OS X, gcc 4.0.1, and also on x86_64,
Ubuntu, gcc 4.1.3.
Did you see how much of the benefit comes from replacing 2/x by a
constant compared with keeping the value of k in a double? The
optimisation of replacing (2/x) by a constant would be something I
would expect GCC to deduce at some level.
Further investigation reveals that gcc CAN optimize 2/x, but only if
strict IEEE compliance is disabled (-ffast-math), which is not the
out-of-the-box option for gsl compilation.
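
To illustrate why the compiler cannot make this substitution on its
own under strict IEEE semantics, here is a minimal sketch (the helper
name forms_agree is made up for this example, and it assumes plain
double arithmetic with no extended-precision temporaries, as on
x86_64). Dividing each time and multiplying by a precomputed 2/x can
round differently, so the two forms are not interchangeable bit for
bit:

```c
/* Hypothetical helper, for illustration only.
 * Returns 1 if the two ways of forming 2k/x give the same double. */
int forms_agree(double k, double x)
{
    /* 2.0*k is exact, so the only rounding is in the division. */
    double divide_each_time = 2.0 * k / x;

    /* Rounds once when forming 2/x and again in the multiply. */
    double hoisted = k * (2.0 / x);

    return divide_each_time == hoisted;
}
```

For example, forms_agree(3.0, 20.0) is 0 because 6.0/20.0 rounds to
0.3 while 3.0 * 0.1 rounds one ulp higher, whereas
forms_agree(1.0, 20.0) is 1. The difference is in the last bit only,
but it is enough to stop the transformation unless something like
-ffast-math permits it.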
With -O2 and without -ffast-math there is a roughly 30% improvement
for either of the optimizations on its own, and a 40% improvement if
they are both made together (I speculate the lack of further
improvement is because loop overhead is now the bottleneck). OK, not
quite doubling with the parameters I used for this test, but still not
to be sneezed at.
With -O3 -ffast-math there is actually still a slight improvement if
I pull out 2/x (haven't looked at why exactly). Roughly 30%
improvement obtained from the shadow variable optimization.
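
For reference, a rough sketch of the two optimizations discussed
above, applied to a downward recurrence of the form
J_{k-1} = (2k/x) J_k - J_{k+1}. This is not the actual gsl source or
the exact patch; the function names, argument order and starting
values are assumptions chosen for illustration:

```c
/* Baseline: integer k is converted to double and divided by x on
 * every pass through the loop. */
double recur_down_plain(int n, double x, double Jkp1, double Jk)
{
    int k;
    for (k = n; k > 0; k--) {
        double Jkm1 = 2.0 * k / x * Jk - Jkp1;
        Jkp1 = Jk;
        Jk = Jkm1;
    }
    return Jk; /* proportional to J_0 after n steps */
}

/* Variant: 2/x is computed once outside the loop, and a double
 * "shadow" of k (kd) avoids the int-to-double conversion. */
double recur_down_hoisted(int n, double x, double Jkp1, double Jk)
{
    const double two_over_x = 2.0 / x;
    double kd = (double) n;
    int k;
    for (k = n; k > 0; k--) {
        double Jkm1 = kd * two_over_x * Jk - Jkp1;
        Jkp1 = Jk;
        Jk = Jkm1;
        kd -= 1.0;
    }
    return Jk;
}
```

The two versions can differ in the last few bits because the hoisted
form rounds 2/x separately, so any such change would need to be
checked against the existing accuracy tests as well as timed.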
Hope this helps
Jonny
_______________________________________________
Bug-gsl mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-gsl