Hi Brian,
I'm attaching a patch for a modified implementation of
gsl_stats_minmax() that uses fewer comparisons than the current
version. The idea (which appears in the Cormen et al. textbook on
algorithms) is to unroll the main loop, considering two elements in
each iteration. Then only the smaller of the two needs to be compared
against the minimum, and the larger against the maximum. This saves
n/2 comparisons in total, compared with the current version.
As an aside, it would be easy to modify this even further to deal with
NaNs by ignoring them. In the case of order statistics, it might make
sense to change the semantics so that max is an element from the input
array such that the array does not contain any element that is greater
than max. At the moment -- both in the current version and in this
patch -- the behavior is undefined if the array contains NaNs. Also,
it's not clear to me what should happen if n == 0.
Regards,
-- mj
Index: minmax_source.c
===================================================================
RCS file: /cvs/gsl/gsl/statistics/minmax_source.c,v
retrieving revision 1.7
diff -u -r1.7 minmax_source.c
--- minmax_source.c 26 Jun 2005 13:27:11 -0000 1.7
+++ minmax_source.c 23 Dec 2005 08:05:18 -0000
@@ -57,18 +57,68 @@
FUNCTION(gsl_stats,minmax) (BASE * min_out, BASE * max_out, const BASE data[], const size_t stride, const size_t n)
{
/* finds the smallest and largest members of a dataset */
+ /* cf. Cormen et al. 2001, pp. 184--185 */
- BASE min = data[0 * stride];
- BASE max = data[0 * stride];
size_t i;
+ BASE min;
+ BASE max;
- for (i = 0; i < n; i++)
+ if (n == 0)
{
- if (data[i * stride] < min)
- min = data[i * stride];
- if (data[i * stride] > max)
- max = data[i * stride];
+ /* We could raise an error, e.g. */
+ /* GSL_ERROR ("length n of dataset is zero", GSL_EINVAL); */
+ /* but only if the return type of the function is changed. */
+ return;
+ }
+
+ if (n & 1) /* n is odd */
+ {
+ /* assert (n >= 1) */
+ i = 1;
+ min = data[0];
+ max = data[0];
+ }
+ else /* n is even */
+ {
+ BASE x = data[0];
+ BASE y = data[stride];
+
+ /* assert (n >= 2); */
+ i = 2;
+
+ if (x < y)
+ {
+ min = x;
+ max = y;
+ }
+ else
+ {
+ min = y;
+ max = x;
+ }
+ }
+
+ for (/*empty*/; i < n; i+=2)
+ {
+ BASE x = data[i * stride];
+ BASE y = data[(i+1) * stride];
+
+ if (x < y)
+ {
+ if (x < min)
+ min = x;
+ if (y > max)
+ max = y;
+ }
+ else
+ {
+ if (y < min)
+ min = y;
+ if (x > max)
+ max = x;
+ }
}
+ /* assert (i == n); */
*min_out = min ;
*max_out = max ;
_______________________________________________
Bug-gsl mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-gsl