To whom it may concern,

I would like the GSL statistical routines that operate on data sets
(e.g., rank, sort, mean, etc.) to offer a configurable way of handling
NaNs.
As you may have seen, NaN is commonly used to mark missing entries in
acquired data.

Proposals:

1. Make the GSL statistical routines insensitive to the presence of
NaNs, that is, skip them:
  for (size_t i = 0; i < n; i++)
    {
      if (isnan (d[i]))
        continue;               /* missing entry: skip it */
      /* otherwise do something with d[i], which is not NaN */
    }
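For a mean, proposal 1 might look like the sketch below. `nan_skip_mean` is a hypothetical helper, not an existing GSL function; it only borrows the `data`/`stride`/`n` argument convention of the GSL statistics functions such as `gsl_stats_mean`. It uses a running-mean update over the non-NaN values and returns NaN if every entry is NaN (that choice is an assumption).

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not GSL API): mean of data[0], data[stride], ...,
   data[(n-1)*stride], skipping NaN entries. The number of values actually
   used is stored in *n_used when n_used is non-NULL. Returns NaN if every
   entry is NaN (an assumption, not specified by the proposal). */
double
nan_skip_mean (const double data[], size_t stride, size_t n, size_t *n_used)
{
  long double mean = 0;
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    {
      double x = data[i * stride];
      if (isnan (x))
        continue;                       /* missing entry: skip it */
      count++;
      mean += (x - mean) / count;       /* running mean of non-NaN values */
    }

  if (n_used != NULL)
    *n_used = count;
  return count > 0 ? (double) mean : NAN;
}
```

For example, on {1, NaN, 2, 3, NaN} with stride 1 it uses 3 values and returns 2.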

2. When sorting an array, offer three user-selectable strategies for
NaNs:
  I. put them at the beginning of the array
  II. put them at the end of the array
  III. leave them in place (sort data around them)
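The three strategies could be sketched with a single hypothetical `nan_sort` function (not GSL API; the enum names are made up for illustration). It assumes unit stride, copies the non-NaN values into a scratch buffer, sorts them with `qsort`, and writes them back according to the chosen strategy.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical NaN-placement policy; the names are illustrative only. */
enum nan_policy { NAN_AT_START, NAN_AT_END, NAN_IN_PLACE };

/* qsort comparator for doubles known not to be NaN. */
static int
cmp_double (const void *pa, const void *pb)
{
  double a = *(const double *) pa, b = *(const double *) pb;
  return (a > b) - (a < b);
}

/* Sort data[0..n-1] ascending, placing NaNs according to policy.
   NaN payload/sign bits are not preserved (NaN slots are rewritten
   with NAN). Returns 0 on success, -1 if the scratch buffer cannot
   be allocated. */
int
nan_sort (double *data, size_t n, enum nan_policy policy)
{
  double *vals = malloc ((n > 0 ? n : 1) * sizeof *vals);
  size_t m = 0;                         /* number of non-NaN values */

  if (vals == NULL)
    return -1;

  for (size_t i = 0; i < n; i++)        /* gather non-NaN values */
    if (!isnan (data[i]))
      vals[m++] = data[i];

  qsort (vals, m, sizeof *vals, cmp_double);

  switch (policy)
    {
    case NAN_AT_START:                  /* I: NaNs first, then sorted data */
      for (size_t i = 0; i < n - m; i++)
        data[i] = NAN;
      memcpy (data + (n - m), vals, m * sizeof *vals);
      break;
    case NAN_AT_END:                    /* II: sorted data, then NaNs */
      memcpy (data, vals, m * sizeof *vals);
      for (size_t i = m; i < n; i++)
        data[i] = NAN;
      break;
    case NAN_IN_PLACE:                  /* III: refill only non-NaN slots */
      for (size_t i = 0, k = 0; i < n; i++)
        if (!isnan (data[i]))
          data[i] = vals[k++];
      break;
    }

  free (vals);
  return 0;
}
```

With {3, NaN, 1, 2}, strategy I gives {NaN, 1, 2, 3}, strategy II gives {1, 2, 3, NaN}, and strategy III gives {1, NaN, 2, 3}. The scratch buffer keeps the sketch short; an in-place version would avoid the allocation.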

3. When ranking vector entries, apply sorting strategy II, then rank the
non-NaN entries (the part of the array before the NaNs) as usual.
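A possible shape for proposal 3, again as a hypothetical helper (`nan_rank`) rather than an existing GSL routine: the non-NaN values are gathered into the front of a scratch buffer (the same effect as strategy II), sorted, and ranked 1..m, while NaN entries receive a rank of NaN. Giving tied values their average rank is an assumption here, since the proposal does not say how ties should be handled.

```c
#include <math.h>
#include <stdlib.h>

/* Value/position pair so qsort can carry the original index along. */
struct vi { double v; size_t i; };

static int
cmp_vi (const void *pa, const void *pb)
{
  const struct vi *a = pa, *b = pb;
  return (a->v > b->v) - (a->v < b->v);
}

/* Hypothetical helper (not GSL API): write 1-based ranks of data[0..n-1]
   into rank[]. NaN entries get rank NaN; the m non-NaN values are ranked
   1..m, tied values sharing their average rank (assumed tie rule).
   Returns m, or (size_t) -1 on allocation failure. */
size_t
nan_rank (const double *data, size_t n, double *rank)
{
  struct vi *buf = malloc ((n > 0 ? n : 1) * sizeof *buf);
  size_t m = 0;

  if (buf == NULL)
    return (size_t) -1;

  for (size_t i = 0; i < n; i++)
    {
      if (isnan (data[i]))
        rank[i] = NAN;                  /* NaNs are left unranked */
      else
        buf[m++] = (struct vi) { data[i], i };   /* non-NaN prefix */
    }

  qsort (buf, m, sizeof *buf, cmp_vi);  /* sort only the non-NaN prefix */

  for (size_t j = 0; j < m; )
    {
      size_t k = j;
      while (k + 1 < m && buf[k + 1].v == buf[j].v)
        k++;                            /* buf[j..k] is a run of ties */
      double r = (j + k) / 2.0 + 1.0;   /* average of ranks j+1 .. k+1 */
      for (size_t t = j; t <= k; t++)
        rank[buf[t].i] = r;
      j = k + 1;
    }

  free (buf);
  return m;
}
```

For {30, NaN, 10, 20, 10} this yields ranks {4, NaN, 1.5, 3, 1.5}.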

4. Ignore NaNs when computing 1-D histograms, but add an extra entry
holding the count of NaNs.
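Proposal 4 could look like the following sketch. `struct nan_hist1d` and its functions are hypothetical, not the GSL `gsl_histogram` API; they assume `n > 0` uniform bins over [xmin, xmax) and keep one extra slot for the NaN count, placed first or last according to a `nan_first` flag.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical 1-D histogram (not gsl_histogram): n > 0 uniform bins over
   [xmin, xmax) plus one extra slot counting NaNs. The NaN slot is
   count[0] when nan_first is nonzero, otherwise count[n]. */
struct nan_hist1d
{
  size_t n;
  double xmin, xmax;
  int nan_first;
  double *count;                        /* n + 1 entries */
};

int
nan_hist1d_init (struct nan_hist1d *h, size_t n, double xmin, double xmax,
                 int nan_first)
{
  h->n = n;
  h->xmin = xmin;
  h->xmax = xmax;
  h->nan_first = nan_first;
  h->count = calloc (n + 1, sizeof *h->count);
  return h->count != NULL ? 0 : -1;
}

/* Add one sample. NaNs go to the NaN slot; values in [xmin, xmax) go to
   their bin. Returns -1 (and counts nothing) for out-of-range values,
   including infinities; 0 otherwise. */
int
nan_hist1d_increment (struct nan_hist1d *h, double x)
{
  size_t offset = h->nan_first ? 1 : 0;  /* where the regular bins start */

  if (isnan (x))
    {
      h->count[h->nan_first ? 0 : h->n] += 1.0;
      return 0;
    }
  if (!(x >= h->xmin && x < h->xmax))
    return -1;
  size_t bin = (size_t) ((x - h->xmin) / (h->xmax - h->xmin) * h->n);
  if (bin >= h->n)                       /* guard against rounding */
    bin = h->n - 1;
  h->count[offset + bin] += 1.0;
  return 0;
}

void
nan_hist1d_free (struct nan_hist1d *h)
{
  free (h->count);
  h->count = NULL;
}
```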

5. Ignore NaNs when computing 2-D histograms, but add an extra row and
column to the bin matrix for data points in which one or both
coordinates are NaN.
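For proposal 5, a similar hypothetical sketch (`struct nan_hist2d`, not the GSL `gsl_histogram2d` API) stores an (nx+1)-by-(ny+1) count matrix. It assumes uniform bins and puts the extra row and column last: row nx collects points whose x is NaN, column ny those whose y is NaN, and cell (nx, ny) points where both are NaN. Rejecting points that have a NaN in one coordinate and an out-of-range value in the other is an assumption.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical 2-D histogram (not gsl_histogram2d): nx-by-ny uniform bins
   stored row-major in an (nx+1)-by-(ny+1) matrix. Row nx counts points
   with NaN x, column ny counts points with NaN y, cell (nx, ny) both. */
struct nan_hist2d
{
  size_t nx, ny;
  double xmin, xmax, ymin, ymax;
  double *count;                        /* (nx + 1) * (ny + 1) entries */
};

int
nan_hist2d_init (struct nan_hist2d *h, size_t nx, double xmin, double xmax,
                 size_t ny, double ymin, double ymax)
{
  h->nx = nx;
  h->ny = ny;
  h->xmin = xmin;
  h->xmax = xmax;
  h->ymin = ymin;
  h->ymax = ymax;
  h->count = calloc ((nx + 1) * (ny + 1), sizeof *h->count);
  return h->count != NULL ? 0 : -1;
}

/* Map v to a bin in [0, n), or to n (the extra row/column) if v is NaN.
   Returns -1 if v is outside [lo, hi), including infinities. */
static int
axis_index (double v, double lo, double hi, size_t n, size_t *idx)
{
  if (isnan (v))
    {
      *idx = n;
      return 0;
    }
  if (!(v >= lo && v < hi))
    return -1;
  *idx = (size_t) ((v - lo) / (hi - lo) * n);
  if (*idx >= n)
    *idx = n - 1;                       /* guard against rounding */
  return 0;
}

/* Add one (x, y) sample; returns -1 if a non-NaN coordinate is out of range. */
int
nan_hist2d_increment (struct nan_hist2d *h, double x, double y)
{
  size_t i = 0, j = 0;
  if (axis_index (x, h->xmin, h->xmax, h->nx, &i) != 0
      || axis_index (y, h->ymin, h->ymax, h->ny, &j) != 0)
    return -1;
  h->count[i * (h->ny + 1) + j] += 1.0;
  return 0;
}

/* Read cell (i, j) with 0 <= i <= nx and 0 <= j <= ny. */
double
nan_hist2d_get (const struct nan_hist2d *h, size_t i, size_t j)
{
  return h->count[i * (h->ny + 1) + j];
}

void
nan_hist2d_free (struct nan_hist2d *h)
{
  free (h->count);
  h->count = NULL;
}
```

Placing the NaN row and column first instead would only shift the regular bin indices by one in each direction.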

For 4 and 5, the global NaN strategy would decide whether these extra
NaN entries go at the beginning or at the end of the histogram's bin
array or matrix.


Regards,
w/boobs

