Alan G Isaac wrote:
On Fri, 14 Jul 2006, Sven Schreiber apparently wrote:
So maybe that's a feature request, complementing the
nansum function by a nanaverage?
This is not an objection; just an observation.
It has always seemed to me that such descriptive
statistics make more sense as class attributes.
In this case, something like a NanDstat class.
Attached is something like that, in case someone finds it useful. It is
designed to replace something I wrote a long time ago for matlab. It is
only very lightly tested, so use with care.
Eric
'''
Temporary location for a class that provides the functionality
of mstdgap.m, though with a different interface.
'''
import numpy as N
class Stats:
'''
Calculate standard statistics while ignoring data gaps, which
may be indicated by nan values or by masked array input.
Usages:
1) (m, s, n, med) = Stats(y)(median=True)
2) Sy = Stats(y)
m = Sy.mean # mean
s = Sy.std # standard deviation
n = Sy.N # number of valid points
med = Sy.median # median
ydm = Sy.demeaned # de-meaned input array
In the first case we call the instance to get all the statistics at
once; in the second we access the statistics as attributes. In
the latter case, subsequent accesses are cheap because the value
is stored in the instance. Each statistic is calculated
only the first time it is requested.
Caution: the outputs are references to arrays that may be needed
for subsequent calculations, so don't modify them unless you
have already done all calculations.
For usage (1), the argument median=True results in all four
statistics; median=False, the default, calculates and outputs
only the first three.
The constructor has the following keyword arguments and defaults:
axis=0 : specifies the axis along which the stats are calculated
squeeze=True: stats for an N-dimensional input have N-1
dimensions if True, N dimensions if False.
If False, the stats are broadcastable to the dimensions
of the input array.
See also the broadcastable method.
masked='auto' : True|False|'auto' determines the output;
if True, output will be a masked array;
if False, output will be an ndarray with nan used
as a bad flag if necessary;
if 'auto', output will match input
(The N attribute is an ndarray in any case.)
'''
def __init__(self, y, axis=0, squeeze=True, masked='auto'):
'''
See the class docstring.
'''
self._axis = axis
self._squeeze = squeeze
if hasattr(y, 'mask'):
mask = N.ma.getmaskarray(y)
else:
mask = N.isnan(y)
if masked == True or (masked == 'auto' and hasattr(y, 'mask')):
self._nanout = False
else:
self._nanout = True
self._y = N.ma.array(y, mask=mask, fill_value=0)
self._mask = mask
self._mean = None
self._std = None
self._median = None
nbad = self._mask.sum(axis=self._axis)
self._N = self._y.shape[self._axis] - nbad
self.b_shape = list(self._y.shape) # broadcastable shape
self.b_shape[axis] = 1
def __call__(self, median=False):
if median:
return self.mean, self.std, self.N, self.median
else:
return self.mean, self.std, self.N
def broadcastable(self, x):
'''
Change the shape of a summary statistic (mean, N, std,
or median) so that it is broadcastable to the shape of
the input array. This is needed only if the class
constructor was called with squeeze=False.
'''
# as of 2006/07/08 the view method is not implemented
#return x.view().reshape(*self.b_shape)
if hasattr(x, 'mask'):
return N.ma.array(x, copy=False).reshape(*self.b_shape)
else:
return x.view().reshape(*self.b_shape)
def get_N(self):
if self._squeeze:
return self._N
else:
return self.broadcastable(self._N)
N = property(get_N)
def calc_mean(self):
if self._mean is None:
self._mean = self._y.sum(axis=self._axis)/self._N
return self._mean
def get_mean(self):
m = self.calc_mean()
if self._nanout:
m = m.filled(fill_value=N.nan)
if self._squeeze:
return m
else:
return self.broadcastable(m)
mean = property(get_mean)
def calc_std(self):
if self._std is None:
m = self.broadcastable(self.calc_mean())
ss = (self._y - m)**2
n = N.ma.masked_where(self._N = 1, self._N-1)
self._std = N.sqrt(ss.sum(axis=self._axis)/n)
return self._std
def