Re: [Numpy-discussion] numpy.average(fooArray, axis=0): dropping nans from calculation

Eric Firing Fri, 14 Jul 2006 09:29:13 -0700

Alan G Isaac wrote:

On Fri, 14 Jul 2006, Sven Schreiber apparently wrote:
So maybe that's a feature request, complementing thenansum function by a nanaverage?
This is not an objection; just an observation.
It has always seemed to me that such descriptive
statistics make more sense as class attributes.
In this case, something like a NanDstat class.

Attached is something like that, in case someone finds it useful. It isdesigned to replace something I wrote a long time ago for matlab. It isonly very lightly tested, so use with care.


Eric

'''
Temporary location for a class that provides the functionality
of mstdgap.m, though with a different interface.
'''

import numpy as N

class Stats:
    '''
    Calculate standard statistics while ignoring data gaps, which
    may be indicated by nan values or by masked array input.

    Usages:
        1) (m, s, n, med) = Stats(y)(median=True)
        2) Sy = Stats(y)
           m = Sy.mean      # mean
           s = Sy.std       # standard deviation
           n = Sy.N         # number of valid points
           med = Sy.median  # median
           ydm = Sy.demeaned  # de-meaned input array

    In the first case we call the instance to get all the statistics at
    once; in the second we access the statistics as attributes.  In
    the latter case, subsequent accesses are cheap because the value
    is stored in the instance.  Each statistic is calculated
    only the first time it is requested.

    Caution: the outputs are references to arrays that may be needed
    for subsequent calculations, so don't modify them unless you
    have already done all calculations.

    For usage (1), the argument median=True results in all four
    statistics; median=False, the default, calculates and outputs
    only the first three.

    The constructor has the following keyword arguments and defaults:
        axis=0 : specifies the axis along which the stats are calculated
        squeeze=True: stats for an N-dimensional input have N-1
                dimensions if True, N dimensions if False.
                If False, the stats are broadcastable to the dimensions
                of the input array.
                See also the broadcastable method.
        masked='auto' : True|False|'auto' determines the output;
                if True, output will be a masked array;
                if False, output will be an ndarray with nan used
                            as a bad flag if necessary;
                if 'auto', output will match input
                (The N attribute is an ndarray in any case.)
    '''
    def __init__(self, y, axis=0, squeeze=True, masked='auto'):
        '''
        See the class docstring.
        '''
        self._axis = axis
        self._squeeze = squeeze
        if hasattr(y, 'mask'):
            mask = N.ma.getmaskarray(y)
        else:
            mask = N.isnan(y)
        if masked == True or (masked == 'auto' and hasattr(y, 'mask')):
            self._nanout = False
        else:
            self._nanout = True
        self._y = N.ma.array(y, mask=mask, fill_value=0)
        self._mask = mask
        self._mean = None
        self._std = None
        self._median = None
        nbad = self._mask.sum(axis=self._axis)
        self._N = self._y.shape[self._axis] - nbad
        self.b_shape = list(self._y.shape)   # broadcastable shape
        self.b_shape[axis] = 1


    def __call__(self, median=False):
        if median:
            return self.mean, self.std, self.N, self.median
        else:
            return self.mean, self.std, self.N

    def broadcastable(self, x):
        '''
        Change the shape of a summary statistic (mean, N, std,
        or median) so that it is broadcastable to the shape of
        the input array.  This is needed only if the class
        constructor was called with squeeze=False.
        '''
        # as of 2006/07/08 the view method is not implemented
        #return x.view().reshape(*self.b_shape)
        if hasattr(x, 'mask'):
            return N.ma.array(x, copy=False).reshape(*self.b_shape)
        else:
            return x.view().reshape(*self.b_shape)

    def get_N(self):
        if self._squeeze:
            return self._N
        else:
            return self.broadcastable(self._N)
    N = property(get_N)

    def calc_mean(self):
        if self._mean is None:
            self._mean = self._y.sum(axis=self._axis)/self._N
        return self._mean

    def get_mean(self):
        m = self.calc_mean()
        if self._nanout:
            m = m.filled(fill_value=N.nan)
        if self._squeeze:
            return m
        else:
            return self.broadcastable(m)
    mean = property(get_mean)

    def calc_std(self):
        if self._std is None:
            m = self.broadcastable(self.calc_mean())
            ss = (self._y - m)**2
            n = N.ma.masked_where(self._N <= 1, self._N-1)
            self._std = N.sqrt(ss.sum(axis=self._axis)/n)
        return self._std

    def get_std(self):
        s = self.calc_std()
        if self._nanout:
            s = s.filled(fill_value=N.nan)
        if self._squeeze:
            return s
        else:
            return self.broadcastable(s)
    std = property(get_std)

    def calc_median(self):
        if self._median is None:
            ysort = N.ma.sort(self._y, axis=self._axis)
            ii = N.indices(ysort.shape)[self._axis]
            ngood = self.broadcastable(self._N)
            i0 = (ngood-1)//2
            i1 = ngood//2
            cond = N.logical_or(i0==ii, i1==ii)
            m0 = N.ma.where(cond, ysort, 0)
            m = m0.sum(axis=self._axis)/cond.sum(axis=self._axis)
            self._median = m
        return self._median

    def get_median(self):
        m = self.calc_median()
        if self._nanout:
            m = m.filled(fill_value=N.nan)
        if self._squeeze:
            return m
        else:
            return self.broadcastable(m)
    median = property(get_median)

    def get_demeaned(self):
        m = self.broadcastable(self.calc_mean())
        y = self._y - m
        if self._nanout:
            y = y.filled(fill_value=N.nan)
        return y
    demeaned = property(get_demeaned)

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion

Re: [Numpy-discussion] numpy.average(fooArray, axis=0): dropping nans from calculation

Reply via email to