Re: [Numpy-discussion] numpy.average(fooArray, axis=0): dropping nans from calculation

2006-07-14 Thread Sven Schreiber
Curzio Basso schrieb:
 Well try it out and see for yourself ;-)
 
 good point :-)
 
 But for sums it doesn't make a difference, right... Note that it's
 called nan*sum* and not nanwhatever.
 
 sure, I was still thinking about the first post which was referring to
 the average...
 
 qrz
 

Right, having to count the Nans then makes the masked-array solution
more attractive, that's true. So maybe that's a feature request,
complementing the nansum function by a nanaverage?




-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnkkid=120709bid=263057dat=121642
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.average(fooArray, axis=0): dropping nans from calculation

2006-07-14 Thread Eric Firing

Alan G Isaac wrote:
On Fri, 14 Jul 2006, Sven Schreiber apparently wrote: 

So maybe that's a feature request, complementing the 
nansum function by a nanaverage? 



This is not an objection; just an observation.
It has always seemed to me that such descriptive
statistics make more sense as class attributes.
In this case, something like a NanDstat class.


Attached is something like that, in case someone finds it useful.  It is 
designed to replace something I wrote a long time ago for matlab.  It is 
only very lightly tested, so use with care.


Eric

'''
Temporary location for a class that provides the functionality
of mstdgap.m, though with a different interface.
'''

import numpy as N

class Stats:
'''
Calculate standard statistics while ignoring data gaps, which
may be indicated by nan values or by masked array input.

Usages:
1) (m, s, n, med) = Stats(y)(median=True)
2) Sy = Stats(y)
   m = Sy.mean  # mean
   s = Sy.std   # standard deviation
   n = Sy.N # number of valid points
   med = Sy.median  # median
   ydm = Sy.demeaned  # de-meaned input array

In the first case we call the instance to get all the statistics at
once; in the second we access the statistics as attributes.  In
the latter case, subsequent accesses are cheap because the value
is stored in the instance.  Each statistic is calculated
only the first time it is requested.

Caution: the outputs are references to arrays that may be needed
for subsequent calculations, so don't modify them unless you
have already done all calculations.

For usage (1), the argument median=True results in all four
statistics; median=False, the default, calculates and outputs
only the first three.

The constructor has the following keyword arguments and defaults:
axis=0 : specifies the axis along which the stats are calculated
squeeze=True: stats for an N-dimensional input have N-1
dimensions if True, N dimensions if False.
If False, the stats are broadcastable to the dimensions
of the input array.
See also the broadcastable method.
masked='auto' : True|False|'auto' determines the output;
if True, output will be a masked array;
if False, output will be an ndarray with nan used
as a bad flag if necessary;
if 'auto', output will match input
(The N attribute is an ndarray in any case.)
'''
def __init__(self, y, axis=0, squeeze=True, masked='auto'):
'''
See the class docstring.
'''
self._axis = axis
self._squeeze = squeeze
if hasattr(y, 'mask'):
mask = N.ma.getmaskarray(y)
else:
mask = N.isnan(y)
if masked == True or (masked == 'auto' and hasattr(y, 'mask')):
self._nanout = False
else:
self._nanout = True
self._y = N.ma.array(y, mask=mask, fill_value=0)
self._mask = mask
self._mean = None
self._std = None
self._median = None
nbad = self._mask.sum(axis=self._axis)
self._N = self._y.shape[self._axis] - nbad
self.b_shape = list(self._y.shape)   # broadcastable shape
self.b_shape[axis] = 1


def __call__(self, median=False):
if median:
return self.mean, self.std, self.N, self.median
else:
return self.mean, self.std, self.N

def broadcastable(self, x):
'''
Change the shape of a summary statistic (mean, N, std,
or median) so that it is broadcastable to the shape of
the input array.  This is needed only if the class
constructor was called with squeeze=False.
'''
# as of 2006/07/08 the view method is not implemented
#return x.view().reshape(*self.b_shape)
if hasattr(x, 'mask'):
return N.ma.array(x, copy=False).reshape(*self.b_shape)
else:
return x.view().reshape(*self.b_shape)

def get_N(self):
if self._squeeze:
return self._N
else:
return self.broadcastable(self._N)
N = property(get_N)

def calc_mean(self):
if self._mean is None:
self._mean = self._y.sum(axis=self._axis)/self._N
return self._mean

def get_mean(self):
m = self.calc_mean()
if self._nanout:
m = m.filled(fill_value=N.nan)
if self._squeeze:
return m
else:
return self.broadcastable(m)
mean = property(get_mean)

def calc_std(self):
if self._std is None:
m = self.broadcastable(self.calc_mean())
ss = (self._y - m)**2
n = N.ma.masked_where(self._N = 1, self._N-1)
self._std = N.sqrt(ss.sum(axis=self._axis)/n)
return self._std

def 

[Numpy-discussion] numpy.average(fooArray, axis=0): dropping nans from calculation

2006-07-13 Thread Webb Sprague
Could someone recommend a way to average an array along the columns
without propagating the nans and without turning them into some weird
number which bias the result?  I guess I can just keep using an
indexing array for fooArray, but if there is somehting more graceful,
I would love to know.

Boy missing data is a pain in the neck...

Thanks again!


-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnkkid=120709bid=263057dat=121642
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.average(fooArray, axis=0): dropping nans from calculation

2006-07-13 Thread Eric Firing
Webb Sprague wrote:
 Could someone recommend a way to average an array along the columns
 without propagating the nans and without turning them into some weird
 number which bias the result?  I guess I can just keep using an
 indexing array for fooArray, but if there is somehting more graceful,
 I would love to know.

Something like this:

import numpy as N
ym = N.ma.masked_where(N.isnan(y), y)
yaverage = N.ma.average(ym)

 
 Boy missing data is a pain in the neck...

It certainly is, but Masked Arrays ease the pain.

Eric


-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnkkid=120709bid=263057dat=121642
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion