Bug#322454: python-scipy: several scipy.stats failures

Alexandre Fayolle Thu, 11 Aug 2005 09:49:12 -0700

On Wed, Aug 10, 2005 at 02:39:54PM -0400, Perry, Alexander (GE Infrastructure) 
wrote:
> Package: python-scipy
> Version: 0.3.2-6
> Severity: normal
> Tags: patch
> 
> 
> There are typographic bugs in stats.py that make two functions fail.  In
> addition, the calculation for nanstd() provides obviously wrong answers.
> I have not done any formal checking for the correction attached below.


I'm having a few problems understanding your patch for nanstd, which
does not seem to give the right results either, but I'm not sure of the
expected semantics of the nanstd function. Is it expected that

nanstd([nan, 2., 4.]) == std([2., 4.]) 

or that 

nanstd([nan, 2., 4.]) == std([0., 2., 4.]) 

your code produces neither. I'd go for the first option (given the
definition of nanmean)

I propose the following implementation of nanstd:

def nanstd(x,axis=-1,bias=0):
    """Compute the standard deviation over the given axis ignoring nans
    """
    x, axis = _chk_asarray(x,axis)
    x = x.copy()
    mn = expand_dims(nanmean(x, axis), axis)
    deviation = x - mn
    N.putmask(deviation, isnan(x), 0.)
    y = ss(deviation, axis)
    
    Norig = x.shape[axis]
    n = Norig - sum(isnan(x),axis)*1.0
    if bias:
        y /= n
    else:
        y /= n-1.
    return sqrt(y)

Does it seem correct to you?


 
> In the distributions.py file, the calculation of est_loc_scale() uses
> a sibling method wrongly and references an import by the wrong name.

ok.

> In the same file, nnlf() method passes its own object twice to _nnlf()

I don't see this in my code. The diff you sent me are strange, some of
the chunks seem to be reversed. 

> ***************
> *** 743,749 ****
>               return inf
>           else:
>               N = len(x)
> !             return self._nnlf(self, x, *args) + N*log(scale)
>   
>       def fit(self, data, *args, **kwds):
>           loc0, scale0 = map(kwds.get, ['loc', 'scale'],[0.0, 1.0])
> --- 749,755 ----
>               return inf
>           else:
>               N = len(x)
> !             return self._nnlf(x, *args) + N*log(scale)
>   
>       def fit(self, data, *args, **kwds):
>           loc0, scale0 = map(kwds.get, ['loc', 'scale'],[0.0, 1.0])

Do we agree that the bottom version is correct ? I.e.:
return self._nnlf(x, *args) + N*log(scale) 

This is the code I had in my source tree. 


This is the final patch that I plan to upload. Comments are welcome:

--- stats.py.orig       2005-08-11 17:15:40.000000000 +0200
+++ stats.py    2005-08-11 18:17:38.000000000 +0200
@@ -241,28 +241,27 @@
     x = x.copy()
     Norig = x.shape[axis]
     factor = 1.0-sum(isnan(x),axis)*1.0/Norig
-    n = N-sum(isnan(x),axis)
+    n = N.sum(isnan(x),axis)
     N.putmask(x,isnan(x),0)
-    return stats.mean(x,axis)/factor
+    return mean(x,axis)/factor
     
 def nanstd(x,axis=-1,bias=0):
     """Compute the standard deviation over the given axis ignoring nans
     """
     x, axis = _chk_asarray(x,axis)
     x = x.copy()
+    mn = expand_dims(nanmean(x, axis), axis)
+    deviation = x - mn
+    N.putmask(deviation, isnan(x), 0.)
+    y = ss(deviation, axis)
+    
     Norig = x.shape[axis]
     n = Norig - sum(isnan(x),axis)*1.0
-    factor = n/Norig
-    n = N-sum(isnan(x),axis)
-    N.putmask(x,isnan(x),0)
-    m1 = stats.mean(x,axis)
-    m1c = m1/factor
-    m2 = stats.mean((x-m1c)**2.0,axis)
     if bias:
-        m2c = m2/factor
+        y /= n
     else:
-        m2c = m2*Norig/(n-1.0)
-    return m2c
+        y /= n-1.
+    return sqrt(y)
 
 def _nanmedian(arr1d):  # This only works on 1d arrays
    cond = 1-isnan(arr1d)

--- distributions.py.orig       2005-08-11 17:18:39.000000000 +0200
+++ distributions.py    2005-08-11 18:32:39.000000000 +0200
@@ -730,15 +730,9 @@
         #
         try:
             x = args[-1]
-          args = args[:-1]
-        except IndexError:
-            raise ValueError, "Need shape and data arguments."
-      try:
-            if len(args) == 1:
-              args = args[0]
-          loc = args[-1]
-            scale = args[-2]
-            args = args[:-2]
+            loc = args[-2]
+            scale = args[-3]
+            args = args[:-3]
         except IndexError:
             raise ValueError, "Not enough shape arguments."
         if not self._argcheck(*args) or scale <= 0:
@@ -764,9 +758,9 @@
         return optimize.fmin(self.nnlf,x0,args=(ravel(data),),disp=0)
 
     def est_loc_scale(self, data, *args):
-        mu, mu2 = self.stats(*args,**{'moments':'mv'})
-        muhat = st.nanmean(data)
-        mu2hat = st.nanstd(data)
+        mu, mu2, g1, g2 = self.stats(*args,**{'moments':'mv'})
+        muhat = stats.nanmean(data)
+        mu2hat = stats.nanstd(data)
         Shat = sqrt(mu2hat / mu2)
         Lhat = muhat - Shat*mu
         return Lhat, Shat


-- 
Alexandre Fayolle                              LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org

signature.asc
Description: Digital signature

Bug#322454: python-scipy: several scipy.stats failures

Reply via email to