Re: [Numpy-discussion] corrcoef of masked array

Robert Kern Wed, 30 May 2007 10:48:37 -0700

Jesper Larsen wrote:

> Here is my solution for calculating the correlation coefficients for masked 
> arrays. Comments are appreciated:
> 
> def macorrcoef(data1, data2):
>   """
>   Calculates correlation coefficients taking masked out values
>   into account.
> 
>   It is assumed (but not checked) that data1.shape == data2.shape.
>   """
>   nv, no = data1.shape
>   cc = ma.array(zeros((nv, nv)), mask=ones((nv, nv)))
>   if no > 1:
>     for i in range(nv):
>       for j in range(nv):
>         m = ma.getmaskarray(data1[i,:]) | ma.getmaskarray(data2[j,:])
>         d1 = ma.array(data1[i,:], copy=False, mask=m).compressed()
>         d2 = ma.array(data2[j,:], copy=False, mask=m).compressed()
>         if ma.count(d1) > 1:
>           c = corrcoef(d1, d2)
>           cc[i,j] = c[0,1]
> 
>   return cc


I'm afraid this doesn't work, either. Correlation matrices are constrained to be
positive semidefinite; that is, all of their eigenvalues must be >= 0.
Calculating each correlation coefficient pairwise, from whichever observations
happen to be unmasked in both rows, doesn't enforce this constraint, so the
assembled matrix can end up with negative eigenvalues.
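
To make the problem concrete, here is a minimal sketch with made-up data. It is a simplified single-dataset variant of the function quoted above (the name pairwise_corr and the toy array are assumptions for illustration, not part of the original code). Each pair of rows overlaps in a different set of columns, and the assembled matrix comes out indefinite:

```python
import numpy as np
import numpy.ma as ma

def pairwise_corr(data):
    """Correlate each pair of rows using only columns unmasked in both rows."""
    nv = data.shape[0]
    mask = ma.getmaskarray(data)
    values = ma.getdata(data)
    cc = np.eye(nv)
    for i in range(nv):
        for j in range(i + 1, nv):
            ok = ~(mask[i] | mask[j])  # columns observed in both rows
            cc[i, j] = cc[j, i] = np.corrcoef(values[i, ok], values[j, ok])[0, 1]
    return cc

nan = np.nan
# Toy data: rows are variables, columns are observations, NaN marks a gap.
data = ma.masked_invalid(np.array([
    [1, 2, 3, nan, nan, nan, 1, 2, 3],
    [1, 2, 3, 1, 2, 3, nan, nan, nan],
    [nan, nan, nan, 1, 2, 3, 3, 2, 1],
], dtype=float))

cc = pairwise_corr(data)
eigenvalues = np.linalg.eigvalsh(cc)
print(cc)
print(eigenvalues)  # the smallest eigenvalue is negative
```

Rows 0 and 1 are perfectly correlated where they overlap, as are rows 1 and 2, yet rows 0 and 2 are perfectly anti-correlated where they overlap. No single set of complete observations could produce all three coefficients at once, which is what the negative eigenvalue signals.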

But you're on the right track. My preferred approach to this problem is to find
the pairwise correlation matrix as you did and then find the closest positive
semidefinite matrix to it using the method of alternating projections. I can't
give you the code I wrote for this since it belongs to a customer, but here is
the reference I used:

  http://eprints.ma.man.ac.uk/232/
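
For illustration only (this is not the customer code mentioned above), a generic sketch of alternating projections might look like the following. It assumes the unweighted Frobenius norm, uses Dykstra's correction on the semidefinite projection, and the names nearest_correlation, tol, and max_iter are made up for this example:

```python
import numpy as np

def nearest_correlation(a, tol=1e-8, max_iter=1000):
    """Approximate the nearest correlation matrix to `a` (Frobenius norm).

    Alternates between projecting onto the positive semidefinite cone and
    onto the symmetric matrices with unit diagonal, applying Dykstra's
    correction to the semidefinite projection.
    """
    a = np.asarray(a, dtype=float)
    y = (a + a.T) / 2.0            # work with the symmetric part
    correction = np.zeros_like(y)  # Dykstra's correction term
    for _ in range(max_iter):
        r = y - correction
        # Project onto the PSD cone: clip negative eigenvalues to zero.
        w, v = np.linalg.eigh(r)
        x = (v * np.maximum(w, 0.0)) @ v.T
        correction = x - r
        # Project onto unit-diagonal matrices: reset the diagonal to 1.
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)
        converged = np.linalg.norm(y_new - y) <= tol * max(1.0, np.linalg.norm(y_new))
        y = y_new
        if converged:
            break
    return y

# An indefinite "correlation" matrix of the kind pairwise estimation can produce.
pairwise = np.array([[1.0, 1.0, -1.0],
                     [1.0, 1.0, 1.0],
                     [-1.0, 1.0, 1.0]])
fixed = nearest_correlation(pairwise)
print(np.linalg.eigvalsh(pairwise).min())  # negative: not a valid correlation matrix
print(np.linalg.eigvalsh(fixed).min())     # approximately zero, up to the tolerance
print(fixed)
```

The sketch assumes the input is a plain symmetric array with no masked entries left; any coefficients that could not be estimated would need to be filled in before calling it. Convergence is only linear, so the result is positive semidefinite up to roughly the chosen tolerance rather than exactly.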

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
