Jesper Larsen wrote:
> Here is my solution for calculating the correlation coefficients for masked
> arrays. Comments are appreciated:
>
> def macorrcoef(data1, data2):
>     """
>     Calculates correlation coefficients taking masked out values
>     into account.
>
>     It is assumed (but not checked) that data1.shape == data2.shape.
>     """
>     nv, no = data1.shape
>     cc = ma.array(0., mask=ones((nv, nv)))
>     if no > 1:
>         for i in range(nv):
>             for j in range(nv):
>                 m = ma.getmaskarray(data1[i,:]) | ma.getmaskarray(data2[j,:])
>                 d1 = ma.array(data1[i,:], copy=False, mask=m).compressed()
>                 d2 = ma.array(data2[j,:], copy=False, mask=m).compressed()
>                 if ma.count(d1) > 1:
>                     c = corrcoef(d1, d2)
>                     cc[i,j] = c[0,1]
>
>     return cc

I'm afraid this doesn't work, either. Correlation matrices are constrained to
be positive semidefinite; that is, all of their eigenvalues must be >= 0.
Calculating each of the correlation coefficients in a pairwise fashion doesn't
incorporate this constraint: each pair is computed from a different subset of
the observations, so the assembled matrix can end up with negative eigenvalues.

But you're on the right track. My preferred approach to this problem is to find
the pairwise correlation matrix as you did and then find the closest positive
semidefinite matrix to it using the method of alternating projections. I can't
give you the code I wrote for this since it belongs to a customer, but here is
the reference I used:

  http://eprints.ma.man.ac.uk/232/

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
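
For readers who want something concrete, here is a minimal sketch of the
alternating-projections idea described above. It is not the customer code
mentioned in the message, and it is not claimed to match the linked reference
in every detail. It alternates between projecting onto the positive
semidefinite matrices and onto the symmetric matrices with unit diagonal, with
a Dykstra-style correction on the first projection so that the iterates head
toward the nearest such matrix in the Frobenius norm. The names `project_psd`,
`nearest_corr`, `tol`, and `max_iter` are made up for illustration, and the
input is assumed to be a plain, fully populated ndarray (any masked entries in
the pairwise matrix would have to be filled in first).

```python
import numpy as np


def project_psd(a):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues."""
    w, v = np.linalg.eigh(a)
    return (v * np.maximum(w, 0.0)) @ v.T


def nearest_corr(a, tol=1e-8, max_iter=1000):
    """Sketch: approximate the nearest correlation matrix to `a` (Frobenius norm).

    Assumes `a` is a square, real, fully populated array (hypothetical helper).
    """
    a = np.asarray(a, dtype=float)
    y = (a + a.T) / 2.0          # work with the symmetric part
    ds = np.zeros_like(y)        # Dykstra correction for the PSD projection
    for _ in range(max_iter):
        r = y - ds
        x = project_psd(r)       # projection onto the PSD cone
        ds = x - r
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)   # projection onto unit-diagonal matrices
        scale = max(1.0, np.linalg.norm(y_new))
        done = (np.linalg.norm(y_new - y) <= tol * scale
                and np.linalg.norm(y_new - x) <= tol * scale)
        y = y_new
        if done:
            break
    return y


# A symmetric, unit-diagonal matrix that is NOT a valid correlation matrix:
# it has one negative eigenvalue.
example = np.array([[1.0, 0.9, -0.9],
                    [0.9, 1.0, 0.9],
                    [-0.9, 0.9, 1.0]])
fixed = nearest_corr(example)
```

The returned matrix has an exactly unit diagonal; its smallest eigenvalue may
still be negative by an amount on the order of `tol`, so tighten `tol` or clip
once more if strict positive semidefiniteness matters downstream.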