On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas <mjani...@esri.com> wrote:
> Hello All,
>
> I am trying to identify columns of a matrix that are perfectly collinear.
> It is not that difficult to identify when two columns are identical or have
> zero variance, but I do not know how to ID when the culprit is of a higher
> order, i.e. columns 1 + 2 + 3 = column 4. NUM.corrcoef(matrix.T) will
> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
> a very large condition number… But they do not tell me which columns are
> causing the problem. For example:
>
> zt = numpy.array([[ 1.  , 1. , 1. , 1.  , 1. ],
>                   [ 0.25, 0.1, 0.2, 0.25, 0.5],
>                   [ 0.75, 0.9, 0.8, 0.75, 0.5],
>                   [ 3.  , 8. , 0. , 5.  , 0. ]])
>
> How can I identify that columns 0,1,2 are the issue because: column 1 +
> column 2 = column 0?
>
> Any input would be greatly appreciated. Thanks much,
>
The way that I know to do this in a regression context for (near-perfect) multicollinearity is the variance inflation factor (VIF). It's long been on my todo list for statsmodels.

http://en.wikipedia.org/wiki/Variance_inflation_factor

Maybe there are other ways with decompositions. I'd be happy to hear about them.

Please post back if you write any code to do this.

Skipper
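
For what it's worth, here is a rough sketch of both ideas applied to the zt example from the question. It is not statsmodels code. The helper names vif_uncentered and collinear_columns and the tolerances are made up for illustration. The VIF part uses an uncentered R^2 so that the constant column can be checked as well, which departs from the textbook definition. The decomposition part reads the dependency off the right singular vectors that belong to (near-)zero singular values.

```python
import numpy as np

# Example from the question: the rows of zt are the columns of the data matrix.
zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])
X = zt.T  # shape (5, 4): 5 observations, 4 variables


def vif_uncentered(X, tol=1e-12):
    """Hypothetical helper: VIF-like measure for each column of X.

    Each column is regressed on all the other columns by least squares.
    Assumption: R^2 is uncentered (no mean subtraction), so a constant
    column does not cause a division by zero.
    """
    k = X.shape[1]
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        coef = np.linalg.lstsq(others, y, rcond=None)[0]
        resid = y - others @ coef
        ratio = (resid @ resid) / (y @ y)  # equals 1 - R^2 (uncentered)
        # A residual that is numerically zero means perfect collinearity.
        out[j] = np.inf if ratio < tol else 1.0 / ratio
    return out


def collinear_columns(X, tol=1e-10, entry_tol=1e-8):
    """Hypothetical helper: columns that take part in an exact dependency.

    Right singular vectors whose singular value is (near) zero span the
    null space of X. A nonzero entry in such a vector marks a column that
    is involved in a linear dependency.
    """
    _, s, vt = np.linalg.svd(X, full_matrices=True)
    # Pad with zeros in case there are more columns than rows.
    s_full = np.zeros(X.shape[1])
    s_full[:s.size] = s
    null_vecs = vt[s_full <= tol * s.max()]
    involved = np.any(np.abs(null_vecs) > entry_tol, axis=0)
    return np.flatnonzero(involved), null_vecs


v = vif_uncentered(X)
cols, null_vecs = collinear_columns(X)
print("VIF (uncentered):", v)
print("columns in a dependency:", cols)
print("null-space vector(s):", null_vecs)
```

In this example the null space is one-dimensional, so the single null vector gives the coefficients of the dependency directly (up to sign and scale). When several independent dependencies exist, the null vectors returned by the SVD can be arbitrary mixtures of them, so the list of involved columns is still meaningful but the individual relations would need further work to separate.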