On Thu, Mar 15, 2012 at 11:23:28PM -0000, Ted Harding wrote:
> On 15-Mar-2012 Filoche wrote:
> > Hi everyone.
> >
> > Based on a dependent variable (y), I'm trying to generate some
> > independent variables with a specified correlation. For this
> > there are no problems.
> > However, I would like all my "regressors" to be
> > orthogonal (i.e. no correlation among them).
> >
> > For example,
> >
> > y = x1 + x2 + x3 where the correlations of y with x1, x2 and x3
> > are 0.7, 0.4 and 0.8. However, x1, x2 and x3 should not be
> > correlated with each other.
> >
> > Can anyone help me?
> >
> > Regards,
> > Phil
>
> Your fundamental problem here (with the correlations you specify)
> is the following.
>
> Your desired correlation matrix can be constructed by
>
>   C <- cbind( c(1.0,0.7,0.4,0.8),c(0.7,1.0,0.0,0.0),
>               c(0.4,0.0,1.0,0.0),c(0.8,0.0,0.0,1.0) )
>   rownames(C) <- c("y","x1","x2","x3")
>   colnames(C) <- c("y","x1","x2","x3")
>
>   C
>   #      y  x1  x2  x3
>   # y  1.0 0.7 0.4 0.8
>   # x1 0.7 1.0 0.0 0.0
>   # x2 0.4 0.0 1.0 0.0
>   # x3 0.8 0.0 0.0 1.0
>
> And now:
>
>   det(C)
>   # [1] -0.29
>
> and it is impossible for a correlation matrix to have a
> negative determinant: a correlation matrix must be
> positive-semidefinite, and therefore have a non-negative
> determinant.
>
> An alternative check is to look at the eigen-structure of C:
>
>   eigen(C)
>   # $values
>   # [1]  2.1357817  1.0000000  1.0000000 -0.1357817
>   #
>   # $vectors
>   #           [,1]          [,2]       [,3]       [,4]
>   # [1,] 0.7071068  0.000000e+00  0.0000000  0.7071068
>   # [2,] 0.4358010 -1.172802e-16  0.7874992 -0.4358010
>   # [3,] 0.2490291 -8.944272e-01 -0.2756247 -0.2490291
>   # [4,] 0.4980582  4.472136e-01 -0.5512495 -0.4980582
>
> so one of the eigenvalues (-0.1357817) is negative, again
> impossible for a correlation matrix.

Thank you for this analysis. For general correlations, say, s1, s2, s3,
the matrix is

       y  x1  x2  x3
  y    1  s1  s2  s3
  x1  s1   1   0   0
  x2  s2   0   1   0
  x3  s3   0   0   1

and its determinant is

  1 - s1^2 - s2^2 - s3^2.

Since there was also a requirement that y = x1 + x2 + x3, the
correlation matrix should be singular. Hence, the required correlation
structure implies

  s1^2 + s2^2 + s3^2 = 1.

If this condition is satisfied, then a multivariate distribution
obtained by multiplying a vector from three-dimensional N(0, I) by
the matrix

  (s1 s2 s3)
  (s1  0  0)
  ( 0 s2  0)
  ( 0  0 s3)

has the required correlation structure.

However, this is still not a solution of the original question, since
the original requirement was to find x1, x2, x3 when y is given. I do
not know whether a solution for an arbitrary y exists, even if the
above condition on the correlations is satisfied.

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
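
The construction described in the reply can be sketched in R as follows.
This is only an illustration under stated assumptions: the values in `s`
are made up for the example (chosen nonnegative and so that their squares
sum to 1), and the names `A`, `Z`, `X`, `R_theory` and `R_sample` are not
from the thread. It builds the 4 x 3 matrix, derives the implied
correlation matrix, and compares it with a simulated sample.

```r
# Illustrative correlations (an assumption, not from the thread):
# nonnegative, with s1^2 + s2^2 + s3^2 = 1.
s <- c(0.6, 0.48, 0.64)
stopifnot(isTRUE(all.equal(sum(s^2), 1)))

# The 4 x 3 matrix from the reply: the first row produces y,
# the remaining rows produce x1, x2, x3.
A <- rbind(s, diag(s))
rownames(A) <- c("y", "x1", "x2", "x3")

# For z ~ N(0, I), the covariance of A %*% z is A %*% t(A);
# cov2cor() rescales it to a correlation matrix.
R_theory <- cov2cor(A %*% t(A))

# Simulation: each column of Z is one draw from three-dimensional N(0, I).
set.seed(1)
n <- 1e5
Z <- matrix(rnorm(3 * n), nrow = 3)
X <- t(A %*% Z)          # n x 4, columns y, x1, x2, x3
R_sample <- cor(X)

print(round(R_theory, 3))
print(round(R_sample, 3))
print(det(R_theory))     # singular up to rounding error
```

By construction y equals x1 + x2 + x3 in every simulated row, which is
why the correlation matrix is singular. As noted in the reply, this
generates y together with the regressors; it does not address the case
where y is given in advance.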