Re: [R] Generate multivariate normal data with a random correlation matrix
Thanks for the response, Rex. This is an interesting approach. The Choleski decomposition approach that John suggested seems to be an obvious and direct approach to this problem. Your approach is less obvious to me but may be equal or superior to the Choleski decomposition. Are all possible correlation matrices of size k equally likely using your approach? It would seem so based on your description. If so, it is a way cool solution.

Rick

On Thu, Feb 10, 2011 at 12:18 PM, rex.dw...@syngenta.com wrote:

If you want a random correlation matrix, why not just generate random data and accept the correlation matrix that you get? The standard normal distribution in k dimensions is (hyper)spherically symmetric. If you generate k standard normal N(0,1) variates, you have a point in k-space whose direction is uniformly distributed on the (k-1)-sphere and whose length follows a chi distribution with k degrees of freedom. If you generate k such points (a k x k matrix), you have a random linear transformation with all sorts of desirable symmetries. So, if you generate a k x k matrix of standard normal variates and an n x k matrix of standard normal variates, and multiply the two matrices to get n points in k-space, that seems to be a pretty good definition of random correlation to me. I'm sure you can decompose the k x k matrix to get the theoretical distribution, maybe by multiplying it by its transpose and doing an SVD; I'd have to think about that part. ... unless you have a particular distribution of correlation matrices in mind to begin with, which doesn't seem to be the case.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Szumiloski, John
Sent: Wednesday, February 09, 2011 11:30 AM
To: r-help@r-project.org
Cc: Rick DeShon
Subject: Re: [R] Generate multivariate normal data with a random correlation matrix

The knee-jerk thought I had was to express the correlation matrix as a generic Choleski decomposition, then randomly populate the triangular decomposed matrix.
When you remultiply, you can simply rescale to 1s on the diagonal. Then rmnorm as usual. In R, see ?chol. If you want to get fancy, you could look at the random distribution you would use for the triangular matrix and play with that, including different distributions for different elements, elements' distributions being conditional on values of previously randomized elements, etc.

John

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rick DeShon
Sent: Wednesday, 09 February, 2011 11:06 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Generate multivariate normal data with a random correlation matrix

Hi All. I'd like to generate a sample of n observations from a k-dimensional multivariate normal distribution with a random correlation matrix.

My solution:
1) The lower (or upper) triangle of the d x d correlation matrix (d = k) has n.tri = (d/2)(d+1) - d = d(d-1)/2 entries.
2) Take a uniform sample of n.tri possible correlations: runif(n.tri, -.99, .99).
3) Populate one triangle of the matrix with the sampled correlations.
4) Mirror the triangle to populate the other triangle, forming a symmetric matrix, cormat.
5) Sample n observations from a multivariate normal distribution with mean vector = 0 and varcov = cormat.

Problem: This approach violates the triangle inequality property of correlation matrices (more generally, the requirement that a correlation matrix be positive semidefinite). So, the matrix I've constructed is a valid symmetric matrix, but it is not a valid correlation matrix, and it blows up when you submit it to a random number generator such as rmnorm. With a small matrix you sometimes get lucky and generate a valid correlation matrix, but as you increase d the probability of obtaining a valid correlation matrix drops off quickly. So, any ideas on how to construct a correlation matrix with random entries that cover the range (or most of the range) of the correlation, [-1, 1]? Here's the code I've used that won't work.
library(mnormt)
n <- 1000
d <- 50
n.tri <- ((d*(d+1))/2)-d
r <- runif(n.tri, min=-.5, max=.5)
cormat <- diag(d)
count1 <- 1
for (i in 1:d){
  for (j in 1:d){
    if (i < j) {
      cormat[i,j] <- r[count1]
      cormat[j,i] <- cormat[i,j]
      count1 <- count1+1
    }
  }
}
eigen(cormat)  # a negative eigenvalue means cormat is not a valid correlation matrix
x <- rmnorm(n, rep(0, d), cormat)  # Sample the data

Thanks in advance,
Rick DeShon

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
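The two suggestions in this thread can be sketched in base R roughly as follows. This is a minimal sketch, not code from the thread: the function names rex_random_cor(), chol_random_cor() and rmvnorm_cor() are hypothetical, the triangular entries are simply drawn from N(0, 1) (one of the many choices John mentions), and sampling uses base R's chol() instead of mnormt::rmnorm(). For Rex's construction, if the rows of Z are iid N(0, I), the rows of Z %*% A have covariance t(A) %*% A, so cov2cor(crossprod(A)) gives the population correlation matrix directly. Neither sketch claims to make all correlation matrices equally likely; that is the open question Rick raises.

```r
set.seed(2011)

# Rex's idea: multiply an n x k standard normal matrix by a k x k standard normal matrix.
# Rows of Z %*% A are iid N(0, t(A) %*% A), so the population correlation is cov2cor(crossprod(A)).
rex_random_cor <- function(k, n = 1000) {
  A <- matrix(rnorm(k * k), k, k)
  Z <- matrix(rnorm(n * k), n, k)
  X <- Z %*% A
  list(population = cov2cor(crossprod(A)),  # theoretical correlation matrix
       sample     = cor(X),                 # the correlation matrix "you get" from the data
       data       = X)
}

# John's idea: fill a lower-triangular matrix at random, remultiply, rescale to a unit diagonal.
chol_random_cor <- function(k) {
  L <- matrix(0, k, k)
  L[lower.tri(L, diag = TRUE)] <- rnorm(k * (k + 1) / 2)
  cov2cor(tcrossprod(L))  # L %*% t(L), then rescale so the diagonal is all 1s
}

# Draw n observations from N(0, R) using the upper Cholesky factor U, where R = t(U) %*% U.
rmvnorm_cor <- function(n, R) {
  U <- chol(R)
  matrix(rnorm(n * nrow(R)), n, nrow(R)) %*% U
}

R_rex  <- rex_random_cor(5)
R_chol <- chol_random_cor(5)
x <- rmvnorm_cor(1000, R_chol)
```

Because both constructions start from a product of a matrix with its own transpose, the result is positive semidefinite by construction, which is exactly what the element-by-element approach in the original post cannot guarantee.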
[R] Generate multivariate normal data with a random correlation matrix
Hi All. I'd like to generate a sample of n observations from a k-dimensional multivariate normal distribution with a random correlation matrix.

My solution:
1) The lower (or upper) triangle of the d x d correlation matrix (d = k) has n.tri = (d/2)(d+1) - d = d(d-1)/2 entries.
2) Take a uniform sample of n.tri possible correlations: runif(n.tri, -.99, .99).
3) Populate one triangle of the matrix with the sampled correlations.
4) Mirror the triangle to populate the other triangle, forming a symmetric matrix, cormat.
5) Sample n observations from a multivariate normal distribution with mean vector = 0 and varcov = cormat.

Problem: This approach violates the triangle inequality property of correlation matrices (more generally, the requirement that a correlation matrix be positive semidefinite). So, the matrix I've constructed is a valid symmetric matrix, but it is not a valid correlation matrix, and it blows up when you submit it to a random number generator such as rmnorm. With a small matrix you sometimes get lucky and generate a valid correlation matrix, but as you increase d the probability of obtaining a valid correlation matrix drops off quickly. So, any ideas on how to construct a correlation matrix with random entries that cover the range (or most of the range) of the correlation, [-1, 1]? Here's the code I've used that won't work.

library(mnormt)
n <- 1000
d <- 50
n.tri <- ((d*(d+1))/2)-d
r <- runif(n.tri, min=-.5, max=.5)
cormat <- diag(d)
count1 <- 1
for (i in 1:d){
  for (j in 1:d){
    if (i < j) {
      cormat[i,j] <- r[count1]
      cormat[j,i] <- cormat[i,j]
      count1 <- count1+1
    }
  }
}
eigen(cormat)  # a negative eigenvalue means cormat is not a valid correlation matrix
x <- rmnorm(n, rep(0, d), cormat)  # Sample the data

Thanks in advance,
Rick DeShon
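The observation that the chance of getting a valid correlation matrix drops off quickly as d grows can be checked with a small simulation. This sketch keeps the post's own setup (unit diagonal, off-diagonal entries from runif(-0.5, 0.5)), takes "valid" to mean positive definite (what a Cholesky-based sampler such as rmnorm needs), and replaces the double loop with upper.tri()/lower.tri() indexing. The helper names random_unit_diag(), is_pos_def() and prop_valid() are hypothetical.

```r
set.seed(1)

# Symmetric d x d matrix with unit diagonal and iid runif(lo, hi) off-diagonal entries
random_unit_diag <- function(d, lo = -0.5, hi = 0.5) {
  m <- diag(d)
  m[upper.tri(m)] <- runif(d * (d - 1) / 2, lo, hi)
  m[lower.tri(m)] <- t(m)[lower.tri(m)]  # mirror the upper triangle into the lower one
  m
}

# Positive definite if the smallest eigenvalue is strictly positive
is_pos_def <- function(m) {
  min(eigen(m, symmetric = TRUE, only.values = TRUE)$values) > 0
}

# Monte Carlo estimate of the proportion of positive definite matrices for a given d
prop_valid <- function(d, reps = 200) {
  mean(replicate(reps, is_pos_def(random_unit_diag(d))))
}

dims <- c(3, 5, 10, 20, 50)
props <- sapply(dims, prop_valid)
names(props) <- dims
print(props)
```

For d = 3 every draw is valid (each row's off-diagonal entries sum in absolute value to less than 1, so by Gershgorin's theorem no eigenvalue can reach 0), while for d = 50 essentially none are.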
[R] nlsList {nlme} - control arguments problem
Hi All. I'd like to send some control arguments to the nls function when performing an nlsList analysis. I'm fitting a power model to some grouped data and would like to impose lower bounds on the estimates using the port algorithm. Obtaining the lower-bound constraint works fine with a direct call to nls for a single level of the grouping variable. However, the bounds aren't imposed when performing the nlsList analysis across the levels of the grouping variable. Any idea why this isn't working?

# Generate example data
library(nlme)
trial <- 1:100
result <- list()
for (i in 1:3) {
  min <- rnorm(max(trial), 250, 5)
  dif <- rnorm(max(trial), 500, 5)
  p <- rnorm(max(trial), -.5, .1)
  e <- rnorm(max(trial), mean=0, sd=5)
  y <- min + dif*(trial)^p + e
  result[[i]] <- data.frame(y, trial, id=i)
}
newdf <- do.call('rbind', result)
df.gr <- groupedData(y ~ trial | id, data=newdf)

### Single-unit analysis
### The boundary condition on the dplt parameter is enforced!
df.one <- subset(df.gr, id==1)
nls(y ~ SSpowrDplt(trial, min, dplt, dif, p), data=df.one, algorithm="port", lower=c(0.0, 0.0, 0.0, -10))

Example output:

Nonlinear regression model
  model: y ~ SSpowrDplt(trial, min, dplt, dif, p)
   data: df.one
    min    dplt     dif       p
247.052   0.000 491.965  -0.462
 residual sum-of-squares: 234322

Algorithm "port", convergence message: relative convergence (4)

### nlsList analysis
### The boundary condition on dplt is not enforced
Lfit.nls <- nlsList(y ~ SSpowrDplt(trial, min, dplt, dif, p), data=df.gr, control=list(algorithm='port', lower=c(0.0, 0.0, 0.0, -10), maxiter=100))

Example output:

Call:
  Model: y ~ SSpowrDplt(trial, min, dplt, dif, p) | id
   Data: df.gr

Coefficients:
   min
  Estimate Std. Error   t value     Pr(>|t|)
1 276.2354   16.16609 17.087337 1.086442e-44
2 257.0127   20.43564 12.576694 3.390686e-30
3 206.4017   29.01315  7.114075 7.354863e-12
   dplt
     Estimate Std. Error    t value  Pr(>|t|)
1 -0.06579982 0.03848086 -1.7099365 0.0951222
2 -0.01694362 0.04161933 -0.4071093 0.6786473
3  0.08981518 0.04636532  1.9371199 0.0528957
   dif
  Estimate Std. Error  t value     Pr(>|t|)
1 477.5049   21.89002 21.81382 6.679439e-62
2 488.7171   22.11908 22.09482 4.466288e-66
3 552.7105   25.04206 22.07129 9.215344e-65
   p
    Estimate Std. Error   t value     Pr(>|t|)
1 -0.5455936 0.06262040 -8.712713 7.615265e-16
2 -0.4839114 0.06074282 -7.966560 1.307734e-14
3 -0.4059903 0.05455864 -7.441355 9.297527e-13

Residual standard error: 27.43384 on 888 degrees of freedom

If you look at the structure of Lfit.nls, it looks like the control arguments are passed correctly:

str(Lfit.nls)
List of 3
 $ 1:List of 6
  ..$ control  :List of 7
  .. ..$ maxiter  : num 100
  .. ..$ tol      : num 1e-05
  .. ..$ minFactor: num 0.000977
  .. ..$ printEval: logi FALSE
  .. ..$ warnOnly : logi FALSE
  .. ..$ algorithm: chr "port"
  .. ..$ lower    : num [1:4] 0 0 0 -10

If it helps, here's the selfStart function that I'm using:

powrDpltInit <- function(mCall, LHS, data) {
  xy <- sortedXyData(mCall[["x"]], LHS, data)
  min.s <- min(xy$y)
  dif.s <- max(xy$y) - min(xy$y)
  dplt.s <- 0.5
  p.s <- -.20
  value <- c(min.s, dplt.s, dif.s, p.s)
  names(value) <- mCall[c("min", "dplt", "dif", "p")]
  value
}
SSpowrDplt <- selfStart(~ min + dplt*x + dif*x^p, initial=powrDpltInit, parameters=c("min", "dplt", "dif", "p"))

Thanks for your help!
Rick DeShon
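One possible workaround, sketched under assumptions: nls() takes algorithm and lower as its own arguments, and nls.control() has no components with those names, so placing them inside control= would explain why they appear in str(Lfit.nls) yet have no effect (whether nlsList() can forward them at all is worth checking in its help page). If bounded per-group fits are all that is needed, calling nls() directly on each group sidesteps the issue. The sketch regenerates data in the style of the post, writes the power model out explicitly with hand-picked start values in place of the SSpowrDplt selfStart function, and sets warnOnly = TRUE so a group that fails to converge produces a warning rather than stopping the loop. The helper name fit_group() and the start values are illustrative, not from the original.

```r
set.seed(123)

# Simulated data in the style of the post: three units (id), 100 trials each
trial <- 1:100
newdf <- do.call(rbind, lapply(1:3, function(i) {
  y <- rnorm(100, 250, 5) + rnorm(100, 500, 5) * trial^rnorm(100, -0.5, 0.1) +
    rnorm(100, 0, 5)
  data.frame(y = y, trial = trial, id = i)
}))

# Hypothetical helper: bounded fit of y = min + dplt*trial + dif*trial^p for one group.
# 'lower' is matched to the parameters in the order they appear in 'start'.
fit_group <- function(d) {
  nls(y ~ min + dplt * trial + dif * trial^p, data = d,
      start = c(min = 250, dplt = 0.5, dif = 500, p = -0.2),
      algorithm = "port", lower = c(0, 0, 0, -10),
      control = nls.control(maxiter = 100, warnOnly = TRUE))
}

fits <- lapply(split(newdf, newdf$id), fit_group)
coefs <- t(sapply(fits, coef))  # one row per id, one column per parameter
print(coefs)
```

The port algorithm keeps every iterate inside the box defined by lower, so the dplt estimates in coefs cannot go negative the way they did in the nlsList output.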
[R] replace zeros in a block diagonal matrix with small random values?
Hi All. Imagine you have a large block-diagonal matrix. I'd like to replace the zeros in this matrix with small random (runif) numbers. Any ideas for a simple and efficient way to do this?

Best regards,
Rick DeShon
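A simple vectorized approach is logical indexing: build a mask of the zero entries once and assign exactly that many runif() draws to it. In this sketch the 6 x 6 example matrix and the upper limit of 1e-3 for "small" are arbitrary choices. A dense base matrix is used because filling the zeros of a sparse Matrix object (for example one built with Matrix::bdiag()) would make it dense anyway. The second part covers the case where the result must stay symmetric.

```r
set.seed(42)

# Example block-diagonal matrix: two 3 x 3 blocks of ones on the diagonal, zeros elsewhere
m <- kronecker(diag(2), matrix(1, 3, 3))

# Replace every exact zero with a small uniform draw (upper limit 1e-3 is arbitrary)
is_zero <- m == 0
m[is_zero] <- runif(sum(is_zero), min = 0, max = 1e-3)

# Symmetric variant: fill the zeros in the upper triangle only, then mirror them
s <- kronecker(diag(2), matrix(1, 3, 3))
up <- upper.tri(s) & s == 0
s[up] <- runif(sum(up), min = 0, max = 1e-3)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
```

runif() never returns its endpoints, so no entry that started as zero can remain zero.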
[R] NLS plinear question
Hi All. I've run into a problem with the plinear algorithm in nls that is confusing me. Assume the following reaction time data over 15 trials for a single unit. Trials are coded from 0-14 so that the intercept represents reaction time in the first trial.

trl     RT
  0 1132.0
  1  630.5
  2 1371.5
  3  704.0
  4  488.5
  5  575.5
  6  613.0
  7  824.5
  8  509.0
  9  791.0
 10  492.5
 11  515.5
 12  467.0
 13  556.5
 14  456.0

Now fit a power function to these data using nls with the plinear algorithm:

fit.pw <- nls(RT ~ cbind(1, trl, trl^p), start = c(p = -.2), algorithm = "plinear", data = df.one)

This yields the following error message:

Error in numericDeriv(form[[3]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model

Now, recode trial from 1-15 and run the same model:

fit.pw <- nls(RT ~ cbind(1, trl, trl^p), start = c(p = -.2), algorithm = "plinear", data = df.one)

It seems to work fine now:

Nonlinear regression model
  model: RT ~ cbind(1, trl, trl^p)
   data: df.one
       p    .lin1 .lin.trl    .lin3
 -0.2845 200.3230  -8.9467 904.7582
 residual sum-of-squares: 555915

Number of iterations to convergence: 11

Any idea why having a zero as the first value of X causes this problem?

Thanks in advance,
Rick DeShon
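A likely explanation, shown as a small worked check: with trl coded from 0, the third column of the plinear model matrix is trl^p, and at the starting value p = -0.2 the first entry is 0^-0.2, which R (following IEEE arithmetic) evaluates to Inf. nls() refuses a model that evaluates to a non-finite value, which matches the numericDeriv() error. Shifting trl to start at 1 keeps every entry finite. The same fact means that for any negative p the fitted curve is infinite at trl = 0, so with a power term in the model the intercept does not equal the predicted first-trial reaction time under the 0-14 coding. The object names below are illustrative.

```r
p_start <- -0.2

trl0 <- 0:14      # trials coded 0-14, as in the failing fit
trl1 <- trl0 + 1  # trials recoded 1-15, as in the working fit

# The plinear model matrix cbind(1, trl, trl^p) evaluated at the starting value of p
X0 <- cbind(1, trl0, trl0^p_start)
X1 <- cbind(1, trl1, trl1^p_start)

print(0^p_start)                       # Inf: zero raised to a negative power
print(which(!is.finite(X0), arr.ind = TRUE))  # row/column of the offending entry
print(all(is.finite(X1)))
```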
[R] nlsList (nlme) error
Hi All. I'm trying to run nlsList and am getting an error that makes no sense to me. I have accuracy and reaction time data over many trials for each person (id). When I use nlsList code that is virtually identical to the example in the help file, I get the following error. I've tried everything I could think of and can't get around it. Any ideas what I'm doing wrong?

fm.nls <- nlsList(RT ~ SSasymp(trial, Asym, R0, lrc), data = df.group, na.action = na.exclude, start = c(Asym = 550, R0 = 770, lrc = -2.5))

Error in nlsList.formula(RT ~ SSasymp(trial, Asym, R0, lrc), data = df.group, :
  Data must be a groupedData object if formula does not include groups

Here are the first 10 records of my data:

df.group[1:10,]
Grouped Data: RT ~ trial | id
     id trial ACC   RT block
1.1   1     1   1          1
2.1   2     1   0   NA     1
3.1   3     1   1 1309     1
4.1   4     1   1  544     1
5.1   5     1   1  654     1
6.1   6     1   0   NA     1
7.1   7     1   1  441     1
8.1   8     1   1  882     1
9.1   9     1   1 1097     1
10.1 10     1   1  898     1

The data are clearly grouped. Just to be sure:

str(df.group)
Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 15480 obs. of 5 variables:
 $ id   : Ord.factor w/ 43 levels 31830..: 41 11 33 22 4 27 5 2 37 19 ...
 $ trial: int 1 1 1 1 1 1 1 1 1 1 ...
 $ ACC  : int 1 0 1 1 1 0 1 1 1 1 ...
 $ RT   : int NA 1309 544 654 NA 441 882 1097 898 ...
 $ block: int 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "formula")=Class 'formula' length 3 RT ~ trial | id
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
 - attr(*, "FUN")=function (x)
 - attr(*, "order.groups")= logi TRUE

Thanks!
Rick DeShon

version
               _
platform       i386-apple-darwin8.10.1
arch           i386
os             darwin8.10.1
system         i386, darwin8.10.1
status
major          2
minor          6.1
year           2007
month          11
day            26
svn rev        43537
language       R
version.string R version 2.6.1 (2007-11-26)
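The error message itself suggests one thing to try: name the grouping factor in the formula after a `|`, so that nlsList() does not have to find it in a groupedData object. The sketch below does this with the Loblolly pine data that ships with R (the same SSasymp model as the nlsList help-page example), converted to a plain data.frame to show that the groupedData class is then not needed. Substituting RT, trial and id for height, age and Seed is an assumption about Rick's data and has not been tested on it. With a self-starting model such as SSasymp, the start argument can usually be left out.

```r
library(nlme)  # provides nlsList(); installed with R as a recommended package

# Plain data.frame (not groupedData) built from the Loblolly data in the 'datasets' package
lob <- data.frame(height = Loblolly$height,
                  age    = Loblolly$age,
                  Seed   = Loblolly$Seed)

# Grouping factor given explicitly after '|', so no groupedData object is required
fm <- nlsList(height ~ SSasymp(age, Asym, R0, lrc) | Seed, data = lob)
cf <- coef(fm)  # one row of Asym, R0, lrc estimates per Seed
print(cf)
```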
[R] Efficient computation of average covariance matrix over a list
Hi All. I would like to compute a separate covariance matrix for a set of variables for each of the levels of a factor and then compute the average covariance matrix over the factor levels. I can loop through this computation, but I need to perform the calculation for a large number of levels and am looking for something more elegant. To be concrete:

u <- 3
n <- 10
id <- gl(u, n)
x <- rnorm(n*u)
y <- rnorm(n*u)
z <- rnorm(n*u)
df <- data.frame(id, x, y, z)
df.s <- split(df[, c("x", "y", "z")], id)
lcov <- lapply(df.s, cov)
lcov

What's an efficient way to compute the average covariance matrix over the list members in lcov?

Thanks in advance,
Rick
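Two common base-R idioms for averaging a list of equal-sized matrices element by element are sketched below: summing with Reduce() and dividing by the number of matrices, or stacking the list into a 3-d array with simplify2array() and averaging over the third dimension with rowMeans(dims = 2). The data setup mirrors the post (with the data frame renamed dat to avoid masking the df() function); avg_cov and avg_cov2 are illustrative names. This is the unweighted mean over levels; if the levels have unequal sizes, a pooled within-group covariance weighted by each level's degrees of freedom may be what is actually wanted.

```r
set.seed(1)

# Data set up as in the post
u <- 3
n <- 10
id <- gl(u, n)
dat <- data.frame(id = id, x = rnorm(n * u), y = rnorm(n * u), z = rnorm(n * u))

# One covariance matrix per level of id
lcov <- lapply(split(dat[, c("x", "y", "z")], dat$id), cov)

# Element-wise average: add the matrices together with Reduce(), then divide by the count
avg_cov <- Reduce(`+`, lcov) / length(lcov)

# Equivalent alternative: stack into a 3 x 3 x u array and average over the third dimension
avg_cov2 <- rowMeans(simplify2array(lcov), dims = 2)

print(avg_cov)
```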