Re: [R] Generate multivariate normal data with a random correlation matrix

2011-02-10 Thread Rick DeShon
Thanks for the response, Rex.  This is an interesting approach.  The
Choleski decomposition approach that John suggested seems an obvious
and direct way to attack this problem.  Your approach is less obvious
to me, but it may be as good as or better than the Choleski
decomposition.

Are all possible correlation matrices of size k equally likely using
your approach?  It would seem so based on your description.  If so, it
is a way cool solution.

Rick




On Thu, Feb 10, 2011 at 12:18 PM,  rex.dw...@syngenta.com wrote:
 If you want a random correlation matrix, why not just generate random data 
 and accept the correlation matrix that you get?  The standard normal 
 distribution in k dimensions is (hyper)spherically symmetric.  If you 
 generate k standard normal N(0,1) variates, you have a point in k-space whose 
 direction is uniformly distributed on the (k-1)-sphere and whose magnitude has 
 a chi distribution with k degrees of freedom.  If you generate k such points, 
 you have a random linear transformation with all sorts of desirable 
 symmetries.  So, if you generate a k x k matrix of standard normal variates 
 and an n x k matrix of standard normal variates, and multiply the two 
 matrices to get n points in k-space, that seems to be a pretty good 
 definition of random correlation to me.  I'm sure you can decompose the k x k 
 matrix to get the theoretical distribution, maybe by multiplying it by its 
 transpose and doing an SVD; I'd have to think about that part.
 ... unless you have a particular distribution of correlation matrices in mind 
 to begin with, which doesn't seem to be the case.
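 A minimal base-R sketch of the approach Rex describes; n, k and the seed are arbitrary illustrative choices. It also suggests that no SVD is needed to recover the population correlation: each row of x is a standard normal row of Z multiplied by A, so the covariance of x is t(A) %*% A, and cov2cor() rescales that to a correlation matrix.

```r
# Sketch of Rex's suggestion (base R only; n, k and the seed are arbitrary).
set.seed(42)
n <- 1000                          # number of observations
k <- 4                             # number of variables

A <- matrix(rnorm(k * k), k, k)    # random k x k linear transformation
Z <- matrix(rnorm(n * k), n, k)    # n standard normal points in k-space
x <- Z %*% A                       # n points with covariance t(A) %*% A

# Each row of x is z %*% A with Cov(z) = I, so Cov(x) = t(A) %*% A.
R.theory <- cov2cor(crossprod(A))  # population correlation matrix
R.sample <- cor(x)                 # correlation matrix of the generated data

round(R.theory - R.sample, 2)      # differences reflect sampling error only
```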


 -----Original Message-----
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Szumiloski, John
 Sent: Wednesday, February 09, 2011 11:30 AM
 To: r-help@r-project.org
 Cc: Rick DeShon
 Subject: Re: [R] Generate multivariate normal data with a random correlation 
 matrix

 My knee-jerk thought was to express the correlation matrix through a 
 generic Choleski decomposition, then randomly populate the triangular 
 factor.  When you remultiply, you can simply rescale to 1s on the 
 diagonal.  Then use rmnorm as usual.

 In R, see ?chol

 If you want to get fancy, you could look at the random distribution you would 
 use for the triangular matrix and play with that, including different 
 distributions for different elements, elements' distributions being 
 conditional on values of previously randomized elements, etc.

 John

 -----Original Message-----
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Rick DeShon
 Sent: Wednesday, 09 February, 2011 11:06 AM
 To: r-h...@stat.math.ethz.ch
 Subject: [R] Generate multivariate normal data with a random correlation 
 matrix

 Hi All.

 I'd like to generate a sample of n observations from a k-dimensional 
 multivariate normal distribution with a random correlation matrix.

 My solution:
 1) The lower (or upper) triangle of the correlation matrix has 
    n.tri = (d/2)(d+1) - d = d(d-1)/2 entries (d = k, the number of variables).
 2) Take a uniform sample of n.tri possible correlations, runif(n.tri, -.99, .99).
 3) Populate one triangle of the matrix with the sampled correlations.
 4) Mirror that triangle to populate the other one, forming a symmetric 
    matrix, cormat.
 5) Sample n observations from a multivariate normal distribution with mean 
    vector = 0 and varcov = cormat.


 Problem:
 This approach can violate the positive semi-definiteness that every 
 correlation matrix must have (the "triangle inequality" for correlations is 
 one consequence of it).  So, the matrix I've constructed is certainly a valid 
 symmetric matrix, but it is not necessarily a valid correlation matrix, and it 
 blows up when you submit it to a random number generator such as rmnorm.  With 
 a small matrix you sometimes get lucky and generate a valid correlation matrix, 
 but as you increase d the probability of obtaining a valid correlation matrix 
 drops off quickly.

 So, any ideas on how to construct a correlation matrix with random entries 
 that cover the range (or most of the range) of the correlation, [-1, 1]?

 Here's the code I've used that won't work.
 
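 library(mnormt)
 n <- 1000
 d <- 50

 n.tri <- d*(d+1)/2 - d           # number of entries in one off-diagonal triangle
 r     <- runif(n.tri, min=-.5, max=.5)

 cormat <- diag(d)
 count1 <- 1
 for (i in 1:d) {
   for (j in 1:d) {
     if (i > j) {                   # fill the lower triangle ...
       cormat[i,j] <- r[count1]
       cormat[j,i] <- cormat[i,j]   # ... and mirror it into the upper triangle
       count1 <- count1 + 1
     }
   }
 }
 # If any eigenvalue is negative, cormat is not positive semi-definite,
 # so it is not a valid correlation matrix
 eigen(cormat)$values

 x <- rmnorm(n, rep(0, d), cormat)  # sample the data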



 Thanks in advance,

 Rick DeShon

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[R] Generate multivariate normal data with a random correlation matrix

2011-02-09 Thread Rick DeShon
Hi All.

I'd like to generate a sample of n observations from a k-dimensional
multivariate normal distribution with a random correlation matrix.

My solution:
1) The lower (or upper) triangle of the correlation matrix has
   n.tri = (d/2)(d+1) - d = d(d-1)/2 entries (d = k, the number of variables).
2) Take a uniform sample of n.tri possible correlations, runif(n.tri, -.99, .99).
3) Populate one triangle of the matrix with the sampled correlations.
4) Mirror that triangle to populate the other one, forming a symmetric
   matrix, cormat.
5) Sample n observations from a multivariate normal distribution with
   mean vector = 0 and varcov = cormat.


Problem:
This approach can violate the positive semi-definiteness that every
correlation matrix must have (the "triangle inequality" for correlations
is one consequence of it).  So, the matrix I've constructed is certainly
a valid symmetric matrix, but it is not necessarily a valid correlation
matrix, and it blows up when you submit it to a random number generator
such as rmnorm.  With a small matrix you sometimes get lucky and
generate a valid correlation matrix, but as you increase d the
probability of obtaining a valid correlation matrix drops off quickly.

So, any ideas on how to construct a correlation matrix with random
entries that cover the range (or most of the range) of the correlation,
[-1, 1]?

Here's the code I've used that won't work.

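library(mnormt)
n <- 1000
d <- 50

n.tri <- d*(d+1)/2 - d           # number of entries in one off-diagonal triangle
r     <- runif(n.tri, min=-.5, max=.5)

cormat <- diag(d)
count1 <- 1
for (i in 1:d) {
  for (j in 1:d) {
    if (i > j) {                   # fill the lower triangle ...
      cormat[i,j] <- r[count1]
      cormat[j,i] <- cormat[i,j]   # ... and mirror it into the upper triangle
      count1 <- count1 + 1
    }
  }
}
# If any eigenvalue is negative, cormat is not positive semi-definite,
# so it is not a valid correlation matrix
eigen(cormat)$values

x <- rmnorm(n, rep(0, d), cormat)  # sample the data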
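A small base-R simulation can illustrate the claim that the chance of a valid correlation matrix drops off quickly as d grows. It follows the recipe in the steps above (uniform correlations on (-.99, .99), mirrored into a symmetric matrix with 1s on the diagonal) and counts how often the smallest eigenvalue is non-negative. prop_valid is a hypothetical helper; the number of replications, the seed and the values of d are arbitrary.

```r
# Estimate how often the "random entries" recipe yields a valid
# (positive semi-definite) correlation matrix, for several sizes d.
prop_valid <- function(d, reps = 2000, lim = .99) {
  ok <- replicate(reps, {
    m <- diag(d)
    n.tri <- d * (d - 1) / 2
    m[lower.tri(m)] <- runif(n.tri, -lim, lim)   # fill the lower triangle
    m[upper.tri(m)] <- t(m)[upper.tri(m)]        # mirror into the upper triangle
    min(eigen(m, symmetric = TRUE, only.values = TRUE)$values) >= 0
  })
  mean(ok)                                       # proportion of valid matrices
}

set.seed(123)
sizes <- c(2, 3, 4, 6, 8)
props <- sapply(sizes, prop_valid)
names(props) <- paste0("d=", sizes)
props
```

For d = 2 the proportion is exactly 1, since a 2 x 2 matrix with unit diagonal and |r| < 1 is always positive definite; the exact values for larger d depend on the seed.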



Thanks in advance,

Rick DeShon



[R] nlsList {nlme} - control arguments problem

2009-06-29 Thread Rick DeShon
Hi All.

I'd like to send some control arguments to the nls function when
performing an nlsList analysis.

I'm fitting a power model to some grouped data and would like to
impose lower bounds on the estimates using the port algorithm.
Imposing the lower-bound constraint works fine with a direct call to
nls for a single level of the grouping variable.  However, the bounds
aren't imposed when performing the nlsList analysis across the levels
of the grouping variable.  Any idea why this isn't working?

# Generate example data ##
library(nlme)   # groupedData(), nlsList()

trial  <- 1:100
result <- list()

for (i in 1:3) {
 min <- rnorm(max(trial),250,5)
 dif <- rnorm(max(trial),500,5)
 p   <- rnorm(max(trial),-.5,.1)
 e   <- rnorm(max(trial),mean=0,sd=5)
 y   <- min + dif*(trial)^p + e
 result[[i]] <- data.frame(y,trial,id=i)
}
newdf   <- do.call('rbind',result)
df.gr   <- groupedData( y ~ trial | id, data=newdf)


### Single unit analysis

### The boundary condition on the dplt parameter is enforced! ..

df.one <- subset(df.gr,id==1)
nls(y~SSpowrDplt(trial,min,dplt,dif,p),data=df.one,algorithm="port",lower=c(0.0,0.0,0.0,-10))

.. example output...
Nonlinear regression model
  model:  y ~ SSpowrDplt(trial, min, dplt, dif, p)
  data:  df.one
    min    dplt     dif       p
247.052   0.000 491.965  -0.462
 residual sum-of-squares: 234322
 Algorithm port, convergence message: relative convergence (4)
##

### nlsList analysis

### Boundary condition on dplt is not enforced

Lfit.nls <- nlsList(y~SSpowrDplt(trial,min,dplt,dif,p),data=df.gr,control=list(algorithm='port',lower=c(0.0,0.0,0.0,-10),maxiter=100))

.. example output...
Call:
  Model: y ~ SSpowrDplt(trial, min, dplt, dif, p) | id
   Data: df.gr

Coefficients:
   min
  Estimate Std. Error   t value     Pr(>|t|)
1 276.2354   16.16609 17.087337 1.086442e-44
2 257.0127   20.43564 12.576694 3.390686e-30
3 206.4017   29.01315  7.114075 7.354863e-12
   dplt
     Estimate Std. Error    t value  Pr(>|t|)
1 -0.06579982 0.03848086 -1.7099365 0.0951222
2 -0.01694362 0.04161933 -0.4071093 0.6786473
3  0.08981518 0.04636532  1.9371199 0.0528957
   dif
  Estimate Std. Error  t value     Pr(>|t|)
1 477.5049   21.89002 21.81382 6.679439e-62
2 488.7171   22.11908 22.09482 4.466288e-66
3 552.7105   25.04206 22.07129 9.215344e-65
   p
    Estimate Std. Error   t value     Pr(>|t|)
1 -0.5455936 0.06262040 -8.712713 7.615265e-16
2 -0.4839114 0.06074282 -7.966560 1.307734e-14
3 -0.4059903 0.05455864 -7.441355 9.297527e-13

Residual standard error: 27.43384 on 888 degrees of freedom
#

If you look at the structure of Lfit.nls, it looks like the control
arguments are passed correctly.
str(Lfit.nls)

List of 3
 $ 1:List of 6
 ..$ control    :List of 7
 .. ..$ maxiter  : num 100
 .. ..$ tol      : num 1e-05
 .. ..$ minFactor: num 0.000977
 .. ..$ printEval: logi FALSE
 .. ..$ warnOnly : logi FALSE
  .. ..$ algorithm: chr "port"
 .. ..$ lower    : num [1:4] 0 0 0 -10



If it helps, here's the selfStart function that I'm using:

powrDpltInit <-
function(mCall, LHS, data) {
  xy     <- sortedXyData(mCall[["x"]], LHS, data)
  min.s  <- min(xy$y)               # use the y values of the data being fitted
  dif.s  <- max(xy$y) - min(xy$y)
  dplt.s <- 0.5
  p.s    <- -.20
  value  <- c(min.s, dplt.s, dif.s, p.s)
  names(value) <- mCall[c("min", "dplt", "dif", "p")]
  value
}

SSpowrDplt <- selfStart(~ min + dplt*x + dif*x^p, initial = powrDpltInit,
                        parameters = c("min", "dplt", "dif", "p"))
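One way to check the bounds outside nlsList is to apply the single-unit nls() call with algorithm = "port" to every level of id via split() and lapply(). This is a hedged sketch, not a fix for nlsList itself: it writes out the same mean function as SSpowrDplt explicitly, uses hand-picked starting values instead of the selfStart function, and sets warnOnly = TRUE so a fit that does not fully converge returns with a warning instead of stopping. The data are simulated as above with an arbitrary seed, and the simulated parameter vectors are renamed (min.t, dif.t, p.t, fit_one are illustrative names) so they do not share names with the model parameters.

```r
# Hedged sketch: per-group nls() fits with lower bounds (port algorithm),
# as a stand-in for nlsList(). Uses only the stats package.
set.seed(2009)                         # arbitrary seed
trial  <- 1:100
result <- list()
for (i in 1:3) {
  min.t <- rnorm(max(trial), 250, 5)   # renamed so they do not clash with
  dif.t <- rnorm(max(trial), 500, 5)   # the model parameter names
  p.t   <- rnorm(max(trial), -.5, .1)
  e     <- rnorm(max(trial), mean = 0, sd = 5)
  y     <- min.t + dif.t * trial^p.t + e
  result[[i]] <- data.frame(y, trial, id = i)
}
newdf <- do.call("rbind", result)

# Same mean function as SSpowrDplt, written out; starting values are hand-picked.
fit_one <- function(dat)
  nls(y ~ min + dplt * trial + dif * trial^p, data = dat,
      start = c(min = 250, dplt = 0.5, dif = 500, p = -0.5),
      algorithm = "port", lower = c(0, 0, 0, -10),
      control = nls.control(maxiter = 100, warnOnly = TRUE))

fits  <- lapply(split(newdf, newdf$id), fit_one)
coefs <- t(sapply(fits, coef))   # one row per id: min, dplt, dif, p
coefs
```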



Thanks for your help!

Rick DeShon



[R] replace zeros in a block diagonal matrix with small random values?

2009-02-24 Thread Rick DeShon
Hi All.

Imagine you have a large block diagonal matrix.  I'd like to replace
the zeros in this matrix with small random (runif) numbers.  Any ideas
for a simple and efficient way to do this?
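A minimal base-R sketch, assuming "zeros" means entries that are exactly 0 and "small" means uniform on (0, 0.01); the 6 x 6 example matrix and the 0.01 upper bound are arbitrary. Logical indexing with m == 0 selects all the zeros in one vectorized step, and runif() supplies one value per zero.

```r
# Hypothetical example: a 6 x 6 block-diagonal matrix with two 3 x 3 blocks of 1s.
blk <- matrix(1, 3, 3)
m <- rbind(cbind(blk, matrix(0, 3, 3)),
           cbind(matrix(0, 3, 3), blk))

# Replace every exact zero with a small uniform value in (0, 0.01).
zero <- m == 0
m[zero] <- runif(sum(zero), min = 0, max = 0.01)
m
```

This breaks symmetry if the original matrix was symmetric; in that case one could fill only the zeros in lower.tri(m) and mirror them into the upper triangle.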

Best regards,

Rick DeShon



[R] NLS plinear question

2008-05-06 Thread Rick DeShon
Hi All.

I've run into a problem with the plinear algorithm in nls that is confusing
me.

Assume the following reaction time data over 15 trials for a single unit.
Trials are coded from 0-14 so that the intercept represents reaction time in
the first trial.

trl     RT
  0 1132.0
  1  630.5
  2 1371.5
  3  704.0
  4  488.5
  5  575.5
  6  613.0
  7  824.5
  8  509.0
  9  791.0
 10  492.5
 11  515.5
 12  467.0
 13  556.5
 14  456.0

Now fit a power function to this data using nls with the plinear algorithm:
fit.pw <- nls(RT ~ cbind(1,trl, trl^p), start = c(p = -.2),
              algorithm = "plinear", data = df.one)

This yields the following error message:
Error in numericDeriv(form[[3]], names(ind), env) :
   Missing value or an infinity produced when evaluating the model

Now, recode trial from 1 to 15 and run the same model:
fit.pw <- nls(RT ~ cbind(1,trl, trl^p), start = c(p = -.2),
              algorithm = "plinear", data = df.one)

Seems to work fine now...
Nonlinear regression model
  model:  RT ~ cbind(1, trl, trl^p)
   data:  df.one
        p    .lin1 .lin.trl    .lin3
  -0.2845 200.3230  -8.9467 904.7582
 residual sum-of-squares: 555915

Number of iterations to convergence: 11

Any idea why having a zero for the first value of X causes this problem?
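One likely explanation can be checked directly: with p < 0, a trial coded 0 gives 0^p, which R evaluates to Inf, so the matrix cbind(1, trl, trl^p) contains an infinite value at the starting estimate p = -.2. That matches the "Missing value or an infinity produced when evaluating the model" message. The sketch rebuilds the data from the table above (df0 and df1 are illustrative names) and evaluates that matrix under both codings.

```r
# Data from the table above (15 trials for a single unit)
RT <- c(1132.0, 630.5, 1371.5, 704.0, 488.5, 575.5, 613.0, 824.5,
        509.0, 791.0, 492.5, 515.5, 467.0, 556.5, 456.0)
df0 <- data.frame(trl = 0:14, RT = RT)   # trials coded 0-14
df1 <- data.frame(trl = 1:15, RT = RT)   # trials coded 1-15

p  <- -0.2                               # the starting value used above
X0 <- with(df0, cbind(1, trl, trl^p))    # the matrix plinear evaluates
X1 <- with(df1, cbind(1, trl, trl^p))

all(is.finite(X0))   # FALSE: 0^(-0.2) is Inf in the first row
all(is.finite(X1))   # TRUE
```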

Thanks in advance,

Rick DeShon



[R] nlsList (nlme) error

2008-01-25 Thread Rick DeShon
Hi All.

I'm trying to run nlsList and getting an error that makes no sense to
me.  I have accuracy and reaction time data over many trials for each
person (id).

When I use nlsList code that is virtually identical to the example in
the documentation, I get the following error.  I've tried everything I
could think of and can't get around it.  Any ideas what I'm doing wrong?
**
  fm.nls <- nlsList(RT ~ SSasymp(trial, Asym, R0, lrc),
                    data = df.group,
                    na.action = na.exclude,
                    start = c(Asym = 550, R0 = 770, rc = -2.5))

 Error in nlsList.formula(RT ~ SSasymp(trial, Asym, R0, lrc),
data = df.group,  :
  Data must be a groupedData object if formula does not include groups
***

Here's the first 10 records of my data:

> df.group[1:10,]
Grouped Data: RT ~ trial | id
  id trial ACC   RT block
1.1   1 1   1  1
2.1   2 1   0   NA 1
3.1   3 1   1 1309 1
4.1   4 1   1  544 1
5.1   5 1   1  654 1
6.1   6 1   0   NA 1
7.1   7 1   1  441 1
8.1   8 1   1  882 1
9.1   9 1   1 1097 1
10.1 10 1   1  898 1


The data are clearly grouped.  Just to be sure:

str(df.group)
Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   15480 obs. of  5 variables:
 $ id   : Ord.factor w/ 43 levels 31830..: 41 11 33 22 4 27 5 2 37 19 ...
 $ trial: int  1 1 1 1 1 1 1 1 1 1 ...
 $ ACC  : int  1 0 1 1 1 0 1 1 1 1 ...
 $ RT   : int   NA 1309 544 654 NA 441 882 1097 898 ...
 $ block: int  1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "formula")=Class 'formula' length 3 RT ~ trial | id
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
 - attr(*, "FUN")=function (x)
 - attr(*, "order.groups")= logi TRUE
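Given the wording of the error message, one thing that may be worth trying is to name the grouping factor in the model formula itself (model | group), which nlsList accepts for an ordinary data frame, so the call no longer relies on the groupedData formula attribute. This is a hedged sketch on simulated data: the three ids, the parameter values (loosely based on the start values in the call above), the noise level and the seed are all made up, and it has not been checked against the original df.group or R 2.6.1.

```r
library(nlme)   # nlsList()

# Simulated data (hypothetical): three people, 50 trials each, following
# the SSasymp curve Asym + (R0 - Asym) * exp(-exp(lrc) * trial).
set.seed(1)
sim <- data.frame(id    = factor(rep(c("a", "b", "c"), each = 50)),
                  trial = rep(1:50, times = 3))
sim$RT <- 550 + (770 - 550) * exp(-exp(-2.5) * sim$trial) +
          rnorm(nrow(sim), sd = 10)

# Grouping factor given explicitly after "|", so a plain data frame works
fm <- nlsList(RT ~ SSasymp(trial, Asym, R0, lrc) | id, data = sim)
coef(fm)
```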

Thanks!

Rick DeShon

> version
               _
platform       i386-apple-darwin8.10.1
arch           i386
os             darwin8.10.1
system         i386, darwin8.10.1
status
major          2
minor          6.1
year           2007
month          11
day            26
svn rev        43537
language       R
version.string R version 2.6.1 (2007-11-26)



[R] Efficient computation of average covariance matrix over a list

2007-12-03 Thread Rick DeShon
Hi All.

I would like to compute a separate covariance matrix for a set of variables
for each of the levels of a factor and then compute the average covariance
matrix over the factor levels.  I can loop through this computation but I
need to perform the calculation for a large number of levels and am looking
for something more elegant.  To be concrete:

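u    <- 3                 # number of factor levels
n    <- 10                # observations per level

x    <- rnorm(n*u)
y    <- rnorm(n*u)
z    <- rnorm(n*u)
id   <- gl(u,n)

df   <- data.frame(id,x,y,z)
df.s <- split(df[, c("x","y","z")], df$id)   # drop id so cov() sees numeric columns only

lcov <- lapply(df.s,cov)
lcov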

What's an efficient way to compute the average covariance matrix over the
list members in lcov?
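Two loop-free base-R options, sketched on data rebuilt as in the example above (the seed is arbitrary): Reduce() adds the matrices element-wise, and simplify2array() stacks them into a 3 x 3 x u array whose third dimension rowMeans(..., dims = 2) averages over. Both give the same matrix.

```r
set.seed(2007)                            # arbitrary seed
u  <- 3                                   # number of factor levels
n  <- 10                                  # observations per level
df <- data.frame(id = gl(u, n),
                 x = rnorm(n * u), y = rnorm(n * u), z = rnorm(n * u))
lcov <- lapply(split(df[, c("x", "y", "z")], df$id), cov)

# Option 1: sum the matrices element-wise with Reduce(), then divide
avg1 <- Reduce(`+`, lcov) / length(lcov)

# Option 2: stack into a 3 x 3 x u array and average over the 3rd dimension
avg2 <- rowMeans(simplify2array(lcov), dims = 2)

avg1
```

With equal group sizes, as here, this simple average equals the pooled within-group covariance matrix; with unequal sizes a pooled estimate would weight each matrix by its degrees of freedom (n_g - 1).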

Thanks in advance,

Rick


