[R] better example for multivariate data simulation question-please help if you can

Andras Farkas Fri, 12 Oct 2012 11:48:52 -0700

Dear All,
 
A few weeks ago I posted a question on the R-help list, and some of you
responded with a great solution; thank you again for that. I am reaching out
about the problem I am now trying to solve. I posted this question a few days
ago, but it was probably not clear enough, so I am trying again.

At times I have a multivariate example on hand where the means, SDs and
medians of the variables are known, along with the covariance matrix of those
variables. (Occasionally, the parameters are related strongly enough that a
covariance matrix can be established.) Please see the attached document for
an example. Usually, when we medical people simulate (and this is not to say
it is the best approach), we use a lognormal distribution to avoid generating
negative values, because physiologic variables are almost never negative
(unfortunately, we also really do not know any better).

For the most part I use other software that can reproduce reasonable means,
medians and SDs when I enter the covariance matrix, but it is not a free
resource (so I cannot share the solutions with others), nor does it have
anything like the Sweave option in R for standard reports that can be
distributed for free.

In R, unfortunately, I am having a hard time figuring out the solution. I
have tried the multivariate normal functions mvrnorm from the MASS package
and rmvnorm from the mvtnorm package, but they generate negative values,
which I cannot afford. Also, at times the simulated means, medians and SDs
are quite different from the ones I started with (which may be due to the
assumption I make about the distribution of the data).

Would anyone be willing to share some thoughts on how one should simulate, in
R, a multivariate distribution with a given covariance matrix (using the
attached data as an example) so that the simulated means, medians and SDs are
reasonably close to the original values? A better idea of the actual
distribution of the data would probably be invaluable for reproducing the
data accurately (and for choosing a probability distribution to simulate
from), but in the medical literature we often only have information like what
I have attached (and we assume it is lognormally distributed, as mentioned
above). I would greatly appreciate your help.
 
Sincerely,
 
Andras
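
For illustration only, the lognormal approach described in the message could
be sketched in R roughly as follows. This is a minimal sketch under stated
assumptions, not a definitive answer: the means, SDs and correlation below
are made up (they are NOT the attached data), the helper name sim_mvlnorm is
hypothetical, and the variables are assumed to be jointly lognormal. The idea
is to convert the arithmetic-scale means and covariance matrix to the log
scale by moment matching, draw multivariate normal values with mvrnorm from
MASS, and exponentiate, so that every simulated value is positive.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical inputs (made up for illustration, not the attached data):
# arithmetic-scale means, SDs and a correlation for two variables.
m   <- c(CL = 10, V = 50)
s   <- c(CL = 3,  V = 20)
rho <- 0.5
S   <- diag(s) %*% matrix(c(1, rho, rho, 1), 2) %*% diag(s)  # covariance matrix

# Hypothetical helper: simulate n draws from a multivariate lognormal whose
# arithmetic-scale mean vector is m and covariance matrix is S.
sim_mvlnorm <- function(n, m, S) {
  # Moment matching: log-scale covariance and mean implied by m and S.
  # Requires 1 + S / outer(m, m) > 0 and a positive-definite result.
  Sigma_log <- log(1 + S / outer(m, m))
  mu_log    <- log(m) - diag(Sigma_log) / 2
  exp(mvrnorm(n, mu = mu_log, Sigma = Sigma_log))
}

set.seed(1)
x <- sim_mvlnorm(1e5, m, S)

colMeans(x)          # compare with m
apply(x, 2, sd)      # compare with s
apply(x, 2, median)  # medians implied by the lognormal assumption
cor(x)               # compare with rho
```

Under this assumption the median of each variable is fixed by its mean and
SD (it equals exp(mu_log)), so a reported median that differs a lot from the
simulated one suggests the data are not well described by a lognormal
distribution.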

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
