Dear Group,

I am trying to simulate a dataset of 200 individuals with random assignment of Sex (1, 0) and a Weight drawn from a lognormal distribution specific to each Sex. I am intrigued by the behavior of the rlnorm function when it is used to impute a value of Weight from the specified distribution. Here is the code:

ID <- 1:200
Sex <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.4, 0.6))
fulldata <- data.frame(ID, Sex)
fulldata$Wt <- ifelse(fulldata$Sex == 1,
                      rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329)),
                      rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)))
mean(fulldata$Wt[fulldata$Sex == 0])  # to check the mean is close to 73
mean(fulldata$Wt[fulldata$Sex == 1])  # to check the mean is close to 85

I see that the number of simulated values affects the mean calculated after imputation. That is, using

rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442))

in the ifelse statement above gives a much better match than

rlnorm(1, meanlog = log(73), sdlog = sqrt(0.0442))

My understanding is that ifelse imputes only one value wherever the condition is met. I would appreciate your insights on why increasing the number of simulated values improves the match.

Regards,
Ayyappa

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
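
A note on the behavior asked about, offered as a minimal sketch rather than a definitive answer: ifelse() evaluates its yes and no arguments once, as whole vectors, and recycles them to the length of the test. It does not draw a fresh value for each row. With rlnorm(1, ...), the single draw is therefore reused for every matching row, so the group mean is just that one value. With rlnorm(100, ...), the 100 draws are recycled across the 200 rows. The sketch below assumes one independent draw per individual is what is wanted, and relies on rlnorm() being vectorized over meanlog and sdlog. The seed and the names n, one, n_unique_one and n_unique_vec are illustrative and not from the original post.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n   <- 200
Sex <- sample(c(0, 1), n, replace = TRUE, prob = c(0.4, 0.6))

# A length-1 'no' vector is recycled: every Sex == 0 row receives the same draw
one <- ifelse(Sex == 1,
              rlnorm(n, meanlog = log(85.1), sdlog = sqrt(0.0329)),
              rlnorm(1, meanlog = log(73),   sdlog = sqrt(0.0442)))
n_unique_one <- length(unique(one[Sex == 0]))  # number of distinct values among Sex == 0

# One draw per individual, with the parameters selected by Sex
Wt <- rlnorm(n,
             meanlog = ifelse(Sex == 1, log(85.1), log(73)),
             sdlog   = sqrt(ifelse(Sex == 1, 0.0329, 0.0442)))
n_unique_vec <- length(unique(Wt[Sex == 0]))   # number of distinct values among Sex == 0

tapply(Wt, Sex, mean)    # sample means by Sex
tapply(Wt, Sex, median)  # sample medians by Sex
```

Note also that exp(meanlog) is the median of a lognormal distribution, not its mean. The mean is exp(meanlog + sdlog^2 / 2), so even with correct sampling the sample means should sit slightly above 73 and 85.1, while the medians should be close to them.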