Re: [R] simulation dichotomous data
Please remember to use 'reply all' so that the r-help list stays on the thread.

First question: How can I use Pearson correlation with dichotomous data? I want a correlation between dichotomous variables, like the Spearman correlation for ordered categorical variables.

cor(variable1, variable2, method = "pearson")

Second question: You would like two separate populations (1000 samples, 10 variables each), with variables highly correlated *within* each dataset and minimal correlation *between* the datasets.

As I stated in a previous response, the code you have is sufficient. You can go through as many variables as you like *for each dataset* and induce correlations; do this for every variable you need to be correlated. Because the code induces these correlations randomly, there should be *minimal* correlation between the datasets, but still some if they share the same structure (the same variables correlated within each). If different variables are correlated within each dataset, the correlation between datasets will likely be lower. It is unrealistic to expect absolutely no correlation between two datasets, so you must decide at what point you consider it sufficiently low.

One final point: in the code section '# subset variable to have a stronger correlation', you can only do one dataset at a time, or you must give the second object a different name; otherwise you are just overwriting the previous 'v1'.

You have described what you want, and you have the code to do it. The major hurdle here is writing some 'for' loops, which is not terribly complex if you are working on your programming. However, loops are not necessary if you just want to write several lines, with new object names for each variable in each dataset. Give it a try; you know how to induce correlations now. Just choose which variables to correlate, do it for each of them in each dataset, and compare.

Regards,
Dr. Charles Determan

On Thu, Jul 31, 2014 at 9:10 AM, thanoon younis thanoon.youni...@gmail.com wrote:
Many thanks to you. Firstly: how can I use Pearson correlation with dichotomous data? I want to use a correlation between dichotomous variables, like the Spearman correlation for ordered categorical variables. Secondly: I have two different populations, and each population has 1000 samples and 10 variables. I want a high correlation coefficient between the variables in the first population, a high correlation coefficient between the variables in the second population, and no correlation between the two populations, because I want to use multiple-group structural equation models. Many thanks again, Thanoon

On 31 July 2014 16:45, Charles Determan Jr deter...@umn.edu wrote:
[...]
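Charles's suggestion amounts to a loop over the variable pairs you want correlated, run separately for each dataset. The sketch below assumes the same approach as the code in this thread (copy a 0/1 column and redraw about 10% of its rows at random, which flips roughly 5% of the values and gives a correlation near 0.9); the helpers induce_cor() and make_dataset() and the chosen column pairs are hypothetical, for illustration only. With N = 1000, the sample correlation between two independent columns has a standard deviation of roughly 1/sqrt(N), about 0.03, which is the "minimal but not zero" between-dataset correlation described in the reply.

```r
set.seed(1)  # only so the example is reproducible

# Hypothetical helper: copy of the 0/1 vector x with a random
# (1 - keep_frac) share of its positions redrawn, as in the thread's code.
induce_cor <- function(x, keep_frac = 0.9, ords = c(0, 1)) {
  change <- sample(length(x), size = round((1 - keep_frac) * length(x)))
  x[change] <- sample(ords, length(change), replace = TRUE)
  x
}

# Hypothetical helper: p independent 0/1 columns, then, for each row
# (from, to) of 'pairs', overwrite column 'to' with a noisy copy of 'from'.
make_dataset <- function(pairs, p = 10, N = 1000) {
  dat <- as.data.frame(replicate(p, sample(c(0, 1), N, replace = TRUE)))
  for (r in seq_len(nrow(pairs))) {
    dat[[pairs[r, 2]]] <- induce_cor(dat[[pairs[r, 1]]])
  }
  dat
}

# Different variable pairs in each dataset (an arbitrary choice).
R1 <- make_dataset(rbind(c(1, 2), c(3, 4), c(5, 6)))
R2 <- make_dataset(rbind(c(2, 7), c(8, 9), c(1, 10)))

within1 <- cor(R1$V1, R1$V2)   # expected around 0.9
within2 <- cor(R2$V2, R2$V7)   # expected around 0.9
between <- diag(cor(R1, R2))   # V1 vs V1, V2 vs V2, ...: all near 0
round(c(within1 = within1, within2 = within2,
        max_between = max(abs(between))), 3)
```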
Re: [R] simulation dichotomous data
Thanoon,

You should still send your questions to the R-help list, even though I helped you with the code you are currently using. I will not always know the best way, or even how to proceed, with some questions.

As for your question about the code below:

Firstly, there is no 'phi' method for cor() in base R. If you are using it, you must have forgotten to mention a package you are loading. However, since the phi coefficient equals the Pearson coefficient for dichotomous data, you can use the 'pearson' method.

Secondly, regarding your primary concern: here we have randomly chosen which variables to correlate in two INDEPENDENT DATASETS (i.e. different groups of samples). R1 and R2 are each datasets of 1000 samples and 10 variables. It would be miraculous if the two datasets correlated with each other when each had its correlated variables assigned at random. The code works correctly; the question now is whether you want to see correlations across variables for all samples (which this code gives you within each DATASET) or whether you want the two DATASETS to be correlated with each other.
ords <- seq(0, 1)
p <- 10
N <- 1000
percent_change <- 0.9

R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))

# phi is more appropriate for dichotomous data; base R's cor() has no
# "phi" method, but phi equals Pearson for 0/1 variables
cor(R1, method = "pearson")
cor(R2, method = "pearson")

# subset variable to have a stronger correlation
v1 <- R1[, 1, drop = FALSE]
v1 <- R2[, 1, drop = FALSE]   # NOTE: this overwrites the v1 taken from R1

# randomly choose which rows to retain
keep <- sample(as.numeric(rownames(v1)), size = percent_change * nrow(v1))
change <- as.numeric(rownames(v1)[-keep])

# randomly choose new values for the rows being changed (one per row)
new.change <- sample(ords, length(change), replace = TRUE)

# replace values in copy of original column
v1.samp <- v1
v1.samp[change, ] <- new.change

# closer correlation
cor(v1, v1.samp, method = "pearson")

# set correlated column as one of your other columns
R1[, 2] <- v1.samp
R2[, 2] <- v1.samp
R1
R2

On Thu, Jul 31, 2014 at 7:29 AM, thanoon younis thanoon.youni...@gmail.com wrote:
Dear Dr. Charles, I have a problem with the following R program for simulating data with 2 different samples and a high correlation between the variables in each sample. When I applied the program, I got results, but without correlation between the samples. I appreciate your help and your time. I did not send this code to R-help because you helped me write it before. Many thanks to you, Thanoon

-- 
Dr. Charles Determan, PhD
Integrated Biosciences

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
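The statement that phi equals Pearson for 0/1 data is easy to check. A minimal sketch: phi_from_table() is a hypothetical helper that computes phi from the 2x2 table counts with the usual formula, and its result is compared with cor(method = "pearson"); x and y are simulated purely for illustration.

```r
# Hypothetical helper: phi coefficient from the 2x2 table of two 0/1 vectors,
# phi = (n00 * n11 - n01 * n10) / sqrt(n0. * n1. * n.0 * n.1)
phi_from_table <- function(x, y) {
  tab <- table(factor(x, levels = c(0, 1)), factor(y, levels = c(0, 1)))
  n <- matrix(as.numeric(tab), nrow = 2)  # doubles avoid integer overflow
  n00 <- n[1, 1]; n01 <- n[1, 2]; n10 <- n[2, 1]; n11 <- n[2, 2]
  (n00 * n11 - n01 * n10) /
    sqrt((n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11))
}

set.seed(42)
x <- sample(c(0, 1), 1000, replace = TRUE)
y <- ifelse(runif(1000) < 0.8, x, 1 - x)   # y agrees with x about 80% of the time

phi <- phi_from_table(x, y)
r   <- cor(x, y, method = "pearson")
all.equal(phi, r)   # TRUE: the two coefficients coincide for 0/1 data
```

In practice this means cor() on a data frame of 0/1 columns, such as R1 above, already returns the matrix of phi coefficients.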
Re: [R] Simulation over data repeatedly for four loops
Perhaps you might want to abstract your code a bit and try something like:

X = rnorm(500) # Some data
replicate(1e4, mean(sample(X, 500, replace = TRUE)))

Obviously you can set up a loop over your data sets as needed.

Michael

On Sat, Nov 12, 2011 at 6:46 PM, Francesca francesca.panco...@gmail.com wrote:
[...]
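One way to extend Michael's replicate() idea to all four groups without index bookkeeping is to put the vectors to resample in a named list and let sapply() build the result matrix. A sketch under assumptions: the list vecs, its names and the helper boot_means() are hypothetical stand-ins for Francesca's data, and 1000 replicates per column is an arbitrary choice.

```r
set.seed(123)
# Hypothetical data: two vectors (m1, m2) for each of four groups, standing in
# for columns such as data1[[j]][, 2] and data2[[j]][, 2] in the original post.
vecs <- list()
for (j in 1:4) {
  vecs[[paste0("m1_g", j)]] <- rnorm(200, mean = j)
  vecs[[paste0("m2_g", j)]] <- rnorm(200, mean = -j)
}

# Michael's idea as a function: n_rep bootstrap means of one vector.
boot_means <- function(v, n_rep = 1000, size = 1000) {
  replicate(n_rep, mean(sample(v, size, replace = TRUE), na.rm = TRUE))
}

# sapply() applies it to every vector and binds the results column-wise.
db <- as.data.frame(sapply(vecs, boot_means))
dim(db)   # 1000 rows, 8 columns
```

Because sapply() collects one full vector per list element, the result has exactly one column per vector, so no element-by-element assignment is needed.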
[R] Simulation over data repeatedly for four loops
Dear Contributors,

I am trying to perform a simulation over sample data, but I need to reproduce the same simulation over 4 groups of data. My ability with for loops is null, particularly where dimensions are concerned: no matter what I try, I always get

number of items to replace is not a multiple of replacement length

This is what I intend to do: replicate the following operation four times, where the index for the four groups is in this part of the code: datiPc[[1]][,2]. The part that changes is the data from which I draw the sample, which are stored in datiPc[[1]][,2]. To use the data for all four samples, I would substitute the 1 with a j and loop four times, but it never worked. My desired final outcome is a matrix with 1 observations for each pair of extracted samples, i.e. 8 columns of 1 observations of means.

db <- c()
# Draw the samples from the PGG and TRUST data
estr1 <- c(); estr2 <- c()
m1 <- c()
m2 <- c()
tmp1 <- data1[[1]][,2]
tmp2 <- data2[[2]][,2]
for(i in 1:100){
  estr1 <- sample(tmp1, 1000, replace = TRUE)
  estr2 <- sample(tmp2, 1000, replace = TRUE)
  m1[i] <- mean(estr1, na.rm = TRUE)
  m2[i] <- mean(estr2, na.rm = TRUE)
}
db <- data.frame(cbind(m1, m2))

Thanks for any help you can provide.

Best Regards
-- 
Francesca
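If an explicit for loop is preferred, as in the post, a common way to avoid length mismatches such as "number of items to replace is not a multiple of replacement length" is to preallocate the result matrix with its final dimensions and assign exactly one value per cell. A sketch, assuming data1 and data2 are lists of four matrices whose second column holds the values; the stand-in data below are simulated, and the column names are hypothetical.

```r
set.seed(1)
# Hypothetical stand-ins for the two data lists in the original post:
# data1[[j]][, 2] and data2[[j]][, 2] are the columns to resample for group j.
data1 <- lapply(1:4, function(j) cbind(id = 1:300, value = rnorm(300, mean = j)))
data2 <- lapply(1:4, function(j) cbind(id = 1:300, value = rnorm(300, mean = 10 + j)))

n_iter <- 100    # iterations per group, as in the original loop
n_draw <- 1000   # size of each bootstrap sample
res <- matrix(NA_real_, nrow = n_iter, ncol = 8,
              dimnames = list(NULL, paste0(c("m1_g", "m2_g"), rep(1:4, each = 2))))

for (j in 1:4) {                 # outer loop over the four groups
  tmp1 <- data1[[j]][, 2]
  tmp2 <- data2[[j]][, 2]
  for (i in 1:n_iter) {          # resampling loop from the post
    res[i, 2 * j - 1] <- mean(sample(tmp1, n_draw, replace = TRUE), na.rm = TRUE)
    res[i, 2 * j]     <- mean(sample(tmp2, n_draw, replace = TRUE), na.rm = TRUE)
  }
}
db <- as.data.frame(res)
```

Each assignment writes a single mean into a single cell, so the shapes on both sides of `<-` always match.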
Re: [R] Simulation of data
2008/10/21 Marcioestat [EMAIL PROTECTED]:
Hi listers, I am working on a program of statistical analysis of simulated data, and I've been searching for the error in the program, but I haven't found it! It is something about the WHILE loop; the error says: Error in while (ecart >= d) { : missing value where TRUE/FALSE needed

How much do you know about debugging programs? Read the R help for debug and browser. These tools let you inspect your program at any point so you can see what the values of variables are.

Anyway, for your error message, I'd guess there was a missing value in 'ecart' or 'd' or both:

ecart
[1] NaN
d
[1] 0.00112

So ecart is Not A Number. How could that happen? Well, ecart is a square root of something, so maybe that something is negative:

proportion*(1-proportion)
[1] -0.01874365

Hmmm...

proportion
[1] 1.018405

I would bet that proportion isn't supposed to be more than 1 (or less than 0). How did that happen? This line looks a bit dodgy:

proportion <- proportion + (prop[k+1] - proportion/(k+1))

since you're adding something to a proportion... Should that division be outside the parentheses? Hard to tell without knowing exactly what the code is trying to do. The proportion goes above 1 when w is over your 1.75 threshold. Are you just trying to update proportion = mean(prop) again? Why not do that? If I replace your line with proportion = mean(prop), then it runs and terminates after about 32000 iterations.

Protip: Updating a mean like this is probably quicker, but if you ever write a tricky bit of code, test it first!

Barry
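Barry's question about the parentheses can be checked by running both groupings of the update on the same 0/1 data. A minimal sketch with simulated indicators (the names m_orig and m_fix are illustrative): the running-mean recurrence m_{k+1} = m_k + (x_{k+1} - m_k)/(k+1) reproduces mean() up to rounding error, while the grouping from the post adds nearly the whole new observation at every step, so the estimate soon exceeds 1.

```r
set.seed(2008)
x <- as.numeric(rnorm(5000) > 1.75)   # 0/1 indicators, as in the original post

m_orig <- mean(x[1:100])   # both versions start from the mean of the first 100
m_fix  <- m_orig
for (k in 100:(length(x) - 1)) {
  # grouping from the original post: only 'proportion' is divided by (k + 1)
  m_orig <- m_orig + (x[k + 1] - m_orig / (k + 1))
  # running-mean update: the whole difference is divided by the new count
  m_fix  <- m_fix + (x[k + 1] - m_fix) / (k + 1)
}
c(original = m_orig, fixed = m_fix, mean = mean(x))
```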
Re: [R] Simulation of data
Hi Barry,

As you explained, I found my mistake... I wasn't supposed to use a recursive formula! What I needed was just to compute the proportion with the mean function to get my results...

Thanks,
Márcio

Barry Rowlingson wrote:
[...]
-- 
View this message in context: http://www.nabble.com/Simulation-of-data-tp20082754p20093904.html
Sent from the R help mailing list archive at Nabble.com.
[R] Simulation of data
Hi listers,

I am working on a program of statistical analysis of simulated data, and I've been searching for the error in the program, but I haven't found it! It is something about the WHILE loop; the error says:

Error in while (ecart >= d) { : missing value where TRUE/FALSE needed

Thanks in advance!
Márcio

k <- 100
d <- 0.00112
z <- rnorm(100, 0, 1)
prop <- rep(0, 100)
for (i in 1:100){
  if (z[i] > 1.75){ prop[i] <- 1 } else { prop[i] <- 0 }
}
proportion <- mean(prop)
ecart <- sqrt((proportion*(1-proportion))/k)
while(ecart >= d){
  prop_ <- 0
  w <- rnorm(1, 0, 1)
  z <- c(z, w)
  if (w > 1.75){ prop_ <- 1 } else { prop_ <- 0 }
  prop <- c(prop, prop_)
  proportion <- proportion + (prop[k+1] - proportion/(k+1))
  ecart <- sqrt((proportion*(1-proportion))/(k+1))
  k <- k+1
}

-- 
View this message in context: http://www.nabble.com/Simulation-of-data-tp20082754p20082754.html
Sent from the R help mailing list archive at Nabble.com.
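For reference, here is a sketch of the same sequential simulation with the running-mean update written correctly. Assumptions: the goal is to keep sampling until the estimated standard error of P(Z > 1.75) falls below d; the names threshold, new_obs, p_true and expected_k are hypothetical; and the z and prop vectors are not stored because only the running proportion is needed. Márcio's own fix, recomputing mean(prop) each time, gives the same estimate but also regrows the vectors on every iteration. The expected stopping point is about p(1 - p)/d^2, roughly 30,700 draws for p = P(Z > 1.75), which is in line with the "about 32000 iterations" reported in the thread.

```r
set.seed(21)
d <- 0.00112        # target standard error (from the original post)
threshold <- 1.75   # an observation counts as a success when it exceeds this

k <- 100
proportion <- mean(rnorm(k) > threshold)
ecart <- sqrt(proportion * (1 - proportion) / k)

# Draw one observation at a time until the standard error drops below d.
# 'proportion == 0' keeps the loop going if no success has been seen yet,
# since ecart would otherwise be 0 and the loop would stop immediately.
while (proportion == 0 || ecart >= d) {
  new_obs <- as.numeric(rnorm(1) > threshold)
  k <- k + 1
  proportion <- proportion + (new_obs - proportion) / k   # running mean
  ecart <- sqrt(proportion * (1 - proportion) / k)
}

# Rough check: the loop should stop near k = p(1 - p) / d^2, p = P(Z > 1.75).
p_true <- pnorm(threshold, lower.tail = FALSE)
expected_k <- p_true * (1 - p_true) / d^2
c(k = k, expected_k = round(expected_k), proportion = proportion)
```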