Re: [R] simulation dichotomous data

2014-08-01 Thread Charles Determan Jr
Please remember to use 'reply all' so that your message also goes to the r-help list.

First Question: How can I use the Pearson correlation with dichotomous data?
I want a correlation between dichotomous variables, analogous to the Spearman
correlation for ordered categorical variables.

cor(variable1, variable2, method = "pearson")
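
A minimal sketch of why the 'pearson' method is enough for 0/1 data: the phi coefficient computed by hand from the 2 x 2 table matches what cor() returns. The vectors x and y are simulated here purely for illustration and are not from the thread.

```r
# Hypothetical 0/1 vectors, simulated only for illustration
set.seed(1)
x <- rbinom(200, 1, 0.5)
y <- ifelse(runif(200) < 0.8, x, 1 - x)   # y agrees with x about 80% of the time

# Pearson correlation computed by base R
r_pearson <- cor(x, y, method = "pearson")

# Phi coefficient computed by hand from the 2 x 2 table
tab <- table(factor(x, levels = 0:1), factor(y, levels = 0:1))
n00 <- tab[1, 1]; n01 <- tab[1, 2]; n10 <- tab[2, 1]; n11 <- tab[2, 2]
phi <- (n11 * n00 - n10 * n01) /
  sqrt(as.numeric(sum(tab[1, ])) * sum(tab[2, ]) * sum(tab[, 1]) * sum(tab[, 2]))

# The two values agree up to floating-point tolerance
all.equal(r_pearson, phi)
```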

Second Question: Would like two separate populations (1000 samples, 10
var).  Variables *within* datasets highly correlated, minimal correlation
*between* datasets.

As I have stated in a previous response, the code you have is sufficient.
You can go through as many variables as you like *for each dataset* and
induce correlations.  You should do this for as many variables as you
require to be correlated.  As the code induces these correlations randomly,
there should be *minimal* correlation between datasets but still some if
the datasets have the same structure (same variables correlated within).
If different variables are correlated within each, then the correlation
between datasets would likely be lower.  It is extremely unrealistic to
believe that there will be absolutely no correlation between datasets so
you must decide at which point you consider it sufficiently low.
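
One way to judge "sufficiently low" is to look at the full matrix of correlations between the columns of the two datasets. The sketch below assumes two independently simulated datasets built as in the thread; the cutoff value is hypothetical and only there to show the comparison.

```r
set.seed(42)
N <- 1000
p <- 10
ords <- 0:1

# Two independently simulated 0/1 datasets (same construction as in the thread)
R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))

# p x p matrix of correlations between every column of R1 and every column of R2
between <- cor(R1, R2, method = "pearson")

# Largest absolute between-dataset correlation, compared against your own cutoff
max_between <- max(abs(between))
cutoff <- 0.2   # hypothetical threshold, chosen here only for illustration
max_between < cutoff
```

The between-dataset values will not be exactly zero; with 1000 rows they typically scatter within a few hundredths of zero.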

One final point: in the code section '# subset variable to have a stronger
correlation', you can only handle one dataset at a time, or you must give the
second object a different name; otherwise you are just overwriting the previous 'v1'.

You have described what you want to me and you have the code to do it.  The
major hurdle here would be implementing some 'for' loops, which is
not terribly complex if you are working on your programming.  However, they
are not necessary if you just want to write several lines with new object
names for each variable in each dataset.  Give it a try; you know how to
induce correlations now.  Just choose which variables to correlate, do it
for all of those in each dataset, and compare.
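
For reference, a rough sketch of what such a loop could look like. The helper induce_cor() and the name percent_keep are hypothetical and not part of the code discussed in this thread; the idea (copy a column, then re-draw a small share of its rows at random) is the same.

```r
set.seed(123)
N <- 1000
p <- 10
ords <- 0:1
percent_keep <- 0.9   # share of rows copied unchanged from the source column

# Hypothetical helper: make column `to` a noisy copy of column `from`
induce_cor <- function(dat, from, to, keep_frac) {
  n <- nrow(dat)
  change <- sample(n, size = n - round(keep_frac * n))   # rows to re-draw
  dat[[to]] <- dat[[from]]
  dat[[to]][change] <- sample(ords, length(change), replace = TRUE)
  dat
}

R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))

# Correlate columns 2..5 with column 1 in each dataset, one column at a time
for (j in 2:5) {
  R1 <- induce_cor(R1, from = 1, to = j, keep_frac = percent_keep)
  R2 <- induce_cor(R2, from = 1, to = j, keep_frac = percent_keep)
}

round(cor(R1)[1:5, 1:5], 2)      # within R1: high
round(cor(R1, R2)[1:5, 1:5], 2)  # between R1 and R2: near zero
```

Because each dataset is modified separately and under its own name, nothing is overwritten, and the two datasets stay independent of each other.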

Regards,
Dr. Charles Determan


On Thu, Jul 31, 2014 at 9:10 AM, thanoon younis thanoon.youni...@gmail.com
wrote:

 Many thanks to you.

 Firstly: how can I use the Pearson correlation with dichotomous data? I want
 to use a correlation between dichotomous variables, like the Spearman
 correlation for ordered categorical variables.

 Secondly: I have two different populations, each with 1000 samples and
 10 variables. I want a high correlation coefficient between the
 variables in the first population, a high correlation coefficient between
 the variables in the second population, and no correlation
 between the two populations, because I want to use multiple-group structural
 equation models.


 Many thanks again,

 Thanoon





Re: [R] simulation dichotomous data

2014-07-31 Thread Charles Determan Jr
Thanoon,

You should still send the question to the R help list even though I helped
you with the code you are currently using.  I will not always know the best
way, or even how, to proceed with some questions.  As for your question
about the code below:

Firstly, there is no 'phi' method for cor in base R.  If you are using it,
you must have neglected to mention a package you are using.  However, given
that the phi coefficient is equal to the Pearson coefficient for
dichotomous data, you can use the 'pearson' method.

Secondly, with respect to your primary concern: in this case, we have
randomly chosen variables to correlate within each of two INDEPENDENT DATASETS
(i.e. different groups of samples).  The idea with this code is that R1 and
R2 are datasets of 1000 samples and 10 variables.  It would be miraculous
if they correlated when each had variables randomly assigned as
correlated.  The code works correctly; the question now becomes whether you
want to see correlations across variables for all samples (which this does for
each DATASET) or whether you want the two DATASETS to be correlated.

ords <- seq(0,1)
p <- 10
N <- 1000
percent_change <- 0.9

R1 <- as.data.frame(replicate(p, sample(ords, N, replace = T)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = T)))

# phi is more appropriate for dichotomous data; base R's cor() has no "phi"
# method, so "pearson" (equal to phi for 0/1 data) replaces the posted "phi"
cor(R1, method = "pearson")
cor(R2, method = "pearson")

# subset variable to have a stronger correlation
v1 <- R1[,1, drop = FALSE]
v1 <- R2[,1, drop = FALSE]  # note: this overwrites the v1 taken from R1

# randomly choose which rows to retain
keep <- sample(as.numeric(rownames(v1)), size = percent_change*nrow(v1))
change <- as.numeric(rownames(v1)[-keep])

# randomly choose new values for changing (one per row being changed)
new.change <- sample(ords, length(change), replace = T)

# replace values in copy of original column
v1.samp <- v1
v1.samp[change,] <- new.change

# closer correlation
cor(v1, v1.samp, method = "pearson")

# set correlated column as one of your other columns
R1[,2] <- v1.samp
R2[,2] <- v1.samp
R1
R2


On Thu, Jul 31, 2014 at 7:29 AM, thanoon younis thanoon.youni...@gmail.com
wrote:

 Dear Dr. Charles,
 I have a problem with the following R program, which simulates data for 2
 different samples with high correlation between the variables in each
 sample. When I ran the program I got results, but without
 correlation between the samples.
 I appreciate your help and your time.
 I did not send this code to R-help because you helped me write it
 before.

 many thanks to you

 Thanoon




-- 
Dr. Charles Determan, PhD
Integrated Biosciences


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation over data repeatedly for four loops

2011-11-13 Thread R. Michael Weylandt
Perhaps you might want to abstract your code a bit and try something like:

X = rnorm(500) # Some Data
replicate(1e4, mean(sample(X, 500, replace = T)))

Obviously you can set up a loop over your data sets as needed.
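
A rough sketch of such a loop over four groups, written with sapply() instead of an explicit for loop. The list datiPc is simulated here as a stand-in for the poster's data (its real structure is not shown in the thread); the 100 repetitions and the resample size of 1000 are taken from the posted code.

```r
set.seed(1)

# Hypothetical stand-in for the poster's data: a list of four groups, each a
# data frame whose second column holds the values to resample
datiPc <- lapply(1:4, function(g) data.frame(id = 1:50, value = rnorm(50, mean = g)))

n_rep <- 100     # number of resamples per group, as in the posted loop
n_draw <- 1000   # size of each resample, as in the posted loop

# One column of resampled means per group
boot_means <- sapply(datiPc, function(grp) {
  replicate(n_rep, mean(sample(grp[, 2], n_draw, replace = TRUE), na.rm = TRUE))
})

dim(boot_means)
```

Because each call returns a complete vector of means, there is no indexed assignment inside the loop, which is where "number of items to replace" errors usually come from.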

Michael



[R] Simulation over data repeatedly for four loops

2011-11-12 Thread Francesca
Dear Contributors,

I am trying to perform a simulation over sample data, but I need to reproduce
the same simulation over 4 groups of data. My ability with for loops is nil,
in particular with respect to dimensions, as I always get, no matter what I try,

number of items to replace is not a multiple of replacement length

This is what I intend to do: replicate this operation four times, where the
index for the four groups is in this part of the code: datiPc[[1]][,2].

I have to replicate the following code 4 times, where the changing part is the
data from which I pick the sample, which are stored in datiPc[[1]][,2].

If I had to use data for the four samples, I would substitute the 1 with a
j and run a loop four times, but it never worked.


My desired final outcome is a matrix with 1 observations for each
couple of extracted samples, i.e. 8 columns of 1 observations of means.



db <- c()

# Draw the samples from the PGG and TRUST data
estr1 <- c()
estr2 <- c()
m1 <- c()
m2 <- c()

tmp1 <- data1[[1]][, 2]
tmp2 <- data2[[2]][, 2]

for (i in 1:100) {
  estr1 <- sample(tmp1, 1000, replace = TRUE)
  estr2 <- sample(tmp2, 1000, replace = TRUE)

  # store the mean of each resample
  m1[i] <- mean(estr1, na.rm = TRUE)
  m2[i] <- mean(estr2, na.rm = TRUE)
}

db <- data.frame(cbind(m1, m2))
Thanks for any help you can provide.
Best Regards

-- 

Francesca
--



Re: [R] Simulation of data

2008-10-21 Thread Barry Rowlingson
2008/10/21 Marcioestat [EMAIL PROTECTED]:

 Hi listers,
 I am working on a program for the statistical analysis of simulated data and
 I've been searching for the error in the program, but I can't find it!
 It is something about the WHILE loop; the error says: Error in while
 (ecart >= d) { : missing value where TRUE/FALSE needed

 How much do you know about debugging programs? Read the R help for
debug and browser. These tools let you inspect your program at any
point so you can see what the values of variables are.

 Anyway, for your error message, I'd guess there was a missing value in
'ecart' or 'd' or both:

 > ecart
 [1] NaN
 > d
 [1] 0.00112

 so, ecart is Not A Number. How could that happen? Well, ecart is a
square root of something, so maybe that something is negative:

 > proportion*(1-proportion)
 [1] -0.01874365

 Hmmm

 > proportion
 [1] 1.018405

 I would bet that proportion isn't supposed to be more than 1 (or less
than 0). How did that happen?
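
The chain from an out-of-range proportion to the while error can be reproduced in a few lines. This is only an illustrative sketch: the value of k is assumed, and check_proportion() is a hypothetical guard, not something from the posted program.

```r
proportion <- 1.018405           # value reported in the session above
k <- 100                         # assumed sample count, for illustration only
d <- 0.00112

ecart <- suppressWarnings(sqrt(proportion * (1 - proportion) / k))
is.nan(ecart)                    # TRUE: square root of a negative number

# A comparison with NaN yields NA, which `while` cannot use as a condition
cond <- ecart >= d
is.na(cond)

# Guard that stops with a clearer message before the loop condition is evaluated
check_proportion <- function(p) {
  if (is.na(p) || p < 0 || p > 1) stop("proportion outside [0, 1]: ", p)
  invisible(p)
}
res <- tryCatch(check_proportion(proportion), error = function(e) conditionMessage(e))
res
```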

 This line looks a bit dodgy:

  proportion <- proportion+(prop[k+1]-proportion/(k+1))

since you're adding something to a proportion... Should that division
be outside the parentheses? Hard to tell without knowing exactly what
the code is trying to do. The proportion goes above 1 when w is over your
1.75 threshold. Are you just trying to update proportion=mean(prop)
again? Why not do that? If I replace your line with proportion =
mean(prop) then it runs and terminates after about 32000 iterations.

 Protip: Updating a mean like this is probably quicker, but if you ever
write a tricky bit of code, test it first!
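
For comparison, a small sketch of the standard incremental update of a mean, with the division applied to the whole difference, checked against mean(). The data are simulated for illustration; the second loop reproduces the posted form of the update so the two can be compared.

```r
set.seed(2008)
x <- as.numeric(rnorm(500) > 1.75)   # 0/1 indicators, as in the posted simulation

running <- x[1]                      # mean of the first observation
for (k in 1:(length(x) - 1)) {
  # mean of k+1 values = old mean + (new value - old mean) / (k + 1)
  running <- running + (x[k + 1] - running) / (k + 1)
}

# The incremental result agrees with the direct computation
all.equal(running, mean(x))

# The posted line divides only `proportion` by (k + 1), so it drifts away
dodgy <- x[1]
for (k in 1:(length(x) - 1)) {
  dodgy <- dodgy + (x[k + 1] - dodgy / (k + 1))
}
c(correct = running, posted = dodgy)
```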

Barry



Re: [R] Simulation of data

2008-10-21 Thread Marcioestat

Hi Barry,
As you explained, I found my mistake... I wasn't supposed to use a
recursive formula!
So all I needed was to compute the proportion with the function mean to
get my results...
Thanks,
Márcio




[R] Simulation of data

2008-10-20 Thread Marcioestat

Hi listers,
I am working on a program for the statistical analysis of simulated data and
I've been searching for the error in the program, but I can't find it!
It is something about the WHILE loop; the error says: Error in while
(ecart >= d) { : missing value where TRUE/FALSE needed
Thanks in advance!
Márcio

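k <- 100
d <- 0.00112
z <- rnorm(100, 0, 1)
prop <- rep(0, 100)
# flag each draw that exceeds the 1.75 threshold
for (i in 1:100) {
  if (z[i] > 1.75) {
    prop[i] <- 1
  } else {
    prop[i] <- 0
  }
}
proportion <- mean(prop)
ecart <- sqrt((proportion*(1-proportion))/k)
# keep drawing until the standard error falls below d
while (ecart >= d) {
  prop_ <- 0
  w <- rnorm(1, 0, 1)
  z <- c(z, w)
  if (w > 1.75) {
    prop_ <- 1
  } else {
    prop_ <- 0
  }
  prop <- c(prop, prop_)
  # recursive update of the running proportion
  proportion <- proportion+(prop[k+1]-proportion/(k+1))
  ecart <- sqrt((proportion*(1-proportion))/(k+1))
  k <- k+1
}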
