[See at end] On 21-Feb-11 17:13:03, Larry Hotchkiss wrote: > Hi list: > Here is one approach that will generate two uniformly distributed > variables with a correlation of 0.5 -- > [snip] >> > -----Urspr?ngliche Nachricht----- >> > Von:r-help-boun...@r-project.org >> > [mailto:r-help-boun...@r-project.org] Im Auftrag von S??ren Faurby >> > Gesendet: Sonntag, 20. Februar 2011 03:18 >> > An:r-help@r-project.org >> > Betreff: [R] Generating uniformly distributed correlated data. >> > >> > I wish to generate a vector of uniformly distributed data >> > with a defined correlation to another vector >> > >> > The only function I have been able to find doing something >> > similar is corgen from the library ecodist. >> > >> > The following code generates data with the desired >> > correlation to the vector x but the resulting vector y is >> > normal and not uniform distributed >> > >> > library(ecodist) >> > x<- runif(105) >> > y<- corgen(x=x, r=.5)$y >> > >> > Do anyone know a similar function generating uniform >> > distributed data or a way of transforming y to the desired >> > distribution while keeping the correlation between x and y >> > >> > Kind regards, Soren
I have got intrigued by this problem! In his original query, Søren Faurby did not specify the original vector 'x' for which he wanted to generate uniformly distributed 'y' with a given correlation 'r': "I wish to generate a vector of uniformly distributed data with a defined correlation to another vector" In particular he did not state whether it should have any specified distriobution -- he only specified that 'y' should be uniformly distributed. So I'm addressing the general problem in which 'x' may have an arbitrary distribution (or be an arbitrary given vector), and y is to have a uniform distribution and have a given correlation 'r' with 'x'. In the following, I assume r>0. I have an idea for an algorithm, though not sure how best to write a general function for it. It is based on the idea that you first sort 'x'; then generate 'y' as runif(length(x)) and sort 'y'. This generates a vector 'y' with maximal correlation with 'x'. Now, you start at the two extreme ends of 'y', and swap those two elements of 'y'. If the result still exceeds the desired correlation 'x', you move in and swap again, and so on. If you drop below the desired correlation, then you back-track, and try swapping a pair closer to the middle. And so on. Once you get really close, but still just above 'r', you switch to working out from the middle to fine-tune the result. The basic idea is to permute the elements of 'y' to get as close as possible to 'r'. But one needs to avoid an exhaustive search through all permutations! The following is an example, with length(x)=100 and 'x' Normally distributed, and the desired correlation r=0.5, was done by hand in about 20 minutes. Only the highlights are presented. set.seed(54321) X0 <- rnorm(100) ; Y0 <- runif(100) r <- 0.5 X1 <- sort(X0) ; Y1 <- sort(Y0) # First try (goes a bit too far): X <- X1 ; Y <- Y1 N <- length(Y) T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 T <- Y[3] ; Y[3] <- Y[N-2] ; Y[N-2] <- T # cor(X,Y) = 0.4803957 # So back up and skip the (3<->98) swap) X <- X1 ; Y <- Y1 T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 T <- Y[5] ; Y[5] <- Y[N-4] ; Y[N-4] <- T # cor(X,Y) = 0.400471 # Following a few tries at (20,81), (30,71), (40,61): X <- X1 ; Y <- Y1 T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 T <- Y[45]; Y[45]<- Y[N-44]; Y[N-44]<- T # cor(X,Y) = 0.5017024 # Now grope more finely: X <- X1 ; Y <- Y1 T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 T <- Y[43]; Y[43]<- Y[N-42]; Y[N-42]<- T # cor(X,Y) = 0.5001594 # And again: X <- X1 ; Y <- Y1 T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 T<- Y[43] ; Y[43]<- Y[N-42]; Y[N-42]<- T # cor(X,Y) = 0.5001594 T<- Y[49] ; Y[49]<- Y[N-48]; Y[N-48]<- T # cor(X,Y) = 0.500079 T<- Y[50] ; Y[50]<- Y[N-49]; Y[N-49]<- T # cor(X,Y) = 0.5000634 ### And this is probably as close as one can get. Pretty close! Now one can plot(X,Y). Maybe the resulting cross-shape (more or less inevitable from such a procedure) is undesirable. But at least the desiderata stated in the query have been met. Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.hard...@wlandres.net> Fax-to-email: +44 (0)870 094 0861 Date: 21-Feb-11 Time: 21:38:06 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.