[OOPS! The example I gave before does not correspond to the stated seed. At the end is a revised example where I have checked that it does correspond. Sorry. ]
On 21-Feb-11 21:38:11, Ted Harding wrote: > [See at end] > > On 21-Feb-11 17:13:03, Larry Hotchkiss wrote: >> Hi list: >> Here is one approach that will generate two uniformly distributed >> variables with a correlation of 0.5 -- >> [snip] >>> > -----Urspr?ngliche Nachricht----- >>> > Von:r-help-boun...@r-project.org >>> > [mailto:r-help-boun...@r-project.org] Im Auftrag von S??ren Faurby >>> > Gesendet: Sonntag, 20. Februar 2011 03:18 >>> > An:r-help@r-project.org >>> > Betreff: [R] Generating uniformly distributed correlated data. >>> > >>> > I wish to generate a vector of uniformly distributed data >>> > with a defined correlation to another vector >>> > >>> > The only function I have been able to find doing something >>> > similar is corgen from the library ecodist. >>> > >>> > The following code generates data with the desired >>> > correlation to the vector x but the resulting vector y is >>> > normal and not uniform distributed >>> > >>> > library(ecodist) >>> > x<- runif(105) >>> > y<- corgen(x=x, r=.5)$y >>> > >>> > Do anyone know a similar function generating uniform >>> > distributed data or a way of transforming y to the desired >>> > distribution while keeping the correlation between x and y >>> > >>> > Kind regards, Soren > > I have got intrigued by this problem! In his original query, > Søren Faurby did not specify the original vector 'x' for > which he wanted to generate uniformly distributed 'y' with > a given correlation 'r': > > "I wish to generate a vector of uniformly distributed > data with a defined correlation to another vector" > > In particular he did not state whether it should have any > specified distriobution -- he only specified that 'y' should > be uniformly distributed. > > So I'm addressing the general problem in which 'x' may have > an arbitrary distribution (or be an arbitrary given vector), > and y is to have a uniform distribution and have a given > correlation 'r' with 'x'. In the following, I assume r>0. > > I have an idea for an algorithm, though not sure how best to > write a general function for it. It is based on the idea that > you first sort 'x'; then generate 'y' as runif(length(x)) > and sort 'y'. This generates a vector 'y' with maximal > correlation with 'x'. Now, you start at the two extreme ends > of 'y', and swap those two elements of 'y'. If the result > still exceeds the desired correlation 'x', you move in > and swap again, and so on. If you drop below the desired > correlation, then you back-track, and try swapping a pair > closer to the middle. And so on. Once you get really close, > but still just above 'r', you switch to working out from > the middle to fine-tune the result. > > The basic idea is to permute the elements of 'y' to get > as close as possible to 'r'. But one needs to avoid an > exhaustive search through all permutations! > > The following is an example, with length(x)=100 and 'x' > Normally distributed, and the desired correlation r=0.5, > was done by hand in about 20 minutes. Only the highlights > are presented. > > set.seed(54321) > X0 <- rnorm(100) ; Y0 <- runif(100) > r <- 0.5 > > X1 <- sort(X0) ; Y1 <- sort(Y0) > ># First try (goes a bit too far): > X <- X1 ; Y <- Y1 > N <- length(Y) > T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 > T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 > T <- Y[3] ; Y[3] <- Y[N-2] ; Y[N-2] <- T # cor(X,Y) = 0.4803957 > ># So back up and skip the (3<->98) swap) > X <- X1 ; Y <- Y1 > T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 > T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 > T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 > T <- Y[5] ; Y[5] <- Y[N-4] ; Y[N-4] <- T # cor(X,Y) = 0.400471 > ># Following a few tries at (20,81), (30,71), (40,61): > X <- X1 ; Y <- Y1 > T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 > T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 > T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 > T <- Y[45]; Y[45]<- Y[N-44]; Y[N-44]<- T # cor(X,Y) = 0.5017024 > ># Now grope more finely: > X <- X1 ; Y <- Y1 > T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 > T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 > T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 > T <- Y[43]; Y[43]<- Y[N-42]; Y[N-42]<- T # cor(X,Y) = 0.5001594 > ># And again: > X <- X1 ; Y <- Y1 > T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.7864082 > T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6210372 > T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.5031032 > T<- Y[43] ; Y[43]<- Y[N-42]; Y[N-42]<- T # cor(X,Y) = 0.5001594 > T<- Y[49] ; Y[49]<- Y[N-48]; Y[N-48]<- T # cor(X,Y) = 0.500079 > T<- Y[50] ; Y[50]<- Y[N-49]; Y[N-49]<- T # cor(X,Y) = 0.5000634 > >### And this is probably as close as one can get. Pretty close! > > Now one can plot(X,Y). Maybe the resulting cross-shape (more > or less inevitable from such a procedure) is undesirable. > But at least the desiderata stated in the query have been met. > > Ted. > -------------------------------------------------------------------- set.seed(54321) X0 <- rnorm(100) ; Y0 <- runif(100) r <- 0.5 X1 <- sort(X0) ; Y1 <- sort(Y0) X <- X1 ; Y <- Y1 N <- length(Y) T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.8231312 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6779782 T <- Y[3] ; Y[3] <- Y[N-2] ; Y[N-2] <- T # cor(X,Y) = 0.5470702 T <- Y[4] ; Y[4] <- Y[N-3] ; Y[N-3] <- T # cor(X,Y) = 0.4284003 X <- X1 ; Y <- Y1 T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.8231312 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6779782 T <- Y[3] ; Y[3] <- Y[N-2] ; Y[N-2] <- T # cor(X,Y) = 0.5470702 T <- Y[18]; Y[18]<- Y[N-18]; Y[N-18]<- T # cor(X,Y) = 0.5062198 T <- Y[25]; Y[25]<- Y[N-24]; Y[N-24]<- T # cor(X,Y) = 0.4819134 X <- X1 ; Y <- Y1 T <- Y[1] ; Y[1] <- Y[N-0] ; Y[N-0] <- T # cor(X,Y) = 0.8231312 T <- Y[2] ; Y[2] <- Y[N-1] ; Y[N-1] <- T # cor(X,Y) = 0.6779782 T <- Y[3] ; Y[3] <- Y[N-2] ; Y[N-2] <- T # cor(X,Y) = 0.5470702 T <- Y[18]; Y[18]<- Y[N-18]; Y[N-18]<- T # cor(X,Y) = 0.5062198 T <- Y[40]; Y[40]<- Y[N-39]; Y[N-39]<- T # cor(X,Y) = 0.5018108 T <- Y[45]; Y[45]<- Y[N-44]; Y[N-44]<- T # cor(X,Y) = 0.5007854 T <- Y[46]; Y[46]<- Y[N-45]; Y[N-45]<- T # cor(X,Y) = 0.5001801 T <- Y[49]; Y[49]<- Y[N-48]; Y[N-48]<- T # cor(X,Y) = 0.5001178 T <- Y[50]; Y[50]<- Y[N-49]; Y[N-49]<- T # cor(X,Y) = 0.5001087 Once again, the above shows only a selection of the intermediate results (various local probings omitted). OInce again done by hand in about 20 minutes. Apologies for the mistake. Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.hard...@wlandres.net> Fax-to-email: +44 (0)870 094 0861 Date: 21-Feb-11 Time: 22:58:00 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.