Re: [R] Generating uniformly distributed correlated data.

David Winsemius Mon, 21 Feb 2011 09:00:13 -0800


On Feb 21, 2011, at 9:53 AM, Erich Neuwirth wrote:

We want to generate a distribution on the unit square with the following properties:
* It is concentrated on a "reasonable" subset of the square,
 and the restricted distribution is uniform on this subset.
* Both marginal distributions are uniform on the unit interval.
* All horizontal and all vertical cross sections are sets of line
 segments with the same total length.

If we find a geometric figure with these properties, we have solved the problem.

So we define the distribution to be uniform on the following area
(the drawing is distorted, but it should give the idea):

x***/-----------------/***x
|**/-----------------/****|
|*/-----------------/*****|
|/-----------------/******|
|-----------------/******/|
|----------------/******/-|
|---------------/******/--|
|--------------/******/---|
|-------------/******/----|
|------------/******/-----|
|-----------/******/------|
|----------/******/-------|
|---------/******/--------|
|--------/******/---------|
|-------/******/----------|
|------/******/-----------|
|-----/******/------------|
|----/******/-------------|
|---/******/--------------|
|--/******/---------------|
|-/******/----------------|
|/******/-----------------|
|******/-----------------/|
|*****/-----------------/*|
|****/-----------------/**|
x***/-----------------/***x

There is the same number of stars in each horizontal row and each
vertical column.
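A discrete analogue of the picture makes the row/column property easy to check. This is a hypothetical illustration (the grid size `n` and band half-width `k` are arbitrary choices, not part of the original): a cell is starred when its wrapped offset from the diagonal is at most `k`, so every row and every column contains exactly 2k + 1 stars.

```r
# Hypothetical discrete version of the figure: n x n cells, band half-width k.
n <- 24
k <- 3
i <- row(matrix(0, n, n))        # row index of every cell
j <- col(matrix(0, n, n))        # column index of every cell
d <- (j - i) %% n                # offset from the diagonal, wrapped around
stars <- d <= k | d >= n - k     # TRUE inside the wrapped diagonal band
# print with the last row on top so the band runs from lower left to upper right
writeLines(apply(stars[n:1, ], 1, function(r) paste(ifelse(r, "*", "-"), collapse = "")))
rowSums(stars)                   # every row holds 2k + 1 = 7 stars
colSums(stars)                   # every column holds 2k + 1 = 7 stars
```

The wrap-around via `%%` is what produces the corner pieces: without it, rows near the top and bottom would have fewer stars.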

I love it! I plotted the data that your method generated and got a plot with points in the regions you displayed. After pondering its curious pathology (three disjoint regions), I thought I could cure it by flipping quadrants. I proceeded to do so, but discovered that the pathology wasn't really cured, only concentrated at the ends and in the middle. Using your ASCII art, my version would look like this, where the \\\ regions are "double dense".


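# flip the lower-right quadrant up: reflect y about 0.5 where x > 0.5 and y < 0.5
y[x > 0.5 & y < 0.5] <- 0.5 + abs(0.5 - y[x > 0.5 & y < 0.5])
# flip the upper-left quadrant down: reflect y about 0.5 where x < 0.5 and y > 0.5
y[x < 0.5 & y > 0.5] <- 0.5 - abs(0.5 - y[x < 0.5 & y > 0.5])
plot(x, y)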

x---------------------\\\\x
|--------------------/*\\\|
|-------------------/***\\|
|------------------/*****\|
|-----------------/******/|
|----------------/******/-|
|---------------/******/--|
|--------------/******/---|
|-------------/******/----|
|-------------|\\***/-----|
|-------------|\\\*/------|
|-------------|\\\\-------|
|-------------*-----------|
|--------/\\\\------------|
|-------/**\\\------------|
|------/****\\------------|
|-----/******\------------|
|----/******/-------------|
|---/******/--------------|
|--/******/---------------|
|-/******/----------------|
|/******/-----------------|
|\*****/------------------|
|\\***/-------------------|
|\\\\/--------------------|
\\\\\---------------------x

So it not only created a slightly less (but still) pathological counterpart, but the correlation also jumped from 0.5 to about 0.94. It looks to be a promising basis for homework problems in probability courses, an experience I have never had the "pleasure" of having except during self-study to repair my (many) mathematical deficiencies.


> cor(x,y)
[1] 0.9449256
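A self-contained way to reproduce the comparison. This sketch assumes the sampler described in Erich's message (x2 = (x1 + U) %% 1 with U uniform on (-a, a)) and picks an arbitrary seed, sample size and target correlation of 0.5; `cor_before` and `cor_after` are names introduced here for illustration. Since 0.5 + abs(0.5 - y) equals 1 - y when y < 0.5, and 0.5 - abs(0.5 - y) equals 1 - y when y > 0.5, both flips are written as 1 - y.

```r
# Assumed sampler: x uniform, y uniform on the wrapped band of half-width a.
set.seed(42)
n <- 1e5
r <- 0.5
a <- (3 - sqrt(1 + 8 * r)) / 4      # half-width giving correlation r
x <- runif(n)
y <- (x + runif(n, -a, a)) %% 1
cor_before <- cor(x, y)             # should be close to r = 0.5
# quadrant flips: reflect y about 0.5 in the two off-diagonal quadrants
y[x > 0.5 & y < 0.5] <- 1 - y[x > 0.5 & y < 0.5]
y[x < 0.5 & y > 0.5] <- 1 - y[x < 0.5 & y > 0.5]
cor_after <- cor(x, y)
c(before = cor_before, after = cor_after)
```

The exact value of `cor_after` depends on the seed and sample size, so it will differ slightly from the 0.9449 shown above.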

--
David.



So we define

$$
g(x_1,x_2) = \begin{cases}
1 & \text{if } |x_1-x_2| \le a \text{ or } |x_1-x_2+1| \le a \text{ or } |x_1-x_2-1| \le a,\\
0 & \text{elsewhere.}
\end{cases}
$$
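The indicator translates directly into a vectorized R function. This is a minimal sketch; the name `g` and the argument order `(x1, x2, a)` are chosen here for illustration.

```r
# Indicator of the shape: 1 inside the wrapped diagonal band of half-width a,
# 0 elsewhere. Vectorized over x1 and x2; intended for 0 < a <= 1/2.
g <- function(x1, x2, a) {
  d <- x1 - x2
  as.numeric(abs(d) <= a | abs(d + 1) <= a | abs(d - 1) <= a)
}
g(0.5, 0.6, a = 0.2)    # 1: on the main band
g(0.05, 0.95, a = 0.2)  # 1: in the upper-left corner piece
g(0.2, 0.8, a = 0.2)    # 0: outside the shape
```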

The total area of the shape is $2a$.
The admissible range for $a$ is $0 < a \le 1/2$,
therefore
$f(x_1,x_2) = g(x_1,x_2)/(2a)$
is a density function.
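A quick check of the area, for $0 < a \le 1/2$ (so the two corner pieces do not overlap the main band): the band $|x_1 - x_2| \le a$ is the square minus two triangles with legs $1-a$, and each corner piece is a triangle with legs $a$:

$$
\text{area} = \bigl(1 - (1-a)^2\bigr) + 2 \cdot \frac{a^2}{2} = (2a - a^2) + a^2 = 2a .
$$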
This is where simple algebra comes in.
This distribution has
expected value 1/2 and variance 1/12 for both margins
(uniform distribution), and it has
covariance $= (1 - 3a + 2a^2)/12$
and correlation $= 1 - 3a + 2a^2$.
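The moments can be checked numerically without the conditional sampler by rejection sampling: draw points uniformly on the square and keep those with g = 1; the kept points are uniform on the shape. This is a sketch with an arbitrary a = 0.2 and seed, not Erich's method; the acceptance rate should come out near the area 2a.

```r
# Rejection sampling from the uniform distribution on the shape.
set.seed(2011)
a <- 0.2
n <- 2e5
x1 <- runif(n)
x2 <- runif(n)
d <- x1 - x2
inside <- abs(d) <= a | abs(d + 1) <= a | abs(d - 1) <= a
x1 <- x1[inside]
x2 <- x2[inside]
# compare simulated moments with 1/2, 1/12 and 1 - 3a + 2a^2
c(accept_rate = mean(inside), mean_x2 = mean(x2), var_x2 = var(x2),
  cor = cor(x1, x2), cor_theory = 1 - 3 * a + 2 * a^2)
```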

The inverse function of $r = 1 - 3a + 2a^2$ on $0 \le a \le 1/2$ is
$a = (3-\sqrt{1+8r})/4$.
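Spelled out, the inversion is the quadratic formula, with the root chosen so that $a$ stays in the admissible range:

$$
r = 1 - 3a + 2a^2 \iff 2a^2 - 3a + (1 - r) = 0 \iff a = \frac{3 \pm \sqrt{9 - 8(1 - r)}}{4} = \frac{3 \pm \sqrt{1 + 8r}}{4}.
$$

For $0 \le r \le 1$ the root with the plus sign is at least $1$, so only the minus root lies in $[0, 1/2]$; it sends $r = 0$ to $a = 1/2$ and $r = 1$ to $a = 0$. Since $1 - 3a + 2a^2$ decreases from $1$ to $0$ on that interval, this construction covers only non-negative correlations $0 \le r < 1$.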

Therefore we can compute that our distribution with
$a = (3-\sqrt{1+8r})/4$
will produce a given correlation $r$.
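As a small sketch, the inverse can be wrapped in a helper and checked by plugging the result back into the correlation formula. The name `a_for_r` is hypothetical, and the range check reflects the fact that the construction only reaches $0 \le r < 1$.

```r
# Hypothetical helper: half-width a that gives target correlation r.
a_for_r <- function(r) {
  stopifnot(all(r >= 0 & r < 1))   # construction only covers 0 <= r < 1
  (3 - sqrt(1 + 8 * r)) / 4
}
r <- c(0, 0.25, 0.5, 0.9)
a <- a_for_r(r)
cbind(r, a, back = 1 - 3 * a + 2 * a^2)  # 'back' reproduces r
```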


How do we create random numbers from this distribution?
By using conditional densities.
$x_1$ is sampled from the uniform distribution, and for a given $x_1$
we produce $x_2$ from a uniform distribution along the vertical cross-section
of the geometric shape (which consists of either 1 or 2 intervals).
This is most easily implemented using the modulo operator %%.
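A minimal sketch of the conditional sampler, assuming the cross-section at $x_1$ is $[x_1 - a, x_1 + a]$ wrapped back into $[0, 1)$ with `%%` (this matches $g$ above, but the function name `runif_cor` and the output layout are my own choices, not Erich's code):

```r
# x1 ~ Uniform(0, 1); given x1, x2 is uniform on the vertical cross-section
# {x2 : g(x1, x2) = 1}, i.e. on [x1 - a, x1 + a] wrapped into [0, 1) by %%.
runif_cor <- function(n, r) {
  a <- (3 - sqrt(1 + 8 * r)) / 4
  x1 <- runif(n)
  x2 <- (x1 + runif(n, -a, a)) %% 1
  cbind(x1 = x1, x2 = x2)
}
set.seed(123)
z <- runif_cor(1e5, r = 0.7)
c(cor = cor(z[, 1], z[, 2]), mean_x2 = mean(z[, 2]), var_x2 = var(z[, 2]))
```

When the interval sticks out past 0 or 1, `%%` moves the overhanging part to the other end of the unit interval, which is exactly the "2 intervals" case of the cross-section.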

This mechanism is NOT a convolution. Applying the modulo after the addition
makes it a non-convolution. Adding independent random variables
without doing anything further is a convolution; by applying a trimming
operation, the convolution property gets lost.
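A small simulation shows what the modulo changes. This sketch (a = 0.2 and the seed are arbitrary) compares the plain sum x1 + u, whose density is the convolution of the two uniform densities (a trapezoid on [-a, 1 + a]), with the wrapped sum (x1 + u) %% 1: about a/2 of the plain sums fall outside [0, 1] and the plain sum is thinner near the edges, while the wrapped values stay uniform on [0, 1).

```r
set.seed(7)
n <- 1e5
a <- 0.2
x1 <- runif(n)
u <- runif(n, -a, a)
s <- x1 + u          # plain sum: density is the convolution (trapezoid)
w <- s %% 1          # wrapped sum: uniform on [0, 1) again
# roughly a / 2 of the plain sums fall outside [0, 1]; none of the wrapped ones do
c(outside_plain = mean(s < 0 | s > 1), outside_wrapped = mean(w < 0 | w > 1))
# mass in the first tenth of the interval: smaller for the plain sum
c(plain_low = mean(s >= 0 & s < 0.1), wrapped_low = mean(w < 0.1))
```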



--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
