[R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
Dear All,

I cannot find a solution to the following problem, although I imagine
that it is a classic, hence my email.

I have a vector V of X values, each between 1 and N.

I would like to get random samples of X values, also between
1 and N, but the important point is:
* I would like to keep the same distribution of distances between the X values *

For example, let's say N=10 and V = c(3,4,5,6). Then the random values could be
1,2,3,4 or 2,3,4,5 or 3,4,5,6 or 4,5,6,7, etc., so that the distribution of
distances (3-4, 3-5, 3-6, 4-5, 4-6, etc.) is kept constant.
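One family of samples that keeps every pairwise distance is the set of shifts (translations) of V that stay inside 1..N, which is what the example lists. A minimal R sketch, assuming integer positions; `all_shifts` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: every translate of V that stays inside 1..N.
# Adding the same offset to all elements leaves all pairwise distances unchanged.
all_shifts <- function(V, N) {
  stopifnot(max(V) - min(V) <= N - 1)   # the pattern must fit in 1..N
  offsets <- (1 - min(V)):(N - max(V))  # smallest to largest admissible offset
  lapply(offsets, function(o) V + o)
}

shifts <- all_shifts(c(3, 4, 5, 6), N = 10)
length(shifts)   # 7 possible samples
shifts[[1]]      # 1 2 3 4
shifts[[7]]      # 7 8 9 10

# Draw one of them at random
shifts[[sample.int(length(shifts), 1)]]
```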

I couldn't find a package that helps with this, but it looks like it
should be a classic problem, so there should be something!

Many thanks in advance for any help or hint you could provide,

All the best,

Emmanuel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
Dear All, (my apologies if this gets posted twice; it seems the first
one didn't get through)

I cannot find a solution to the following problem, although I suppose
this is a classic.

I have a vector V of X = length(V) values, each between 1 and N.

I would like to get random samples of X values, also between
1 and N, but the important point is:
* I would like to keep the same distribution of distances between the
original X values *

For example, let's say N=10 and V = c(3,4,5,6). Then the random values could be
1,2,3,4 or 2,3,4,5 or 3,4,5,6 or 4,5,6,7, etc., so that the distribution of
distances (3-4, 3-5, 3-6, 4-5, 4-6, etc.) is kept constant.

I couldn't find a package that helps with this, but it looks like it
should be a classic problem, so there should be something!

Many thanks in advance for any help or hint you could provide,

All the best,

Emmanuel



Re: [R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Nordlund, Dan (DSHS/RDA)
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Emmanuel Levy
> Sent: Wednesday, August 12, 2009 3:05 PM
> To: r-h...@stat.math.ethz.ch
> Cc: dev djomson
> Subject: [R] Random sampling while keeping distribution of nearest neighbor
> distances constant.

Emmanuel,

I don't know if this is a classic problem or not. But given your description,
you could write your own function, something like this:

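sample.dist <- function(vec, Min=1, Max=10){
  diffs <- c(0, diff(vec))      # gaps between consecutive values; first is 0
  sum_d <- sum(diffs)           # total span, max(vec) - min(vec) for sorted vec
  stopifnot(sum_d <= Max - Min) # the pattern must fit between Min and Max
  starts <- Min:(Max - sum_d)   # admissible first values
  # index with sample.int(): sample(x, 1) on a single number x samples 1:x
  starts[sample.int(length(starts), 1)] + cumsum(diffs)
  }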

Here Min and Max are the minimum and maximum values that you are sampling from
(Min=1 and Max=10 in your example), and vec is the vector whose distances you
want to preserve. This assumes that your vector is sorted from smallest to
largest, as in your example. The function could be changed to accommodate a
vector that isn't sorted.

V <- sort(sample(1:100, 4))
V
#[1] 46 78 82 95
sample.dist(V, Min=1, Max=100)
#[1] 36 68 72 85
sample.dist(V, Min=1, Max=100)
#[1] 12 44 48 61
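As noted, the function could be adapted to unsorted input. Below is a minimal sketch of one way to do that, assuming only a translation is wanted; `sample_shift` is a hypothetical name. It draws an offset rather than a start value, so the original order of `vec` is kept, and it indexes with `sample.int()` because `sample(x, 1)` with a single number x >= 1 samples from 1:x instead of returning x:

```r
# Hypothetical variant of sample.dist() that does not require sorted input:
# it draws one random offset and adds it to every element, so the order of
# the original vector and all pairwise distances are kept.
sample_shift <- function(vec, Min = 1, Max = 10) {
  lo <- min(vec)
  hi <- max(vec)
  stopifnot(hi - lo <= Max - Min)        # the pattern must fit in [Min, Max]
  offsets <- (Min - lo):(Max - hi)       # every admissible translation
  vec + offsets[sample.int(length(offsets), 1)]
}

set.seed(1)
V <- c(82, 46, 95, 78)                    # unsorted on purpose
sample_shift(V, Min = 1, Max = 100)
```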

This should get you started at least.  Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204
 



Re: [R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
Dear Daniel,

Thanks a lot for your suggestion. It is helpful and got me thinking
more about the problem, so let me rephrase it:

Given a vector V containing X values between 1 and N, I'd
like to sample values so that the *distribution* of distances between
the X values is similar.

There are several distributions: the 1st-order one is given by the
function diff.
The 2nd-order distribution is given by
diff(V[seq(1,length(V),by=2)]) and diff(V[seq(2,length(V),by=2)]).
The 3rd-order distribution is given by diff(V[seq(1,length(V),by=3)]),
diff(V[seq(2,length(V),by=3)]) and diff(V[seq(3,length(V),by=3)]).
The 4th order ...
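The k-th order differences described above can be computed as in the sketch below; `order_diffs` is a hypothetical helper. Pooling the k pieces gives exactly the lag-k differences V[i + k] - V[i], i.e. what base R's `diff(V, lag = k)` returns, up to ordering:

```r
# Hypothetical helper: k-th order differences as defined above.
# For each starting offset j in 1..k, keep every k-th element of V
# (V[j], V[j + k], V[j + 2k], ...) and take diff() of that subsequence.
order_diffs <- function(V, k) {
  stopifnot(k >= 1, k <= length(V))
  lapply(seq_len(k), function(j) diff(V[seq(j, length(V), by = k)]))
}

V <- c(2, 3, 6, 7, 8)
order_diffs(V, 2)                  # list(c(4, 2), 4)

# Pooled, the pieces are exactly the lag-k differences V[i + k] - V[i]
sort(unlist(order_diffs(V, 2)))    # 2 4 4
sort(diff(V, lag = 2))             # 2 4 4
```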

I would like to produce different samples in which the first, or the first
and second, or the first to third, or up to, say, the first five orders of
distance distributions are reproduced.

Is anybody aware of a formalism, explained in a book, that
could help me deal with this problem? Or, even better, a package?

Thanks for your help,

Emmanuel






Re: [R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Nordlund, Dan (DSHS/RDA)
> -----Original Message-----
> From: Emmanuel Levy [mailto:emmanuel.l...@gmail.com]
> Sent: Wednesday, August 12, 2009 4:48 PM
> To: Nordlund, Dan (DSHS/RDA)
> Cc: r-h...@stat.math.ethz.ch; dev djomson
> Subject: Re: [R] Random sampling while keeping distribution of nearest
> neighbor distances constant.

But if the 1st-order differences are the same, doesn't it follow that the
2nd, 3rd, ... order differences must also be the same between the original and
the new random vector? What am I missing?
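The argument is a telescoping sum. Writing V' for the new vector (notation assumed here, not from the thread) and assuming V' has the same consecutive differences as V, every lag-k difference agrees too:

```latex
V'_{i+k} - V'_i = \sum_{j=i}^{i+k-1} \left( V'_{j+1} - V'_j \right)
               = \sum_{j=i}^{i+k-1} \left( V_{j+1} - V_j \right)
               = V_{i+k} - V_i
```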

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204
 



Re: [R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
> But if the 1st order differences are the same, then doesn't it follow that
> the 2nd, 3rd, ... order differences must be the same between the original and
> the new random vector.  What am I missing?

You are missing nothing, sorry; I wrote something wrong. What I would
like to preserve is the distance to the *nearest* neighbor, so
diff is not the way to go. If you only consider the nearest neighbor,
then

c(3,4,8,9) and c(4,5,6,7) are the same in terms of first order (all
closest neighbors are 1 unit away) but not in terms of second order.
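One way to make "first order" and "second order" precise is to assume they mean the distance from each point to its 1st and 2nd nearest neighbor (an interpretation, not something the thread states). Under that assumption the example can be checked with a small sketch; `knn_dist` is a hypothetical helper:

```r
# Hypothetical helper (interpretation assumed): distance from each point
# to its k-th nearest neighbor. Works for unsorted vectors too.
knn_dist <- function(v, k = 1) {
  stopifnot(k >= 1, k < length(v))
  vapply(seq_along(v), function(i) {
    d <- sort(abs(v[i] - v[-i]))  # distances to all other points, ascending
    d[k]
  }, numeric(1))
}

knn_dist(c(3, 4, 8, 9), k = 1)  # 1 1 1 1
knn_dist(c(4, 5, 6, 7), k = 1)  # 1 1 1 1  (same first order)
knn_dist(c(3, 4, 8, 9), k = 2)  # 5 4 4 5
knn_dist(c(4, 5, 6, 7), k = 2)  # 2 1 1 2  (different second order)
```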

Also, I don't know if there is a simple way to maintain a
*distribution* of distances (even if not of nearest neighbors).
For example, c(2,4,5,6) could become c(1,3,4,5) or c(3,5,6,7), as proposed by
your solution, but it could also become c(4,5,6,8).
Or, c(2,3,6,7,8) could become c(2,3,4,7,8).

Actually, that's really simple! I can simply resample the diff vector!
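A minimal sketch of that idea, assuming the goal is to keep the multiset of consecutive gaps: sort, randomly permute the output of diff(), then place the permuted pattern at a random admissible start. `permute_gaps` is a hypothetical name. Note that this keeps the 1st-order diff distribution but not, in general, the nearest-neighbor distances (gaps 1,4,1 can become 1,1,4):

```r
# Hypothetical helper: permute the consecutive gaps of a sorted vector and
# place the resulting pattern at a random start inside [Min, Max].
permute_gaps <- function(vec, Min = 1, Max = 10) {
  vec <- sort(vec)
  gaps <- diff(vec)                          # consecutive distances
  gaps <- gaps[sample.int(length(gaps))]     # random permutation of the gaps
  span <- sum(gaps)                          # unchanged by the permutation
  stopifnot(span <= Max - Min)               # pattern must fit in [Min, Max]
  starts <- Min:(Max - span)                 # admissible first values
  starts[sample.int(length(starts), 1)] + cumsum(c(0, gaps))
}

set.seed(1)
permute_gaps(c(2, 4, 5, 6), Min = 1, Max = 10)
permute_gaps(c(2, 3, 6, 7, 8), Min = 1, Max = 10)
```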

OK, so the only remaining problem is the 1st-, 2nd-, 3rd-order thing, but
you made me realize that I can skip it for the moment.

Thank you! :-)

Emmanuel



