On 19-Oct-06 tom soyer wrote: > Hi, > > I looked up the help file on sample(), but didn't find the > info I was looking for. > > When sample() is used to resample from a distribution, e.g., > bootstrap, how does it do it? Does it use an uniform distribution, > e.g., runif(), or something else?
I don't know the details of the algorithm, but since sample() has flexible options it may be helpful to describe the effect of sample() in different cases. 1. sample(x,r) where x is a vector of length n In effect, the index values (1:n) of x are sampled from without replacement (default) with a uniform probability distribution over the available elements at all stages. Hence, i1 is sampled from (1:n) with probability 1/n for each possibility. Then i2 is sampled from the remainder with probability 1/(n-1) for each, and so on until r items (all distinct) have been sampled. If the resulting indices are {i1,i2,...,ir} then the result is x[i1],x[i2],...,x[ir]. Thus, if some of the values in x[1],...,x[n] are equal, you can get 2 or more items in the sample which are equal even though the sampling is done without replacement (since it is the indices which are sampled). [NB I'm describing the *effect* here, not saying that this is how the algorithm operates] 2. sample(x, replace=TRUE) Similar to [1], except that the sampled index is returned to the pool and is available to be sampled again, so at each stage the probability of any value being chosen is 1/n. 3. sample(x, replace=TRUE, prob=p) where p is a vector of probability weights (which must not all be 0, and none negative). First, p is converted into a probability distribution (summing to 1) (in effect by dividing by the sum). Then an index i1 is sampled from (1:n) with probability p[i] that i is chosen. This is repeated (with previously sampled i's still available) until r index values have been sampled -- i1,...,ir. The result is x[i1],...,x[ir]. 4. sample(x, prob=p) [without replacement] First p is scaled to sum to 1, then i1 is sampled as in [3]. The remaining p-values are rescaled so as to sum to 1, and i2 is sampled from the remaining i's; and so on. These are the essential variants of the use of sample(). runif() can be used to sample i1 from (1:n) with equal probabilities by selecting i if runif() is <= i and > (i-1) for i = 1:n. Similarly runif() can be used to sample i1 from (1:n) with probabilities p1,...,pn by selecting i if p[1] + ... + p[i-1] < runif() <= p[1] + ... + p[i] [LHS=0 if i=0], since the probability of this happening is p[i]. > And, when the help file > says:"sample(x) generates a random permutation of the elements > of x (or 1:x)", Since the default value of r (size of sample) is the length of x, say n, sample(x) (see [1] above) will sample n elements without replacement from the n elements of x with uniform probabilities at each stage. In effect, n elements i1,i2,...,in will be sampled without replacement from (1:n), giving a random permutation of (1:n), so the result x[i1],...,x[in] will be a random permutation of x[1],...,x[n] (though different random permutations may look identical if there are equal values in x[1],...,x[n]). > would I be correct if I translate the statement > as follows: it means that the order of sequence, which was > generated from a uniform distribution, would look like a > random normal distribution. No. A normal distribution has nothing to do with it! *Unless* the values x[1],...,x[n] already loooked like values which had already been sampled from a normal distribution (but were, say, in increasing order of size). Then sample(x) would shuffle them into random order so the result could then look like a real sample according ot eh order in which the data came in. Hoping this helps! Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 Date: 19-Oct-06 Time: 19:34:13 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.