Re: [R] generate 3 distinct random samples without replacement

Cesar Hincapié Mon, 07 Mar 2011 15:18:49 -0800

Thank you all for your helpful comments and suggestions.

Both proper indexing and subsetting a random sample of 300 work well.

Best wishes,

Cesar

On 2011-03-07, at 5:31 PM, <rex.dw...@syngenta.com> wrote:

Cesar, I think your basic misconception is that you believe 'sample' returns a 
list of indices into the original vector.  It does not; it returns actual 
elements of the vector:

> sample(runif(100),3)
[1] 0.4492988 0.0336069 0.6948440
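
To make the difference concrete, here is a small sketch. The vector x and its values are made up for illustration; they were chosen so that no value is a valid position in x.

```r
# Hypothetical vector whose values are clearly not valid positions
x <- c(105, 210, 315, 420, 525)

set.seed(1)
vals <- sample(x, 3)            # elements of x, not positions
idx  <- sample(length(x), 3)    # positions drawn from 1:length(x)

all(vals %in% x)                # TRUE: every draw is a value of x
x[vals]                         # all NA: 105, 210, ... are past the end of x
x[idx]                          # valid elements, selected by position
```

With d1 <- 1:7375 the values and the positions coincide, which is probably why the first sample looked fine.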

I'm not sure why you keep resetting the seed, but if it's important, replace
d2<-d1[-i]
with
d2<- setdiff(d1,i)
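
A rough sketch of how the sequential version might look with that change. It keeps the three seeds from the original post and assumes the drawn values themselves are the samples, so they are used directly instead of being fed back in as indices. Note that setdiff() also drops duplicate values, so this assumes the elements of d1 are unique.

```r
d1 <- 1:7375

set.seed(7)
s1 <- sample(d1, 100)           # drawn values, used directly

d2 <- setdiff(d1, s1)           # remove by value, not by position
set.seed(77)
s2 <- sample(d2, 100)

d3 <- setdiff(d2, s2)
set.seed(777)
s3 <- sample(d3, 100)

D <- data.frame(a = sort(s1), b = sort(s2), c = sort(s3))
dim(D)                          # 100 rows, 3 columns
```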

Otherwise Duncan's suggestion is much nicer:
s = sample(d1,300,replace=FALSE)
s1 = sort(s[1:100])
s2 = sort(s[101:200])
s3 = sort(s[201:300])
If what you actually need are indices into the original vector, replace d1 with 
length(d1).

(When you say 'distinct', I'm assuming you mean 'disjoint'.)
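
For the index variant, something along these lines should work. The data in d1 is invented here (values that differ from their positions), and the names idx, groups and samples are only illustrative.

```r
# Hypothetical data: values deliberately differ from positions
set.seed(42)
d1 <- round(rnorm(7375), 3)

idx <- sample(length(d1), 300)  # 300 distinct positions in 1:7375
groups <- split(idx, rep(1:3, each = 100))

lengths(groups)                 # three groups of 100 positions each
anyDuplicated(idx)              # 0: no position is used twice
samples <- lapply(groups, function(g) sort(d1[g]))
```

Because all 300 positions come from a single draw without replacement, the three groups are disjoint by construction, and only one seed is needed.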

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Duncan Murdoch
Sent: Monday, March 07, 2011 3:52 PM
To: Cesar Hincapié
Cc: r-help@r-project.org
Subject: Re: [R] generate 3 distinct random samples without replacement

On 07/03/2011 2:17 PM, Cesar Hincapié wrote:
> Hello:
> 
> I wonder if I could get a little help with random sampling in R.
> 
> I have a vector of length 7375.  I would like to draw 3 distinct random 
> samples, each of length 100 without replacement.  I have tried the following:
> 
> d1<- 1:7375
> 
> set.seed(7)
> i<- sample(d1, 100, replace=F)
> s1<- sort(d1[i])
> s1
> 
> d2<- d1[-i]
> set.seed(77)
> j<- sample(d2, 100, replace=F)
> s2<- sort(d2[j])
> s2
> 
> d3<- d2[-j]
> set.seed(777)
> k<- sample(d3, 100, replace=F)
> s3<- sort(d3[k])
> s3
> 
> D<- data.frame(a=s1,b=s2,c=s3)
> 
> 
> However, s2 is only 97 elements long, and s3, only 96 long.
> 
> I would appreciate any suggestions on a better approach.
> I'm also curious to know why my second and third samples are less than 100 
> elements in length.

If you want 3 non-overlapping, non-repeating samples of 100, why not
draw one sample of 300, and take 3 subsets of it?

You were getting shorter samples because you used j and k as indices
into d2 and d3, which did not have enough elements.  The out-of-range
indices produced NAs, and sorting the result dropped them.  For example,

d2 <- 1:10
d2[10:12]          # 10 NA NA: positions 11 and 12 are out of range
sort(d2[10:12])    # 10: sort() drops NAs by default
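
The same effect can be checked on the code from the original post. This is only a sketch: the exact number of out-of-range draws depends on the R version's random number generator, so no specific count is claimed, and the name too_big is made up here.

```r
d1 <- 1:7375
set.seed(7)
i <- sample(d1, 100)
d2 <- d1[-i]                    # 7275 elements, values can still be as large as 7375

set.seed(77)
j <- sample(d2, 100)            # values of d2, some may exceed length(d2)
too_big <- sum(j > length(d2))  # these positions do not exist in d2

s2 <- sort(d2[j])               # NAs from out-of-range positions are dropped
length(s2) == 100 - too_big     # TRUE
```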

See ?sort for an explanation of how to keep NA values when you sort.
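
As a quick illustration of the na.last argument described there (the name x is just for this example):

```r
d2 <- 1:10
x <- d2[10:12]                  # 10 NA NA: positions 11 and 12 do not exist

length(sort(x))                 # 1: the default na.last = NA drops the NAs
sort(x, na.last = TRUE)         # 10 NA NA: NAs kept and placed last
sum(is.na(x))                   # 2: number of out-of-range indices
```

Counting the NAs before sorting is one way to notice this kind of indexing problem early.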

Duncan Murdoch

> Thanks for your time and consideration,
> 
> Cesar A. Hincapié, DC, MHSc
> 
> Research Fellow, Division of Health Care and Outcomes Research, Toronto 
> Western Research Institute
> PhD Candidate in Epidemiology, Dalla Lana School of Public Health, University 
> of Toronto
> e. cesar.hinca...@utoronto.ca
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


