Dear useRs,


What is an efficient way to randomly sample from clustered data such
that I get equal representation from each cluster? For example, let's
say I want to randomly sample two cases from each cluster created by the
"id" variable in the following data frame:


> id <- c(rep("100", 4), rep("101", 3), rep("102", 6), rep("103", 7))
> sex <- sample(c("m", "f"), 20, replace=TRUE)
> weight <- rnorm(n=20, mean=150, sd=3)
> attitude <- sample(1:7, 20, replace=TRUE)
> Dataf <- data.frame(id, sex, weight, attitude)
> Dataf
    id sex   weight attitude
1  100   m 146.5064        6
2  100   f 150.2317        4
3  100   f 149.3686        5
4  100   m 144.7218        7
5  101   m 147.9071        4
6  101   m 148.3802        6
7  101   m 154.4634        1
8  102   m 153.2719        5
9  102   m 148.9821        5
10 102   f 148.0656        1
11 102   f 148.8949        6
12 102   m 146.9963        4
13 102   m 153.0542        4
14 103   m 148.1558        1
15 103   f 148.0482        4
16 103   m 151.8044        2
17 103   f 155.4976        4
18 103   m 150.0423        1
19 103   f 146.0487        5
20 103   m 154.6651        7



Here's the R code I wrote that obviously does not work:


sapply(split(Dataf, Dataf$id), sample, size=2)


I would prefer a data frame (i.e., Dataf2) as the final output and it
should look something like this:


> Dataf2
    id sex   weight attitude
1  100   m 146.5064        6
2  100   m 144.7218        7
3  101   m 147.9071        4
4  101   m 154.4634        1
5  102   m 153.2719        5
6  102   m 148.9821        5
7  103   f 155.4976        4
8  103   f 146.0487        5
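
For reference, here is a sketch of one possible approach, not necessarily the most efficient one. The sapply() attempt fails because sample() treats a data frame as a list of columns, so it draws two columns from each piece rather than two rows. Sampling row indices within each cluster and then subsetting the original data frame avoids that. The set.seed() call and the names "rows" and "idx" are additions for illustration only, and the sketch assumes every cluster has at least two rows.

```r
# Rebuild the example data; set.seed() is an addition so the draw is repeatable.
set.seed(1)
id <- c(rep("100", 4), rep("101", 3), rep("102", 6), rep("103", 7))
sex <- sample(c("m", "f"), 20, replace = TRUE)
weight <- rnorm(n = 20, mean = 150, sd = 3)
attitude <- sample(1:7, 20, replace = TRUE)
Dataf <- data.frame(id, sex, weight, attitude)

# Split the row numbers (not the data frame itself) by cluster, then draw
# two row numbers per cluster without replacement. Indexing with
# sample.int() sidesteps sample()'s special handling of length-1 vectors.
rows <- lapply(split(seq_len(nrow(Dataf)), Dataf$id),
               function(i) i[sample.int(length(i), 2)])

# Collect the chosen row numbers, restore the original row order,
# and subset the data frame.
idx <- sort(unlist(rows, use.names = FALSE))
Dataf2 <- Dataf[idx, ]
rownames(Dataf2) <- NULL  # renumber rows 1..8 as in the desired output
Dataf2
```

If some clusters could have fewer than two rows, the size argument would need to be capped, for example with min(2, length(i)).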



Thanks in advance for your assistance.






Tony N. Brown, Ph.D.
Associate Professor of Sociology
Faculty Head of Hank Ingram House, The Commons
Research Fellow, Vanderbilt Center for Nashville Studies
Vanderbilt University
(615) 322-7518
(615) 322-7505 fax

