[R] randomly sample within clustered data?

Brown, Tony Nicholas Mon, 15 Sep 2008 00:42:38 -0700

Dear useRs,


What is an efficient way to randomly sample from clustered data such
that I get equal representation from each cluster? For example, let's
say I want to randomly sample two cases from each cluster created by the
"id" variable in the following data frame:

 

> id <- c(rep("100", 4), rep("101", 3), rep("102", 6), rep("103", 7))
> sex <- sample(c("m","f"), 20, replace=TRUE)
> weight <- rnorm(n=20, mean=150, sd=3)
> attitude <- sample(1:7, 20, replace=TRUE)
> Dataf <- data.frame(id, sex, weight, attitude)
> Dataf
    id sex   weight attitude
1  100   m 146.5064        6
2  100   f 150.2317        4
3  100   f 149.3686        5
4  100   m 144.7218        7
5  101   m 147.9071        4
6  101   m 148.3802        6
7  101   m 154.4634        1
8  102   m 153.2719        5
9  102   m 148.9821        5
10 102   f 148.0656        1
11 102   f 148.8949        6
12 102   m 146.9963        4
13 102   m 153.0542        4
14 103   m 148.1558        1
15 103   f 148.0482        4
16 103   m 151.8044        2
17 103   f 155.4976        4
18 103   m 150.0423        1
19 103   f 146.0487        5
20 103   m 154.6651        7

> 

 

Here's the R code I wrote that obviously does not work:

 

sapply(split(Dataf, Dataf$id), sample, size=2)

 

I would prefer a data frame (i.e., Dataf2) as the final output and it
should look something like this:

 

> Dataf2
    id sex   weight attitude
1  100   m 146.5064        6
2  100   m 144.7218        7
3  101   m 147.9071        4
4  101   m 154.4634        1
5  102   m 153.2719        5
6  102   m 148.9821        5
7  103   f 155.4976        4
8  103   f 146.0487        5

> 
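
One possible approach, offered as a minimal sketch rather than a definitive answer: the sapply() call above likely fails because sample() treats each per-cluster data frame as a list of columns, so it picks two columns instead of two rows. Sampling row indices within each cluster and then subsetting the original data frame avoids that. The names rows_by_id and picked, and the set.seed(1) call, are assumptions added here for illustration; the sketch also assumes every cluster has at least two rows.

```r
# Rebuild the example data from the post.
# set.seed() is an added assumption so the draw is reproducible.
set.seed(1)
id <- c(rep("100", 4), rep("101", 3), rep("102", 6), rep("103", 7))
sex <- sample(c("m", "f"), 20, replace = TRUE)
weight <- rnorm(n = 20, mean = 150, sd = 3)
attitude <- sample(1:7, 20, replace = TRUE)
Dataf <- data.frame(id, sex, weight, attitude)

# Split the row numbers (not the data frame) by cluster.
rows_by_id <- split(seq_len(nrow(Dataf)), Dataf$id)

# Draw 2 row numbers without replacement from each cluster.
# sample.int() on the positions avoids sample()'s special handling
# of a length-one numeric vector.
picked <- unlist(lapply(rows_by_id, function(i) i[sample.int(length(i), 2)]))

# Subset the original data frame and renumber the rows 1..8.
Dataf2 <- Dataf[picked, ]
rownames(Dataf2) <- NULL
Dataf2
```

A cluster with fewer than two rows would make sample.int() stop with an error, so such clusters would need to be handled separately (for example, by sampling min(2, length(i)) rows).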

 

Thanks in advance for your assistance.

 

Tony

------------------------------------------------------------------
Tony N. Brown, Ph.D.
Associate Professor of Sociology
Faculty Head of Hank Ingram House, The Commons
Research Fellow, Vanderbilt Center for Nashville Studies
Vanderbilt University
(615) 322-7518
(615) 322-7505 fax
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
