Thierry,

Thanks so much. Your solution works perfectly.

Tony

-----Original Message-----
From: ONKELINX, Thierry [mailto:[EMAIL PROTECTED]
Sent: Monday, September 15, 2008 2:56 AM
To: Brown, Tony Nicholas; r-help@r-project.org
Subject: RE: [R] randomly sample within clustered data?

Something like this?

    do.call("rbind", lapply(
      split(Dataf, Dataf$id),
      function(x){
        x[sample(seq_len(nrow(x)), size=2), ]
      }
    ))

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brown, Tony Nicholas
Sent: Monday, September 15, 2008 9:40
To: r-help@r-project.org
Subject: [R] randomly sample within clustered data?

Dear useRs,

What is an efficient way to randomly sample from clustered data such that I get equal representation from each cluster?
For example, let's say I want to randomly sample two cases from each cluster created by the "id" variable in the following data frame:

> id<-c(rep("100", 4),rep("101", 3), rep("102", 6), rep("103", 7))
> sex<-sample(c("m","f"), 20, replace=TRUE)
> weight<-rnorm(n=20, mean=150, sd=3)
> attitude<-sample(1:7, 20, replace=TRUE)
> Dataf<-data.frame(id,sex,weight,attitude)
> Dataf
    id sex   weight attitude
1  100   m 146.5064        6
2  100   f 150.2317        4
3  100   f 149.3686        5
4  100   m 144.7218        7
5  101   m 147.9071        4
6  101   m 148.3802        6
7  101   m 154.4634        1
8  102   m 153.2719        5
9  102   m 148.9821        5
10 102   f 148.0656        1
11 102   f 148.8949        6
12 102   m 146.9963        4
13 102   m 153.0542        4
14 103   m 148.1558        1
15 103   f 148.0482        4
16 103   m 151.8044        2
17 103   f 155.4976        4
18 103   m 150.0423        1
19 103   f 146.0487        5
20 103   m 154.6651        7
>

Here's the R code I wrote that obviously does not work:

sapply(split(Dataf, Dataf$id), sample, size=2)

I would prefer a data frame (i.e., Dataf2) as the final output and it should look something like this:

> Dataf2
   id sex   weight attitude
1 100   m 146.5064        6
2 100   m 144.7218        7
3 101   m 147.9071        4
4 101   m 154.4634        1
5 102   m 153.2719        5
6 102   m 148.9821        5
7 103   f 155.4976        4
8 103   f 146.0487        5
>

Thanks in advance for your assistance.

Tony

------------------------------------------------------------------
Tony N. Brown, Ph.D.
Associate Professor of Sociology
Faculty Head of Hank Ingram House, The Commons
Research Fellow, Vanderbilt Center for Nashville Studies
Vanderbilt University
(615) 322-7518
(615) 322-7505 fax
[EMAIL PROTECTED]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
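
For anyone finding this thread later, here is a self-contained sketch of the split / sample / rbind approach suggested above. The original sapply() attempt most likely fails because sample() treats a data frame as a list of columns, so it picks two columns from each piece instead of two rows. The helper name sample_per_cluster, the set.seed() call, the min() guard for clusters with fewer than n rows, and the row-name reset are my own assumptions for illustration; none of them are part of Thierry's posted code.

```r
# Rebuild a data frame shaped like the one in the question (values are random).
set.seed(42)  # assumption: fixed seed so the draw can be repeated
id <- c(rep("100", 4), rep("101", 3), rep("102", 6), rep("103", 7))
Dataf <- data.frame(
  id       = id,
  sex      = sample(c("m", "f"), 20, replace = TRUE),
  weight   = rnorm(n = 20, mean = 150, sd = 3),
  attitude = sample(1:7, 20, replace = TRUE)
)

# Hypothetical helper: draw up to n rows, without replacement, from each cluster.
sample_per_cluster <- function(df, cluster, n = 2) {
  pieces <- lapply(split(df, df[[cluster]]), function(x) {
    k <- min(n, nrow(x))  # guard: a cluster may hold fewer than n rows
    x[sample.int(nrow(x), size = k), , drop = FALSE]
  })
  out <- do.call("rbind", pieces)
  rownames(out) <- NULL   # renumber rows 1..N, as in the desired Dataf2
  out
}

Dataf2 <- sample_per_cluster(Dataf, "id", n = 2)
Dataf2
```

Indexing rows with sample.int(nrow(x), ...) behaves the same as sample(seq_len(nrow(x)), ...) in the posted solution. Without the min() guard, asking for more rows than a cluster contains would stop with an error, which may be what you want if equal representation is a hard requirement.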