Let's assume your ZCTA data looks like this:

   set.seed(12345)   ## temporary, for reproducibility
   zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
   zcta
     zipcode      prop
   1       A 0.7209039
   2       B 0.8757732
   3       C 0.7609823
   4       D 0.8861246
   5       E 0.4564810

This says that 72.1% of the population in zipcode A is female, ..., and
45.6% in zipcode E is female.

Now suppose you sampled 20 people, recorded the zipcode (and other
variables) for each, and stored them in 'samp':

   samp <- data.frame( id=1:20,
                       zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ] )

Now, I am not sure what you want to do, but I can see two possible
meanings in your message.

1) If you want to sample 10 observations, with each observation weighted
INDEPENDENTLY by the proportion of women in its zipcode, try something
like the following. The problem with this option is that the result
depends on how often each zipcode occurs among the observations.

   comb <- merge( samp, zcta, all.x=TRUE )
   comb <- comb[ order(comb$id), ]
   ## draw 10 row positions, weighted by the zipcode proportion
   comb[ sample( nrow(comb), 10, prob=comb$prop ), ]

2) If you want to sample x% of the observations in each zipcode, where x
is the proportion of women in that zipcode, then this is what I would
call stratified sampling. Try this:

   tmp <- split( samp, samp$zipcode )
   out <- list()
   for( z in names(tmp) ){
     df <- tmp[[z]]
     p  <- zcta[ zcta$zipcode == z, "prop" ]
     ## round down to a whole number of rows for this zipcode
     out[[z]] <- df[ sample( seq_len(nrow(df)), floor(p*nrow(df)) ), ]
   }
   do.call("rbind", out)

You probably need a variant of one of these, but if you need further
help you will need to provide more information and, better yet, examples.

Regards, Adai

Kirsten Beyer wrote:
> I am interested in locating a script to implement a sampling scheme
> that would basically make it more likely that a particular observation
> is chosen based on a weight associated with the observation. I am
> trying to select a sample of ~30 census blocks from each ZIP code area
> based on the proportion of women in a ZCTA living in a particular
> block. I want to make it more likely that a block will be chosen if
> the proportion of women in a patient's age group in a particular block
> is high. Any ideas are appreciated!
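
For the scheme described in the quoted message (about 30 blocks per ZIP code, with blocks that hold a larger share of women more likely to be picked), a minimal sketch might look like the following. The `blocks` data frame, its column names, and the helper `pick_blocks` are hypothetical and only for illustration; the real data layout is not given in the thread. Note that `sample()` without replacement draws sequentially, so the inclusion probabilities are only approximately proportional to the weights.

```r
set.seed(1)  # for reproducibility of this illustration only

# Hypothetical block-level data: 3 ZIP codes with 50 blocks each.
# prop_women is an assumed per-block weight (share of women in the block).
blocks <- data.frame(
  zipcode    = rep(c("A", "B", "C"), each = 50),
  block_id   = seq_len(150),
  prop_women = runif(150)
)

n_per_zip <- 30  # target number of blocks per ZIP code

# Within one ZIP code, draw up to n blocks without replacement,
# using prop_women as the sampling weight.
pick_blocks <- function(df, n) {
  n    <- min(n, nrow(df))  # cannot draw more blocks than exist
  rows <- sample.int(nrow(df), n, prob = df$prop_women)
  df[rows, ]
}

# Apply the draw separately to each ZIP code and stack the results.
chosen <- do.call(rbind, lapply(split(blocks, blocks$zipcode),
                                pick_blocks, n = n_per_zip))
table(chosen$zipcode)
```

This combines the two options above: the strata are the ZIP codes, and the weighting happens inside each stratum rather than across the whole sample.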

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.