Since you indicated that there are six more columns in the data.frame, I have modified getSample below to handle them.
> getSample
function(x) {
  sites <- unique(x$SiteID)
  years <- unique(x$Year)
  result <- data.frame()
  x$ID <- seq(1,nrow(x))
  for (i in 1:length(sites)) {
    for (j in 1:length(years)) {
      if (nrow(x[as.character(x$SiteID)==as.character(sites[i]) & x$Year==years[j],]) > 3) {
        sampledID <- sample(x[as.character(x$SiteID)==as.character(sites[i]) & x$Year==years[j],]$ID,3,replace=FALSE)
        for (k in 1:length(sampledID)) {
          result <- rbind(result,x[x$ID==sampledID[k],-ncol(x)])
        }
      }
    }
  }
  names(result) <- names(x)[-ncol(x)]
  rownames(result) <- NULL
  return(result)
}
> getSample(fitting.set)
  IDbyYear    SiteID Year
1    42.24 A-Airport 2006
2    42.24 A-Airport 2006
3    42.24 A-Airport 2006
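
For comparison, here is a more compact sketch of the same idea that avoids the nested loops and the repeated rbind calls. It assumes only that the data.frame has SiteID and Year columns, as above. The name getSample2, the parameter n, and the demo data are made up for illustration and are not part of the original code.

```r
# Hypothetical alternative to getSample; getSample2 and n are illustrative names.
getSample2 <- function(x, n = 3) {
  # Split the row indices into one group per SiteID/Year combination,
  # dropping combinations that do not occur in the data.
  groups <- split(seq_len(nrow(x)),
                  list(as.character(x$SiteID), x$Year),
                  drop = TRUE)
  # Keep n randomly chosen rows from each group that has more than n records;
  # smaller groups contribute nothing, as in getSample.
  picked <- lapply(groups, function(idx) {
    if (length(idx) > n) idx[sample.int(length(idx), n)] else integer(0)
  })
  result <- x[unlist(picked, use.names = FALSE), , drop = FALSE]
  rownames(result) <- NULL
  result
}

# Made-up example data (not the original fitting.set):
# site "A" has 5 records in 2006, site "B" has only 2.
set.seed(1)
demo <- data.frame(SiteID = rep(c("A", "B"), times = c(5, 2)),
                   Year   = 2006,
                   Value  = rnorm(7))
getSample2(demo)
```

Because the rows are selected by index, all columns are carried along no matter how many there are, and no temporary ID column is needed. Note that the groups come back in the order of the sorted SiteID/Year levels, which may differ from the order getSample produces.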