I can't speak to the intricacies of the formula but when I run the ByDataFrame() function provided on a subsample of my data (n=50) it returned only the very first id value in the output; the rest came out as <NA>.... This is not to say it has not properly selected the rows with min(x$diff), but I have no way of verifying without the id membership in the output.
Charilaos Skiadas-3 wrote: > > > On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote: > >> >> aggregate() is indeed a useful function in this case, but it only >> returns the >> columns by which it was grouped. Is there a way I can use this while >> simultaneously retaining all the other column values in the dataframe? >> >> e.g. add superfluous (yet pertinent for later) column containing any >> information at all and retain it in the final output > > I had exactly this kind of need many times, and I have finally > created a function for it, which I hope to include soon in an > upcoming package. Here is a run of it (I added an extra "A" column > containing just the numbers 1:8): > > > DF > id day diff A > 1 1 01-01-09 0.5 1 > 2 1 01-01-09 0.7 2 > 3 2 01-01-09 0.2 3 > 4 2 01-01-09 0.4 4 > 5 1 01-02-09 0.1 5 > 6 1 01-02-09 0.3 6 > 7 2 01-02-09 0.3 7 > 8 2 01-02-09 0.4 8 > > byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),]) > diff A id day > 1 0.5 1 1 01-01-09 > 2 0.2 3 2 01-01-09 > 3 0.1 5 1 01-02-09 > 4 0.3 7 2 01-02-09 > > Would that do what you want? > > I've appended the function byDataFrame, and its prerequisite, a > function parseIndexList. I'm not quite set on the names yet, but > anyway. Hope this helps. I haven't really tested it on large sets, it > might perform poorly. Any suggestions on speeding the code / > corrections are welcome. > > Haris Skiadas > Department of Mathematics and Computer Science > Hanover College > > > > parseIndexList <- function(indexList) { > # browser() > if (!is.list(indexList)) > indexList <- as.list(indexList) > nI <- length(indexList) > namelist <- vector("list", nI) > names(namelist) <- names(indexList) > extent <- integer(nI) > nx <- length(indexList[[1]]) > one <- as.integer(1) > group <- rep.int(one, nx) > ngroup <- one > for (i in seq.int(indexList)) { > index <- as.factor(indexList[[i]]) > if (length(index) != nx) > stop("arguments must have same length") > namelist[[i]] <- sort(unique(indexList[[i]])) > extent[i] <- length(namelist[[i]]) > group <- group + ngroup * (as.integer(index) - one) > ngroup <- ngroup * nlevels(index) > } > nms <- do.call(expand.grid, namelist) > ind <- unique(sort(group)) > res <- data.frame(index=ind, nms[ind, , drop=FALSE]) > return(list(cases=group, groups=res)) > } > > byDataFrame <- function (data, INDEX, FUN, newnames, > omit.index.cols=TRUE, ...) { > # # Part of the code shamelessly stolen from tapply > IND <- eval(substitute(INDEX), data) > nms <- as.character(as.list(substitute(INDEX))) > if (!is.list(IND)) { > IND <- list(IND) > names(IND) <- nms > } else { > names(IND) <- nms[-1] > } > funname <- paste(as.character(substitute(FUN)), collapse=".") > indexInfo <- parseIndexList(IND) > FUNx <- if (omit.index.cols) { > omit.cols <- match(names(indexInfo$groups)[-1], names(data)) > function(x, ...) FUN(data[x, -omit.cols], ...) > } else { > function(x, ...) FUN(data[x, ], ...) > } > ans <- lapply(split(1:nrow(data), indexInfo$cases), FUNx, ...) > index <- as.numeric(names(ans)) > if (!is.data.frame(ans[[1]])) { > ans <- lapply(ans, function(x) { > dframe <- as.data.frame(t(x)) > if (is.null(names(x))) > names(dframe) <- funname > dframe > }) > } > lengths <- sapply(ans, nrow) > ans <- do.call(rbind, ans) > if (!missing(newnames)) > names(ans) <- newnames > nms <- indexInfo$groups[rep(index, lengths),-1, drop=FALSE] > res <- cbind(ans, nms) > res > } > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://www.nabble.com/Subset-by-Factor-by-date-tp17835631p17836560.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.