Re: [R] Subset by Factor by date

Charilaos Skiadas Sat, 14 Jun 2008 05:11:55 -0700

On Jun 14, 2008, at 2:59 AM, T.D.Rudolph wrote:

I can't speak to the intricacies of the formula but when I run the
ByDataFrame() function provided on a subsample of my data (n=50) it
returned only the very first id value in the output; the rest came out
as <NA>.... This is not to say it has not properly selected the rows
with min(x$diff), but I have no way of verifying without the id
membership in the output.

And equally we can't help you with that without a reproducible
example. Doesn't it do the right thing in the little sample I posted?
It moves the id and day columns to the end. Without that, the only
thing I can think of that might cause trouble is that you have a
matrix instead of a data.frame, or otherwise the columns have some
class I have not anticipated. Perhaps you can send me a part of your
data off-list, if you can't post it here?
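Both suspicions can be checked in a few lines, and dput() produces text that is easy to paste into a message. The sketch below runs on a made-up stand-in called DF, since the actual data was never posted; the column names are taken from the small sample in this thread.

```r
# Made-up stand-in; the poster's real data was not shown in the thread.
DF <- data.frame(id   = c(1, 1, 2, 2),
                 day  = c("01-01-09", "01-01-09", "01-02-09", "01-02-09"),
                 diff = c(0.5, 0.7, 0.2, 0.4))

is.data.frame(DF)              # should be TRUE for byDataFrame to work
is.data.frame(as.matrix(DF))   # a matrix does not pass this check
classes <- sapply(DF, class)   # one class name per column
print(classes)

# Text that rebuilds the first rows exactly and can be pasted into an email.
dput(head(DF, 20))
```

If the first check comes back FALSE, as.data.frame() on the object is the usual first thing to try.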


Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

Charilaos Skiadas-3 wrote:



On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:

aggregate() is indeed a useful function in this case, but it only
returns the columns by which it was grouped. Is there a way I can use
this while simultaneously retaining all the other column values in the
dataframe?
e.g. add superfluous (yet pertinent for later) column containing any
information at all and retain it in the final output


I had exactly this kind of need many times, and I have finally
created a function for it, which I hope to include soon in an
upcoming package. Here is a run of it (I added an extra "A" column
containing just the numbers 1:8):

DF

  id      day diff A
1  1 01-01-09  0.5 1
2  1 01-01-09  0.7 2
3  2 01-01-09  0.2 3
4  2 01-01-09  0.4 4
5  1 01-02-09  0.1 5
6  1 01-02-09  0.3 6
7  2 01-02-09  0.3 7
8  2 01-02-09  0.4 8

byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])

  diff A id      day
1  0.5 1  1 01-01-09
2  0.2 3  2 01-01-09
3  0.1 5  1 01-02-09
4  0.3 7  2 01-02-09

Would that do what you want?
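For comparison, the same selection can be sketched with base R alone, using split() and lapply() instead of the helper functions. This is an alternative shown for illustration, not the byDataFrame implementation; DF is rebuilt here from the printout above so the snippet stands on its own.

```r
# The example data from the printout above.
DF <- data.frame(
  id   = c(1, 1, 2, 2, 1, 1, 2, 2),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  A    = 1:8
)

# One piece per id/day combination; drop = TRUE skips combinations
# that have no rows.
pieces <- split(DF, list(DF$id, DF$day), drop = TRUE)

# Keep the whole row with the smallest diff within each piece.
res <- do.call(rbind, lapply(pieces, function(x) x[which.min(x$diff), ]))
rownames(res) <- NULL
res
```

Unlike byDataFrame, this leaves the id and day columns in their original positions.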

I've appended the function byDataFrame, and its prerequisite, a
function parseIndexList. I'm not quite set on the names yet, but
anyway. Hope this helps. I haven't really tested it on large sets, it
might perform poorly. Any suggestions on speeding the code /
corrections are welcome.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



parseIndexList <- function(indexList) {
   # Turn a list of grouping vectors into one integer group code per row,
   # plus a table describing the groups that actually occur.
   if (!is.list(indexList))
     indexList <- list(indexList)   # a bare vector is a single index
   nI <- length(indexList)
   namelist <- vector("list", nI)
   names(namelist) <- names(indexList)
   extent <- integer(nI)
   nx <- length(indexList[[1]])
   one <- as.integer(1)
   group <- rep.int(one, nx)
   ngroup <- one
   for (i in seq_along(indexList)) {
       # factor() rather than as.factor(), so that unused levels are
       # dropped and nlevels(index) agrees with length(namelist[[i]]).
       index <- factor(indexList[[i]])
       if (length(index) != nx)
           stop("arguments must have same length")
       namelist[[i]] <- sort(unique(indexList[[i]]))
       extent[i] <- length(namelist[[i]])
       # Mixed-radix code: the first index varies fastest, which is the
       # row order that expand.grid produces below.
       group <- group + ngroup * (as.integer(index) - one)
       ngroup <- ngroup * nlevels(index)
   }
   nms <- do.call(expand.grid, namelist)
   ind <- unique(sort(group))
   res <- data.frame(index=ind, nms[ind, , drop=FALSE])
   return(list(cases=group, groups=res))
}
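The group code that parseIndexList computes is a mixed-radix number: the first index varies fastest, so code k corresponds to row k of the expand.grid() table of labels. A minimal sketch for two indices, using the id and day values from the example (the names f1, f2 and labels are only for illustration):

```r
id  <- c(1, 1, 2, 2, 1, 1, 2, 2)
day <- rep(c("01-01-09", "01-02-09"), each = 4)
f1 <- factor(id)
f2 <- factor(day)

# Code for each row: 1 + (level of id - 1) + nlevels(id) * (level of day - 1)
group <- 1L + (as.integer(f1) - 1L) + nlevels(f1) * (as.integer(f2) - 1L)
group

# Row k of this table holds the labels for group code k.
labels <- expand.grid(id = levels(f1), day = levels(f2))
labels

# interaction() numbers its levels the same way (first factor fastest).
same <- identical(group, as.integer(interaction(f1, f2)))
same
```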

byDataFrame <- function (data, INDEX, FUN, newnames,
                         omit.index.cols=TRUE, ...) {
   # Part of the code shamelessly stolen from tapply
   IND <- eval(substitute(INDEX), data)
   nms <- as.character(as.list(substitute(INDEX)))
   if (!is.list(IND)) {
     IND <- list(IND)
     names(IND) <- nms
   } else {
     names(IND) <- nms[-1]   # drop the leading "list"
   }
   funname <- paste(as.character(substitute(FUN)), collapse=".")
   indexInfo <- parseIndexList(IND)
   # drop=FALSE keeps a data frame even when a single column remains
   FUNx <- if (omit.index.cols) {
     omit.cols <- match(names(indexInfo$groups)[-1], names(data))
     function(x, ...) FUN(data[x, -omit.cols, drop=FALSE], ...)
   } else {
     function(x, ...) FUN(data[x, , drop=FALSE], ...)
   }
   ans <- lapply(split(seq_len(nrow(data)), indexInfo$cases), FUNx, ...)
   index <- as.numeric(names(ans))
   if (!is.data.frame(ans[[1]])) {
     ans <- lapply(ans, function(x) {
       dframe <- as.data.frame(t(x))
       if (is.null(names(x)))
         names(dframe) <- funname
       dframe
     })
   }
   lengths <- sapply(ans, nrow)
   ans <- do.call(rbind, ans)
   if (!missing(newnames))
     names(ans) <- newnames
   # Look the groups up by code, not by position: when some combinations
   # of the index values never occur, indexInfo$groups has fewer rows
   # than the largest code, and positional indexing returns NA labels.
   rows <- match(index, indexInfo$groups$index)
   nms <- indexInfo$groups[rep(rows, lengths),-1, drop=FALSE]
   res <- cbind(ans, nms)
   res
}
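One thing worth checking whenever group labels come back as <NA>: the group codes are positions in the full expand.grid() table, not in the shorter table of groups that actually occur. If some id/day combination has no rows, a label lookup has to go through match() on the code. Whether this is what happened with the n=50 subsample is only a guess, since that data was not posted. A sketch on made-up data in which id 2 has no rows on the first day:

```r
# Made-up data: the combination id 2 / day 01-01-09 never occurs.
id  <- c(1, 1, 1, 2)
day <- c("01-01-09", "01-01-09", "01-02-09", "01-02-09")
f1 <- factor(id)
f2 <- factor(day)

# Group code per row, first index varying fastest.
code <- 1L + (as.integer(f1) - 1L) + nlevels(f1) * (as.integer(f2) - 1L)
present <- sort(unique(code))          # code 2 is missing from this vector

# Label table restricted to the groups that occur: 3 rows, not 4.
full   <- expand.grid(id = levels(f1), day = levels(f2))
groups <- data.frame(index = present, full[present, , drop = FALSE])

# Using the codes as row positions runs past the end of the table ...
by_position <- groups[present, "id"]
# ... while matching on the code column finds the right row each time.
by_code <- groups[match(present, groups$index), "id"]

by_position
by_code
```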


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
