Ed L Cashin <[EMAIL PROTECTED]> writes:

> Ed L Cashin <[EMAIL PROTECTED]> writes:
>
>> Hi. I am having trouble thinking of an easy way to grab rows out of a
>> data frame. I want to select the rows with a median value when the
>> rows are similar.
>
> I'm still catching up on my R list reading, and I notice there is a
> similar post to mine:
>
>   Federico Calboli
>   data manipulation: getting mean value every 5 rows
>
> I think the responses there answer my question, but I'll have to look
> into it. The responses say to use aggregate and an auxiliary row.
After consulting the docs and Venables and Ripley, I am not sure
aggregate can do what I'm looking for. Given rows where certain
specified columns have the same values, I'd like to select the row
with the median value in another specified column ("runtime").

That is, after grouping the rows of the data frame based on the
columns in the by parameter, I want to select one whole row "as is",
the row with the median "runtime" value, without taking the median of
more than one column.

     'aggregate.data.frame' is the data frame method. If 'x' is not a
     data frame, it is coerced to one. Then, each of the variables
     (columns) in 'x' is split into subsets of cases (rows) of
     identical combinations of the components of 'by', and 'FUN' is
     applied to each such subset with further arguments in '...'
     passed to it. (I.e., 'tapply(VAR, by, FUN, ..., simplify =
     FALSE)' is done for each variable 'VAR' in 'x', conveniently
     wrapped into one call to 'lapply()'.)

Is there a way to tell aggregate to compute the median on column
runtime only, and use it to select the whole row?

     Empty subsets are removed, and the result is reformatted into a
     data frame containing the variables in 'by' and 'x'. The ones
     arising from 'by' contain the unique combinations of grouping
     values used for determining the subsets, and the ones arising
     from 'x' the corresponding summary statistics for the subset of
     the respective variables in 'x'.

I'd like to select all the columns, not just the ones I'm using to
group or the one from which I want to find the median value. Since
aggregate applies FUN to each column separately and returns summary
statistics rather than original rows, I think I can't use aggregate
after all. But I must admit I'm very tired and should go to bed.

-- 
--Ed L Cashin            |   PGP public key:
  [EMAIL PROTECTED]      |   http://noserose.net/e/pgp/

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
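
One possible way to get a whole row per group without aggregate is to split the data frame on the grouping columns and pick the row whose "runtime" sits in the middle of each group. The following is only a sketch under assumptions: the data frame `d`, the grouping columns `host` and `fs`, the extra column `note`, and the helper `median_row` are all hypothetical names made up for illustration, not taken from the original data. Note that for a group with an even number of rows the true median lies between two rows, so the sketch arbitrarily takes the lower of the two middle rows.

```r
# Hypothetical example data; all column names are assumptions.
d <- data.frame(
  host    = c("a", "a", "a", "b", "b", "b", "b"),
  fs      = c("ext3", "ext3", "ext3", "xfs", "xfs", "xfs", "xfs"),
  runtime = c(12.1, 10.4, 11.0, 20.5, 19.8, 22.0, 21.1),
  note    = c("r1", "r2", "r3", "r4", "r5", "r6", "r7"),
  stringsAsFactors = FALSE
)

# Return the whole row holding the middle value of `col` in one group.
# With an even number of rows, the lower of the two middle rows is
# returned (an arbitrary choice, since no row equals the true median).
median_row <- function(g, col = "runtime") {
  o <- order(g[[col]])                      # row positions sorted by col
  g[o[(nrow(g) + 1) %/% 2], , drop = FALSE] # keep every column as is
}

# Split on the grouping columns, pick one row per group, recombine.
groups <- split(d, d[c("host", "fs")], drop = TRUE)
result <- do.call(rbind, lapply(groups, median_row))
rownames(result) <- NULL
result
```

Here `split()` does the grouping that the 'by' argument does in aggregate, but each piece stays a complete data frame, so the selected rows keep all their columns.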