Hi:

> On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <mosherste...@gmail.com> wrote:
> Thanks,
> I was trying to stick with the base package and figure out how the base
> routines worked.

If you want to use base functions, then here's a solution with aggregate()
(the Id column was removed first):

> with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
  Year        D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1980 1.000000 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
2 1981 0.500000 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
3 1982 0.500000 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
4 1983 0.500000 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
5 1986 0.000000 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
6 1987 1.333333 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
7 1988 1.333333 238 246 249 246 244 213 212 224 232 238 232 230
8 1989 1.333333 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

The problem with tapply() is that it summarizes a single vector, so it has to
be called separately on each column you want to summarize. You could do it in
a loop:

> res <- matrix(NA, 8, 14)
> res[, 1] <- sort(unique(DF$Year))
> res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
> for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
> res
     [,1]     [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 1980 1.000000  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN   NaN
[2,] 1981 0.500000  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231   245
[3,] 1982 0.500000  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN   NaN
[4,] 1983 0.500000  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN   NaN
[5,] 1986 0.000000  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN   NaN
[6,] 1987 1.333333  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240   NaN
[7,] 1988 1.333333  238  246  249  246  244  213  212   224   232   238   232   230
[8,] 1989 1.333333  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN   238

but it's not the most efficient way to do things.
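The same loop can be written without preallocating `res`: sapply() calls tapply() once per column and assembles the per-year results into a matrix. This is a minimal sketch, assuming DF has the layout used above (Id removed, so D, Year, then the month columns). The toy DF below is a hypothetical subset of the posted data with only two month columns, included just to make the example self-contained. It also turns NaN into NA, because mean(x, na.rm = TRUE) of an all-NA group returns NaN.

```r
# Hypothetical subset of the posted data (Id removed; only Jan and Feb kept)
DF <- data.frame(
  D    = c(1, 0, 1, 0, 1),
  Year = c(1980, 1981, 1981, 1982, 1982),
  Jan  = c(NA, NA, NA, 236, 236),
  Feb  = c(NA, NA, 251, 237, NA)
)

# Call tapply() once per column (every column except Year);
# sapply() binds the per-column results into a matrix, one row per year
res <- sapply(DF[, names(DF) != "Year"], function(x)
  tapply(x, DF$Year, mean, na.rm = TRUE))

# An all-NA group gives NaN with na.rm = TRUE; replace those with NA
res[is.nan(res)] <- NA

# Turn the year row labels back into a Year column
res <- data.frame(Year = as.numeric(rownames(res)), res)
rownames(res) <- NULL
print(res)
```

Because the years come from the row labels that tapply() produces (sorted factor levels), this version does not depend on DF being sorted by Year.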
Essentially, the tapply() loop follows the 'split-apply-combine' strategy,
which is implemented more efficiently in functions like aggregate() or in
packages such as doBy, plyr, reshape and data.table, some of which Petr Pikal
mentioned earlier.

HTH,
Dennis

On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <mosherste...@gmail.com> wrote:

> Thanks,
>
> I was trying to stick with the base package and figure out how the base
> routines worked. I looked at plyr and it was very appealing. I guess I'll
> give in and use it.
>
> On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy <djmu...@gmail.com> wrote:
>
>> Hi:
>>
>> Use of ddply() in the plyr package appears to work.
>>
>> library(plyr)
>> ddply(DF[, -1], .(Year), colwise(mean), na.rm = TRUE)
>>
>>          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>> 1 1.000000 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
>> 2 0.500000 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
>> 3 0.500000 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
>> 4 0.500000 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
>> 5 0.000000 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
>> 6 1.333333 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
>> 7 1.333333 1988 238 246 249 246 244 213 212 224 232 238 232 230
>> 8 1.333333 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
>>
>> Replace the NaNs with NAs and that should do it....
>>
>> HTH,
>> Dennis
>>
>> On Sun, Apr 25, 2010 at 9:52 PM, steven mosher <mosherste...@gmail.com> wrote:
>>
>>> I'm having some difficulty understanding how tapply works and getting
>>> the return values I expect.
>>>
>>> Data: a data frame, DF, with columns DF$Id, DF$D, DF$Year, .......
>>>
>>>          Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>>> 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
>>> 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
>>> 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
>>> 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
>>> 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
>>> 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
>>> 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
>>> 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
>>> 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>> 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>> 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>> 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
>>> 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
>>> 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
>>> 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
>>> 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
>>> 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
>>>
>>> and the result should be a data frame of column means by year, with the
>>> variable D dropped (or kept; it doesn't matter):
>>>
>>> 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
>>> 11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
>>> 11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
>>> 11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
>>> 11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
>>> 11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>> 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
>>> 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
>>>
>>> It would seem that tapply should work:
>>>
>>> result <- tapply(DF[, 1:15], DF$Year, colMeans, na.rm = T)
>>>
>>> but I get errors about the length of arguments, which
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
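A note on the original tapply() attempt: at least in R versions of that era, tapply() expects X to be a vector with one element per row and stops with "arguments must have same length" when length(X) differs from the length of INDEX. The length() of a data frame is its number of columns (15 in the posted data), not its number of rows (17), which is most likely where the length error comes from. A base-R way to apply colMeans() to each year's block of rows is to split the data frame by Year first. This is a minimal sketch, assuming every column of DF is numeric as in the posted data; the toy DF is a hypothetical subset (Id, D, Year, Jan and Feb only) included just to make the example self-contained.

```r
# Hypothetical subset of the posted data (Id, D, Year and two months only)
DF <- data.frame(
  Id   = 11264402000,
  D    = c(1, 0, 1, 0, 1),
  Year = c(1980, 1981, 1981, 1982, 1982),
  Jan  = c(NA, NA, NA, 236, 236),
  Feb  = c(NA, NA, 251, 237, NA)
)

# length() of a data frame counts columns, not rows,
# so it does not match the length of DF$Year
length(DF)       # number of columns
length(DF$Year)  # number of rows

# Split: one data frame of rows per year (list named "1980", "1981", ...)
by_year <- split(DF, DF$Year)

# Apply: colMeans() averages every column of one year's block at once
means <- lapply(by_year, colMeans, na.rm = TRUE)

# Combine: stack the per-year vectors into a matrix, one row per year
res <- do.call(rbind, means)

# All-NA groups give NaN; replace them with NA, as suggested in the thread
res[is.nan(res)] <- NA
res <- as.data.frame(res)
print(res)
```

This is the same split-apply-combine pattern that aggregate() and ddply() wrap up; here each step is spelled out with split(), lapply() and do.call(rbind, ...).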