Re: [R] Tapply.

steven mosher Tue, 27 Apr 2010 01:07:25 -0700

Thanks, Dennis.

    Is there a book on R you could recommend?




On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
>
> > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <mosherste...@gmail.com> wrote:
> > Thanks,
> >
> > I was trying to stick with the base package and figure out how the base
> > routines worked.
>
>
> If you want to use base functions, then here's a solution with aggregate()
> (the Id column was removed first):
>
> > with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
>   Year        D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> 1 1980 1.000000 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> 2 1981 0.500000 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> 3 1982 0.500000 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> 4 1983 0.500000 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> 5 1986 0.000000 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> 6 1987 1.333333 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> 7 1988 1.333333 238 246 249 246 244 213 212 224 232 238 232 230
> 8 1989 1.333333 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
>
> The problem with tapply() is that the function has to be called separately
> on each column you want to summarize. You could do it in a loop:
> > res <- matrix(NA, 8, 14)
> > res[, 1] <- sort(unique(DF$Year))
> > res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
> > for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
> > res
>      [,1]     [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
> [1,] 1980 1.000000  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN   NaN
> [2,] 1981 0.500000  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231   245
> [3,] 1982 0.500000  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN   NaN
> [4,] 1983 0.500000  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN   NaN
> [5,] 1986 0.000000  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN   NaN
> [6,] 1987 1.333333  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240   NaN
> [7,] 1988 1.333333  238  246  249  246  244  213  212   224   232   238   232   230
> [8,] 1989 1.333333  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN   238
>
> but it's not the most efficient way to do things.
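
A more compact way to write the same per-column idea is to let sapply() drive the tapply() calls. This is only a sketch: the data frame below is a small made-up example (hypothetical values, not the data from this thread), and the names `cols` and `res` are chosen for illustration.

```r
# Toy data (hypothetical values, not the data from the thread)
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 10, 20, NA),
  Feb  = c(5, NA, 7, NA)
)

# Columns to summarise: everything except the grouping variable
cols <- setdiff(names(DF), "Year")

# One tapply() call per column; sapply() binds the per-column results
# into a matrix with one row per year and one column per variable
res <- sapply(DF[cols], function(x) tapply(x, DF$Year, mean, na.rm = TRUE))
res
```

This avoids hard-coding the matrix dimensions, and the years end up as row names rather than as a separate column.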
>
> Essentially, this approach conforms to the 'split-apply-combine' strategy,
> which is more efficiently implemented in functions like aggregate() or in
> packages such as doBy, plyr, reshape and data.table, some of which were
> mentioned earlier by Petr Pikal.
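
For readers who want to see the three steps spelled out in base R, here is a minimal sketch. It assumes a small made-up data frame (hypothetical values, not the data from this thread); `pieces`, `means` and `out` are illustrative names. It is close in spirit to the colMeans-per-year idea in the original question, but splits the data frame first.

```r
# Toy data (hypothetical values, not the data from the thread)
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 10, 20, NA),
  Feb  = c(5, NA, 7, NA)
)
cols <- setdiff(names(DF), "Year")

# split: one data frame per year
pieces <- split(DF[cols], DF$Year)

# apply: column means within each piece (NaN where a column is all NA)
means <- lapply(pieces, colMeans, na.rm = TRUE)

# combine: stack the per-year vectors and add the year back as a column
out <- data.frame(Year = as.numeric(names(means)),
                  do.call(rbind, means),
                  row.names = NULL)
out
```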
>
> HTH,
> Dennis
>
>
> On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <mosherste...@gmail.com> wrote:
>
>> Thanks,
>>
>>   I was trying to stick with the base package and figure out how the base
>> routines worked. I looked at plyr and it was very appealing. I guess I'll
>> give in and use it.
>>
>> On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy <djmu...@gmail.com> wrote:
>>
>>> Hi:
>>>
>>> Use of ddply() in the plyr package appears to work.
>>>
>>> library(plyr)
>>> ddply(DF[, -1], .(Year), colwise(mean), na.rm = TRUE)
>>>
>>>          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>>> 1 1.000000 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
>>> 2 0.500000 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
>>> 3 0.500000 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
>>> 4 0.500000 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
>>> 5 0.000000 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
>>> 6 1.333333 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
>>> 7 1.333333 1988 238 246 249 246 244 213 212 224 232 238 232 230
>>> 8 1.333333 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
>>>
>>> Replace the NaNs with NAs and that should do it....
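
One way to do that replacement in base R is sketched below. The summary table here is a made-up stand-in (hypothetical values) for the ddply() result, and `res` is an illustrative name.

```r
# Toy summary table (hypothetical values) with NaN where a group had no data
res <- data.frame(Year = c(1980, 1981),
                  Jan  = c(NaN, 251),
                  Feb  = c(212, NaN))

# is.nan() has no data-frame method, so apply it column by column;
# assigning into res[] keeps the result a data frame
res[] <- lapply(res, function(x) replace(x, is.nan(x), NA))
res
```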
>>>
>>> HTH,
>>> Dennis
>>>
>>> On Sun, Apr 25, 2010 at 9:52 PM, steven mosher <mosherste...@gmail.com> wrote:
>>>
>>>> Having some difficulties with understanding how tapply works and getting
>>>> return values I expect
>>>>
>>>> Data: dataframe. DF  DF$Id $D $Year.......
>>>>
>>>>          Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>>>> 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
>>>> 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
>>>> 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
>>>> 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
>>>> 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
>>>> 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
>>>> 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
>>>> 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
>>>> 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>>> 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>>> 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>>> 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
>>>> 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
>>>> 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
>>>> 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
>>>> 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
>>>> 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
>>>>
>>>> and the result should be a data frame of column means by year, with the
>>>> variable D dropped (or kept; it doesn't matter)
>>>>
>>>> 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
>>>> 11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
>>>> 11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
>>>> 11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
>>>> 11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
>>>> 11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
>>>> 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
>>>> 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
>>>>
>>>>  It would seem that tapply() should work:
>>>>  result <- tapply(DF[, 1:15], DF$Year, colMeans, na.rm = TRUE)
>>>>
>>>>  but I get errors about the length of arguments, which
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>

