Andrew Robinson wrote:
> These are important concerns. It seems to me that adding an argument
> as suggested by Bill will allow the user to side-step the problem
> identified by Brian.
>
> Bill, under what kinds of circumstances would you anticipate a
> significant time penalty? I would be happy to check those out with
> some simulations.
>
> If the timing seems acceptable, I can write a patch for tapply.R and
> tapply.Rd if anyone in the core is willing to consider them. Please
> contact me on or off list if so.
>
There's another concern: tapply (et al.) has the ... args passed on to
FUN which means that you have to be really careful with argument names.

Could I just interject that we already have

> airquality$Month <- factor(airquality$Month,levels=4:9) # April not there
> unlist(lapply(
+ split(airquality$Ozone, airquality$Month, drop=F),sum, na.rm=T))
   4    5    6    7    8    9
   0  614  265 1537 1559  912

(splitting on multiple factors gets a bit involved, though)

> Best wishes to all,
>
> Andrew
>
>
> On Tue, Nov 06, 2007 at 07:23:56AM +0000, Prof Brian Ripley wrote:
>
>> On Tue, 6 Nov 2007, [EMAIL PROTECTED] wrote:
>>
>>> Unfortunately I think it would break too much existing code. tapply()
>>> is an old function and many people have gotten used to the way it works
>>> now.
>>>
>> It is also not necessarily desirable: FUN(numeric(0)) might be an error.
>> For example:
>>
>>> Z <- data.frame(x=rnorm(10), f=rep(c("a", "b"), each=5))[1:5, ]
>>> tapply(Z$x, Z$f, sd)
>>>
>> but sd(numeric(0)) is an error. (Similar things involving var are 'in the
>> wild' and so would be broken.)
>>
>>> This is not to suggest there could not be another argument added at the
>>> end to indicate that you want the new behaviour, though. e.g.
>>>
>>> tapply <- function (X, INDEX, FUN=NULL, ..., simplify=TRUE,
>>>                     handle.empty.levels = FALSE)
>>>
>>> but this raises the question of what sort of time penalty the
>>> modification might entail. Probably not much for most situations, I
>>> suppose. (I know this argument name looks long, but you do need a
>>> fairly specific argument name, or it will start to impinge on the ...
>>> argument.)
>>>
>>> Just some thoughts.
>>>
>>> Bill Venables.
>>>
>>> Bill Venables
>>> CSIRO Laboratories
>>> PO Box 120, Cleveland, 4163
>>> AUSTRALIA
>>> Office Phone (email preferred): +61 7 3826 7251
>>> Fax (if absolutely necessary): +61 7 3826 7304
>>> Mobile: +61 4 8819 4402
>>> Home Phone: +61 7 3286 7700
>>> mailto:[EMAIL PROTECTED]
>>> http://www.cmis.csiro.au/bill.venables/
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Robinson
>>> Sent: Tuesday, 6 November 2007 3:10 PM
>>> To: R-Devel
>>> Subject: [Rd] A suggestion for an amendment to tapply
>>>
>>> Dear R-developers,
>>>
>>> when tapply() is invoked on factors that have empty levels, it returns
>>> NA. This behaviour is in accord with the tapply documentation, and is
>>> reasonable in many cases. However, when FUN is sum, it would also
>>> seem reasonable to return 0 instead of NA, because "the sum of an
>>> empty set is zero, by definition."
>>>
>>> I'd like to raise a discussion of the possibility of an amendment to
>>> tapply.
>>>
>>> The attached patch changes the function so that it checks if there are
>>> any empty levels, and if there are, replaces the corresponding NA
>>> values with the result of applying FUN to the empty set. Eg in the
>>> case of sum, it replaces the NA with 0, whereas with mean, it replaces
>>> the NA with NA, and issues a warning.
>>>
>>> This change has the following advantage: tapply and sum work better
>>> together. Arguably, tapply and any other function that has a non-NA
>>> response to the empty set will also work better together.
>>> Furthermore, tapply shows a warning if FUN would normally show a
>>> warning upon being evaluated on an empty set. That deviates from
>>> current behaviour, which might be bad, but also provides information
>>> that might be useful to the user, so that would be good.
>>>
>>> The attached script provides the new function in full, and
>>> demonstrates its application in some simple test cases.
>>>
>>> Best wishes,
>>>
>>> Andrew
>>>
>>>
>> --
>> Brian D. Ripley, [EMAIL PROTECTED]
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>
>

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])              FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
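
For readers following the thread, the behaviour under discussion can be sketched as a small wrapper around tapply(). This is only an illustration under stated assumptions, not the patch Andrew attached: the name tapply_fill_empty is hypothetical, FUN is assumed to return a single value, and an error from FUN on empty input (the concern Brian raised) is turned into NA rather than propagated.

```r
# Hypothetical helper, not part of base R and not the patch from this thread.
# It fills the cells that tapply() leaves as NA for empty groups with the
# result of calling FUN on a zero-length vector.
tapply_fill_empty <- function(X, INDEX, FUN, ...) {
  out <- tapply(X, INDEX, FUN, ...)

  # tapply(..., length) also yields NA for cells with no observations,
  # so is.na() on the counts marks exactly the empty groups.
  n <- tapply(X, INDEX, length)
  empty <- is.na(n)

  if (any(empty)) {
    # FUN on empty input may fail, so fall back to NA on error.
    # Assumption: FUN returns a single value.
    fill <- tryCatch(FUN(X[0], ...), error = function(e) NA)
    out[empty] <- fill
  }
  out
}

# Toy data (made up for illustration): level "c" has no observations.
x <- c(1, 2, 3, 4)
f <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))

tapply(x, f, sum)             # the cell for "c" is NA
tapply_fill_empty(x, f, sum)  # the cell for "c" is sum(numeric(0)), i.e. 0
```

Keeping this as a separate function rather than changing tapply() itself leaves existing code untouched, and because the wrapper has no extra named arguments of its own there is nothing new to clash with names passed through ... to FUN.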