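Yes. Read the help pages **carefully**! For example, ?tapply says that the first argument is an **atomic** vector. A factor is not a plain atomic vector: it is a vector of integer codes with levels and a class attribute attached. So tapply treats it as an atomic vector by looking only at its underlying representation, the integer codes, and the labels are lost.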
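A minimal sketch of that behaviour, using the grouping column and the factor column from the question quoted below. The helper name last_non_na is made up for illustration; it is the same idea as the poster's cl function.

```r
# Grouping variable and factor column taken from the posted data
g <- c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)
f <- factor(c("blue", NA, "red", "blue", "green", "blue", "red", "blue", NA))

# Hypothetical helper: last non-missing element of x
last_non_na <- function(x) x[max(which(!is.na(x)))]

codes <- tapply(f, g, last_non_na)

is.factor(f)        # the input is a factor
is.factor(codes)    # the result is not: only the integer codes survive
as.integer(codes)   # the codes of the selected elements
levels(f)[codes]    # the labels can still be looked up from the codes
```

Nothing is really lost here, since the codes index straight into levels(f); the result just no longer carries the labels itself.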
apply works on **arrays**, which must be of a single type. So it silently converts the data frame to the simplest common type it can, which here is an array of characters. And so on.

I admit that these details are somewhat obscure and even annoying -- but they **are** documented. I think that's all we can expect. Some have lamented the language's lack of perfect consistency in these matters, but I cannot see how that would be possible given its nature: it is intended to be **easily** used for high-level data manipulation, graphics, statistical analysis, etc., as well as for programming. There are just too many possible data structures to expect logical consistency in their handling throughout (if one can even define what that means in specific instances!).

All these little inconveniences can be worked around easily, of course. For example, if f.new is your new vector of integer factor codes and f.old is your original factor, levels(f.old)[f.new] converts f.new to the appropriate character vector. And so forth.

So the key is: pay **careful** attention to the docs.
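As a rough sketch of both points, the lines below rebuild the posted data frame, check what apply would see after its matrix conversion, and then use the levels(f.old)[f.new] idiom to recover the labels. The names dat, last_idx and result are assumptions made for this illustration and do not appear in the original post.

```r
# Data frame from the original post, rebuilt with data.frame()
dat <- data.frame(
  V1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  V2 = factor(c("blue", NA, "red", "blue", "green", "blue", "red", "blue", NA)),
  V3 = c(0.3, 0.4, NA, NA, NA, NA, 0.5, NA, 1.1)
)

# apply() converts the data frame to a matrix first; because of the factor
# column, that matrix is character
is.character(as.matrix(dat))

# Hypothetical helper: position of the last non-missing element, NA if none
last_idx <- function(x) {
  ok <- which(!is.na(x))
  if (length(ok)) ok[length(ok)] else NA_integer_
}

# Work on the integer codes explicitly, then map them back to labels
f.old <- dat$V2
f.new <- tapply(as.integer(f.old), dat$V1, function(x) x[last_idx(x)])
labels <- levels(f.old)[f.new]

# The numeric column needs no special treatment
v3 <- tapply(dat$V3, dat$V1, function(x) x[last_idx(x)])

# Assemble the result column by column so each column keeps its own type
result <- data.frame(
  V1 = sort(unique(dat$V1)),
  V2 = factor(labels, levels = levels(f.old)),
  V3 = as.vector(v3)
)
result
```

Handling each column separately, rather than passing the whole data frame to apply, is what avoids the conversion to a character matrix.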
-- Bert Gunter

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrizio Frederic
Sent: Wednesday, December 10, 2008 2:09 PM
To: r-help@r-project.org
Subject: [R] repeated searching of no-missing values

hi all,
I have a data frame such as:

1 blue  0.3
1 NA    0.4
1 red   NA
2 blue  NA
2 green NA
2 blue  NA
3 red   0.5
3 blue  NA
3 NA    1.1

I wish to find the last non-missing value in every triple, i.e. I want a 3 by 3 data.frame such as:

1 red  0.4
2 blue NA
3 blue 1.1

I have written a little script

data = structure(list(V1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
    V2 = structure(c(1L, NA, 3L, 1L, 2L, 1L, 3L, 1L, NA),
        .Label = c("blue", "green", "red"), class = "factor"),
    V3 = c(0.3, 0.4, NA, NA, NA, NA, 0.5, NA, 1.1)),
    .Names = c("V1", "V2", "V3"), class = "data.frame",
    row.names = c(NA, -9L))

# last non-missing element of x
cl = function(x) x[max(which(!is.na(x)))]
# apply cl within each group defined by the first column of data
choose.last = function(x) tapply(x, data[,1], cl)

# now function choose.last works properly on numeric vectors:

> choose.last(data[,3])
  1   2   3
0.4  NA 1.1

# but not on factors (I lose the factor labels):

> choose.last(data[,2])
1 2 3
3 1 1

# moreover, if I apply this function to the whole data.frame
# the output is a character matrix

> apply(data,2,choose.last)
  V1  V2     V3
1 "1" "red"  "0.4"
2 "2" "blue" NA
3 "3" "blue" "1.1"

# and if I sapply, I lose factor labels

> sapply(data,choose.last)
  V1 V2  V3
1  1  3 0.4
2  2  1  NA
3  3  1 1.1

any hint?
Thanks in advance,

Patrizio

+-------------------------------------------------
| Patrizio Frederic, PhD
| Research associate in Statistics,
| Department of Economics,
| University of Modena and Reggio Emilia,
| Via Berengario 51,
| 41100 Modena, Italy
|
| tel: +39 059 205 6727
| fax: +39 059 205 6947
| mail: [EMAIL PROTECTED]
+-------------------------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.