Hi Dr. Mächler, thanks for your prompt response!

I agree with your explanation of this issue. What I had in mind, though, was adding an argument to median() and mean() that keeps the attributes of the variable when set to TRUE.
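To make the idea concrete, here is a rough sketch of what I mean. The wrapper name median_keep() and the argument keep.attr are purely hypothetical and not part of R. The sketch only copies the attributes of x (other than names and dimensions) onto whatever median() returns:

```r
# Hypothetical wrapper: 'median_keep' and 'keep.attr' are illustrative names,
# not part of base R or the stats package.
median_keep <- function(x, na.rm = FALSE, keep.attr = FALSE, ...) {
    m <- stats::median(x, na.rm = na.rm, ...)
    if (keep.attr) {
        a <- attributes(x)
        # names/dim/dimnames describe individual elements or the shape of x,
        # so they are not meaningful for a single summary value.
        keep <- setdiff(names(a), c("names", "dim", "dimnames"))
        for (nm in keep) attr(m, nm) <- a[[nm]]
    }
    m
}

# Usage: an even-length vector with a plain "label" attribute
x <- c(4, 1, 3, 2)
attr(x, "label") <- "A label"
str(median_keep(x, keep.attr = TRUE))
```

For even n the value is still the mean of the two middle observations, so this only carries the metadata over; it does not make the result "of the same kind" as x in the sense you describe below.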
Thanks again.

Best regards

On Tue, 4 May 2021 at 17:57, Martin Maechler (<maech...@stat.math.ethz.ch>) wrote:

> >>>>> Gustavo Zapata Wainberg
> >>>>>     on Mon, 3 May 2021 20:48:49 +0200 writes:
>
> > Hi!
>
> > I'm writing this post because there is an inconsistency
> > when median() is calculated for even or odd vectors. For
> > odd vectors, attributes (such as labels added with Hmisc)
> > are kept after running median(), but this is not the case
> > if the vector is even, in this last case attributes are
> > lost.
>
> > I know that this is due to median() using mean() to obtain
> > the result when the vector is even, and mean() always
> > takes attributes off vectors.
>
> Yes, and this has been the design of median() for ever :
>
> If n := length(x) is odd, the median is "the middle" observation,
> and should be equal to x[j] for j = (n+1)/2
> and hence e.g., is well defined for an ordered factor.
>
> When n is even
> however, median() must be the mean of "the two middle" observations,
> which is e.g., not even *defined* for an ordered factor.
>
> We *could* talk of the so called lo-median or hi-median
> (terms probably coined by John W. Tukey) because (IIRC), these
> are equal to each other and to the median for odd n, but
> are equal to x[j] and x[j+1] j=n/2 for even n *and* are
> still "of the same kind" as x[] itself.
>
> Interestingly, for the mad() { = the median absolute deviation from the
> median}
> we *do* allow to specify logical 'low' and 'high',
> but that for the "outer" median in MAD's definition, not the
> inner one.
>
> ## From <Rsrc>/src/library/stats/R/mad.R :
>
> mad <- function(x, center = median(x), constant = 1.4826,
>                 na.rm = FALSE, low = FALSE, high = FALSE)
> {
>     if(na.rm)
>         x <- x[!is.na(x)]
>     n <- length(x)
>     constant *
>         if((low || high) && n%%2 == 0) {
>             if(low && high) stop("'low' and 'high' cannot be both TRUE")
>             n2 <- n %/% 2 + as.integer(high)
>             sort(abs(x - center), partial = n2)[n2]
>         }
>         else median(abs(x - center))
> }
>
> > Don't you think that attributes should be kept in both
> > cases?
>
> well, not all attributes can be kept.
> Note that for *named* vectors x, x[j] can (and does) keep the name,
> but there's definitely no sensible name to give to (x[j] + x[j+1])/2
>
> I'm willing to collaborate with some, considering
> to extend median.default() making hi-median and lo-median
> available to the user.
> Both of these will always return x[j] for some j and hence keep
> all (sensible!) attributes (well, if the `[`-method for the
> corresponding class has been defined correctly; I've encountered
> quite a few cases where people created vector-like classes but
> did not provide a "correct" subsetting method (typically you
> should make sure both a `[[` and `[` method works!)).
>
> Best regards,
> Martin
>
> Martin Maechler
> ETH Zurich and R Core team
>
> > And, going further, shouldn't mean() keep
> > attributes as well? I have looked in R's Bugzilla and I
> > didn't find an entry related to this issue.
>
> > Please, let me know if you consider that this issue should
> > be posted in R's bugzilla.
>
> > Here is an example with code.
>
> > rndvar <- rnorm(n = 100)
>
> > Hmisc::label(rndvar) <- "A label for RNDVAR"
>
> > str(median(rndvar[-c(1,2)]))
>
> > Returns: "num 0.0368"
>
> > str(median(rndvar[-1]))
>
> > Returns: 'labelled' num 0.0322 - attr(*, "label")= chr "A
> > label for RNDVAR"
>
> > Thanks in advance!
>
> > Gustavo Zapata-Wainberg

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel