On May 13, 2010, at 10:25 AM, Robert Baer wrote:

Hi Peter,

You're absolutely correct! The description of 'range' in the 'boxplot' help file is a little confusing in its use of the words "interquartile range". I think it should be changed to "length of the box" to be exact and consistent with the wording in the help file for "boxplot.stats".

The issue is probably that there are multiple ways (9 to be exact) of defining quantiles in R. See the 'type=' argument in ?quantile. The quantile function uses type=7 by default, which matches the quantile definition used by S-Plus(?) but differs from that used by SPSS. Doesn't fivenum essentially use the equivalent of a different 'type=' argument (maybe 2 or 5) in constructing the hinges?

It seems perfectly reasonable to talk about 'length of box' (or 'box height', depending on how you display the boxplot), but aren't the hinges simply Q1 and Q3 as defined by one of the possible quartile definitions (as Peter points out, the one used by fivenum)? The box height does not necessarily match the distance produced by IQR(), which also seems to use the equivalent of quantile(..., type=7), but it is still an IQR, is it not?
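A minimal sketch of that comparison, reusing the simulated data that appears further down in this thread (set.seed(123) and rexp(300, .02)). The variable names are illustrative only. It puts the box length that fivenum() implies next to the value returned by IQR(); for this particular sample the two differ slightly.

```r
set.seed(123)
y <- rexp(300, 0.02)

hinges <- fivenum(y)[c(2, 4)]   # lower and upper hinge
box_length <- diff(hinges)      # length (or height) of the box

iqr_default <- IQR(y)           # built on quantile(y, c(1, 3)/4, type = 7)

# boxplot.stats() reports the same hinges that fivenum() does
stopifnot(isTRUE(all.equal(boxplot.stats(y)$stats[c(2, 4)], hinges)))

print(c(box_length = box_length, IQR = iqr_default))
```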

Quantiles apparently can be defined in more than one "acceptable" way (sort of like dealing with ties in rank statistics). The OP seemed to want an "exact" explanation of the whiskers, and I think Peter has pointed us at the definition of quartiles used by fivenum, as opposed to the default used by quantile(..., type=7).
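For the whiskers themselves, here is a rough sketch of the rule as I read it from the 'boxplot' and 'boxplot.stats' help pages: the whiskers run to the most extreme data points that lie no more than 'range' (coef in boxplot.stats) times the box length from the hinges. The names lower_fence, upper_fence and whiskers are made up for the illustration, and the data are the simulated values used elsewhere in this thread.

```r
set.seed(123)
y <- rexp(300, 0.02)

coef <- 1.5                               # the default 'range' of boxplot()
hinges <- fivenum(y)[c(2, 4)]
box_length <- diff(hinges)

lower_fence <- hinges[1] - coef * box_length
upper_fence <- hinges[2] + coef * box_length

# Whiskers end at the most extreme observations still inside the fences
whiskers <- range(y[y >= lower_fence & y <= upper_fence])

# Compare with the whisker ends reported by boxplot.stats()
print(whiskers)
print(boxplot.stats(y, coef = coef)$stats[c(1, 5)])
```

Anything outside the fences is what boxplot draws as individual points.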

Yes, and experimentation leads me to the conclusion that the only possible candidate for matching up the results of fivenum(y)[c(2,4)] with quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove that to myself from mathematical arguments, since I do not quite understand the formalism in the quantile help page. If the match is not exact, this would be a tenth definition of IQR.

> set.seed(123)
> y <- rexp(300, .02)
> fivenum(y)
[1]   0.2183685  15.8740466  42.1147820  74.0362517 360.5503788
> for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
     25%      75%
15.82506 73.93080
     25%      75%
15.87405 74.03625
     25%      75%
15.84955 74.08898
     25%      75%
15.89854 73.98352
     25%      75%
15.86588 74.05383
     25%      75%
15.86792 74.04943
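
The experiment above can be made a little more systematic. The sketch below is only an illustration: matching_types() is a hypothetical helper, not something in base R. It compares the fivenum() hinges with all nine quantile types (the loop above covers 4 to 9 only) and repeats the check with one observation dropped, since the 'boxplot.stats' help page says the hinges equal the quartiles for odd n and differ for even n. So which type matches may depend on the sample size.

```r
# Which quantile types reproduce the fivenum() hinges for a given sample?
matching_types <- function(x) {
  hinges <- fivenum(x)[c(2, 4)]
  hits <- vapply(1:9, function(i) {
    q <- quantile(x, c(1, 3) / 4, type = i, names = FALSE)
    isTRUE(all.equal(q, hinges))   # tolerate floating-point rounding
  }, logical(1))
  which(hits)
}

set.seed(123)
y <- rexp(300, 0.02)

print(matching_types(y))       # even n (300)
print(matching_types(y[-1]))   # odd n (299)
```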

--
David.


All that said, I'm not convinced that it is wrong to speak of "interquartile range" in 'boxplot' help.

Rob

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
