On May 13, 2010, at 10:25 AM, Robert Baer wrote:
Hi Peter,
You're absolutely correct! The description of 'range' in the
'boxplot' help file is a little confusing because it uses the words
"interquartile range". I think it should be changed to "length
of the box", to be exact and consistent with the wording in the help
file for 'boxplot.stats'.
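For reference, here is a minimal sketch of the whisker rule that 'range' controls. It assumes, as the current R sources read, that boxplot() passes 'range' on to boxplot.stats() as its 'coef' argument, and that boxplot.stats() builds the box from the fivenum() hinges. The helper name whisker_ends() is hypothetical and used only for illustration.

```r
# Hypothetical helper: whisker ends derived from the fivenum() hinges.
# Assumes the boxplot.stats() rule: whiskers reach the most extreme data
# points lying within coef * (length of the box) of the hinges.
whisker_ends <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  if (coef == 0) return(range(x))      # coef = 0: no outlier rule, use extremes
  hinges <- fivenum(x)[c(2, 4)]        # lower and upper hinge
  box_len <- diff(hinges)              # the "length of the box"
  lo_fence <- hinges[1] - coef * box_len
  hi_fence <- hinges[2] + coef * box_len
  range(x[x >= lo_fence & x <= hi_fence])
}

set.seed(123)
y <- rexp(300, .02)
whisker_ends(y)
boxplot.stats(y, coef = 1.5)$stats[c(1, 5)]  # should agree with the line above
```

Note that the multiplier is applied to the distance between the fivenum() hinges, not to IQR(), which is exactly where the wording question arises.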
The issue is probably that there are multiple ways (9 to be exact)
of defining quantiles in R. See the 'type=' argument in ?quantile.
The quantile function uses type=7 by default, which matches the
quantile definition used by S-Plus(?) but differs from the one used by
SPSS. Doesn't fivenum essentially use the equivalent of a different
'type=' argument (maybe 2 or 5) in constructing the hinges?
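One way to see what fivenum does is to reproduce its hinge step directly. The sketch below mirrors the depth calculation in the source of stats::fivenum(): for a sorted sample of size n, the lower hinge sits at depth floor((n+3)/2)/2 and the upper hinge at n + 1 minus that depth. A half-integer depth means averaging two neighbouring order statistics. The name tukey_hinges() is hypothetical.

```r
# Hypothetical helper reproducing the hinge step of stats::fivenum().
tukey_hinges <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  n4 <- floor((n + 3) / 2) / 2      # depth of the lower hinge
  d <- c(n4, n + 1 - n4)            # depths of the lower and upper hinge
  # a half-integer depth averages the two neighbouring order statistics
  0.5 * (x[floor(d)] + x[ceiling(d)])
}

set.seed(123)
y <- rexp(300, .02)
tukey_hinges(y)       # n = 300: depths 75.5 and 225.5
fivenum(y)[c(2, 4)]   # should agree
```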
It seems perfectly reasonable to talk about 'length of box' (or 'box
height', depending on how you display the boxplot), but aren't the
hinges simply Q1 and Q3 defined by one of the possible quartile
definitions (as Peter points out, the one used by fivenum)? The box
height does not necessarily match the distance produced by IQR(),
which also seems to use the equivalent of quantile(..., type=7), but
it is still an IQR, is it not?
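A quick way to compare the box height with IQR() on the simulated data used in this thread. This assumes a version of R whose IQR() accepts a 'type' argument that it passes to quantile() (current versions do, with type = 7 as the default).

```r
set.seed(123)
y <- rexp(300, .02)

box_len <- diff(fivenum(y)[c(2, 4)])  # length of the box drawn by boxplot()
iqr7 <- IQR(y)                        # default quantile definition (type = 7)
iqr5 <- IQR(y, type = 5)              # alternative quartile definition

c(box = box_len, IQR_type7 = iqr7, IQR_type5 = iqr5)
```

For this particular sample the box height coincides with the type-5 IQR but not with the default one.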
Quantiles apparently can be defined in more than one "acceptable"
way (sort of like dealing with ties in rank statistics). The OP
seemed to want an "exact" explanation of the whiskers, and I think
Peter has pointed us to the definition of quartiles used by fivenum,
as opposed to the default used by quantile(..., type=7).
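To illustrate how much the "acceptable" definitions can differ, here is a small sketch that computes the lower quartile of 1:10 under each of the nine types in ?quantile, alongside the fivenum() lower hinge.

```r
# Lower quartile of a small sample under each of quantile()'s nine types.
x <- 1:10
q1 <- sapply(1:9, function(t) unname(quantile(x, 0.25, type = t)))
names(q1) <- paste0("type", 1:9)
q1
fivenum(x)[2]   # fivenum's lower hinge, for comparison
```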
Yes, and experimentation leads me to conclude that the only
possible candidate for matching the results of fivenum(y)[c(2,4)] with
quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove
that to myself mathematically, since I do not quite
understand the formalism on the quantile help page. If the match is not
exact, this would be a tenth definition of IQR.
> set.seed(123)
> y <- rexp(300, .02)
> fivenum(y)
[1] 0.2183685 15.8740466 42.1147820 74.0362517 360.5503788
> for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
25% 75%
15.82506 73.93080
25% 75%
15.87405 74.03625
25% 75%
15.84955 74.08898
25% 75%
15.89854 73.98352
25% 75%
15.86588 74.05383
25% 75%
15.86792 74.04943
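Some arithmetic shows why type 5 matches here. For n = 300, fivenum puts the lower hinge at depth floor((300+3)/2)/2 = 75.5. Type 5 places the 0.25 quantile at position np + 1/2 = 75.5, so both average x[75] and x[76]. Type 7 uses position 1 + (n-1)p = 75.75 instead. Repeating this arithmetic for other n suggests that type 5 agrees with the hinges when n %% 4 is 0 or 2, while type 7 agrees when it is 1 or 3. For example, with n = 5 both fivenum and type 7 use x[2] as the lower hinge, whereas type 5 uses 0.25*x[1] + 0.75*x[2]. The sketch below checks this empirically on a few sample sizes. hinge_match() is a hypothetical helper name.

```r
# Hypothetical helper: for a sample of size n, report whether fivenum()'s
# hinges coincide with quantile(type = 5) and with quantile(type = 7).
hinge_match <- function(n) {
  x <- rexp(n, .02)                    # continuous data, so ties are unlikely
  h <- fivenum(x)[c(2, 4)]
  same <- function(type)
    isTRUE(all.equal(h, unname(quantile(x, c(1, 3) / 4, type = type))))
  data.frame(n = n, n_mod_4 = n %% 4, type5 = same(5), type7 = same(7))
}

set.seed(123)
do.call(rbind, lapply(296:303, hinge_match))
```

The resulting table shows whether the type-5 agreement in the session above is specific to sample sizes like 300 or holds for every n.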
--
David.
All that said, I'm not convinced that it is wrong to speak of
"interquartile range" in the 'boxplot' help.
Rob
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.