m.craw...@imperial.ac.uk wrote:
In a Box and Whisker plot, I thought that when there are outliers both
above and below the whiskers, then the whiskers should both be the same
length (plus or minus 1.5 times the inter-quartile range).
Not according to the docs:
range: this determines how far the plot whiskers extend out from the
box. If 'range' is positive, the whiskers extend to the most
extreme data point which is no more than 'range' times the
interquartile range from the box. A value of zero causes the
whiskers to extend to the data extremes.
And the code itself has
stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
So a whisker won't be exactly 1.5 IQR long unless there happens to be an
observation at exactly that distance from the box.
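As a rough sketch of what that rule amounts to (the variable names below are
mine, not taken from the R sources, and it assumes the default range of 1.5
and data without NAs):

```r
# Sketch of the whisker rule, assuming coef = 1.5 (the default 'range').
set.seed(9)
x <- rnorm(50)
coef <- 1.5

hinges <- fivenum(x)[2:4]             # lower hinge, median, upper hinge
iqr <- hinges[3] - hinges[1]          # spread between the hinges

fence_lo <- hinges[1] - coef * iqr    # furthest a whisker may reach downwards
fence_hi <- hinges[3] + coef * iqr    # furthest a whisker may reach upwards

out <- x < fence_lo | x > fence_hi    # points drawn individually as outliers
whiskers <- range(x[!out])            # whisker ends are actual data values

# Whisker lengths measured from the box; each is at most coef * iqr.
whisker_len <- c(lower = hinges[1] - whiskers[1],
                 upper = whiskers[2] - hinges[3])
```

The whisker ends are the most extreme observations inside the fences, so the
two lengths are bounded by 1.5 IQR but need not be equal to it or to each
other.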
Now, this might be wrong, but people have tried very hard to make the
implementation follow the original definition due to Tukey. That is, if you
can show that Tukey specified it otherwise, then we would change it;
otherwise it is simply not a bug.
If you look at the plot for SilwoodWeather on p.155 of The R Book you will
see that for November (month = 11) the upper whisker is shorter than the
lower, while for other months with outliers both above and below, the lines
are the same lengths.
For easier reproduction (reproducible examples should not refer to files
on your C: drive...):
> diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
          [,1]
[1,] 1.2525857
[2,] 0.5412128
[3,] 0.6083348
[4,] 1.4625057
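To read those numbers: rows 1 and 4 are the lower and upper whisker lengths,
and rows 2 and 3 add up to the box height (the IQR as boxplot computes it,
from the hinges). A small check, using only the values printed above (the
names d, iqr and limit are just for illustration):

```r
# Differences between successive boxplot stats, copied from the output above.
d <- c(1.2525857, 0.5412128, 0.6083348, 1.4625057)

iqr <- d[2] + d[3]     # box height: lower hinge to upper hinge
limit <- 1.5 * iqr     # the longest a whisker could possibly be

c(lower = d[1], upper = d[4], limit = limit)
```

Both whiskers are shorter than the 1.5 IQR limit of about 1.724, and they
differ from each other, since each stops at the last observation inside the
limit.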
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel