Re: [R] Whiskers on the default boxplot {graphics}

2010-05-14 Thread Peter Ehlers

Just to put this topic to rest:

The hinges match quantile(x, probs = c(1,3)/4, type = 2) except when n
= 3 mod 4.

I no longer have Tukey's EDA book, but I think that his idea was that
hinges (aka quartiles) were defined as medians of the lower/upper
halves of the (sorted, of course) data, where a 'half' would include
the median for odd sample sizes. And that's how they are calculated
in fivenum().

Thus hinges are a 10th definition of quartiles, but they don't lend
themselves to generalization to arbitrary quantiles other than, say,
octiles or other (1/2^k)-iles.
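
The "n = 3 mod 4" statement can be checked numerically. The following is a minimal sketch rather than code from the original post: the helper name hinges_match_type2 and the choice of 1:n as test data are assumptions made for illustration, while fivenum() and quantile() are the base R functions discussed in this thread.

```r
# Compare Tukey's hinges from fivenum() with type-2 quartiles for samples 1:n.
# hinges_match_type2() is a hypothetical helper, not a base R function.
hinges_match_type2 <- function(n) {
  x <- seq_len(n)
  hinges <- fivenum(x)[c(2, 4)]
  quartiles <- unname(quantile(x, probs = c(1, 3) / 4, type = 2))
  isTRUE(all.equal(hinges, quartiles))
}

n <- 1:40
agree <- vapply(n, hinges_match_type2, logical(1))

# Cross-tabulate agreement against n mod 4
table(n_mod_4 = n %% 4, agree)
```

For these samples the two definitions disagree only in the row n_mod_4 = 3.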


On 2010-05-13 11:47, David Winsemius wrote:

> I agree. I was convinced by Ehlers' example that type=2 was a better
> match to fivenum's result.

Peter Ehlers
University of Calgary

__ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Shi, Tao
Hi Robert,

Your points are well taken.  However, I stand by mine, because I think that
without this detailed discussion, an average R user would simply confuse the
"interquartile range" mentioned in the boxplot help file with the result of
IQR().  Changing it to "length of the box" makes it more exact and consistent,
as I stated earlier.  That said, this is up to the R core team to decide.


- Original Message 
> From: Robert Baer 
> To: "Shi, Tao" ; Peter Ehlers 
> Cc: R Project Help 
> Sent: Thu, May 13, 2010 7:25:09 AM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
> > Hi Peter,
> >
> > You're absolutely correct!  The description for 'range' in 'boxplot' help
> > file is a little bit confusing by using the words "interquartile range".
> > I think it should be changed to the "length of the box" to be exact and
> > consistent with those in the help file for "boxplot.stats".
>
> The issue is probably that there are multiple ways (9 to be exact) of
> defining quantiles in R.  See the 'type=' argument for ?quantile.  The
> quantile function uses type=7 by default, which matches the quantile
> definition used by S-Plus(?), but differs from that used by SPSS.  Doesn't
> fivenum essentially use the equivalent of a different 'type=' argument
> (maybe 2 or 5) in constructing the hinges?
>
> It seems perfectly reasonable to talk about 'length of box' (or 'box height',
> depending on how you display the boxplot), but aren't the hinges simply Q1
> and Q3 defined by one of the possible quartile definitions (as Peter points
> out, the one used by fivenum)?  The box height does not necessarily match the
> distance produced by IQR(), which also seems to use the equivalent of
> quantile(..., type=7), but it is still an IQR, is it not?
>
> Quantiles apparently can be defined in more than one "acceptable" way (sort
> of like dealing with ties in rank statistics).  The OP seemed to want an
> "exact" explanation of the whiskers, and I think Peter has pointed us at the
> definition of quartiles used by fivenum, as opposed to the default used
> with quantile(..., type=7).
>
> All that said, I'm not convinced that it is wrong to speak of
> "interquartile range" in 'boxplot' help.



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread David Winsemius
I agree. I was convinced by Ehlers' example that type=2 was a better
match to fivenum's result.



David Winsemius, MD
West Hartford, CT


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Joshua Wiley
On Thu, May 13, 2010 at 7:55 AM, David Winsemius  wrote:
> Yes, and experimentation leads me to the conclusion that the only possible
> candidate for matching up the results of fivenum(y)[c(2,4)] with quantile(y,
> c(1,3)/4, type=i) is type=5. I'm not able to prove that to myself from
> mathematical arguments, since I do not quite understand the formalism in the
> quantile page. If the match is not exact, this would be a tenth definition
> of IQR.


Here is some sample data, and the most parsimonious code I could come
up with for how quantile() computes the quartiles when using type=5.
The code for fivenum() seems simple enough, but I am not quite able to
make enough sense of the code for type=5 from quantile() to say
confidently why they are different.

I am open to the possibility that my attempts to extract relevant code
from quantile were flawed, but my tentative conclusion is that
quantile(x, type=5) != fivenum(x).

x <- c(0.643796386452606, -0.605277531056206, -0.339239367816402,
1.12408365699422, 0.615753476531243, -1.10545696568758,
0.666533406841698, 1.42794492209271, 0.624752921945051,
2.02317205214712, -0.365586657432646, 0.821742701084307,
-0.874753498321076, -0.0298783402061118, 1.18037670706428,
-0.178274986836195, 0.308703365439049, 0.619700844646392,
0.54977981430092, -1.82161514610448, -1.28413556650749,
-0.0443852992196351, 0.704196760556652, -1.88596816676741,
oldx <- x #this is just a backup because x will be transformed

##Start from quantile()
probs <- c(0, 0.25, 0.5, 0.75, 1)
type <- 5
n <- length(x)
switch(type - 3, {
  a <- 0
  b <- 1
}, a <- b <- 0.5, a <- b <- 0, a <- b <- 1, a <- b <- 1/3, a <- b <- 3/8)
fuzz <- 4 * .Machine$double.eps
nppm <- a + probs * (n + 1 - a - b)
j <- floor(nppm + fuzz)
h <- nppm - j
h <- ifelse(abs(h) < fuzz, 0, h)
x <- sort(x, partial = unique(c(1, j[j > 0L & j <= n], (j + 1)[j > 0L
& j < n], n)))
x <- c(x[1L], x[1L], x, x[n], x[n])
qs <- x[j + 2L]
qs[h == 1] <- x[j + 3L][h == 1]
other <- (h > 0) & (h < 1) # elementwise &: one flag per requested probability
if (any(other)) qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other]
##End from quantile

qs # from the calculations above
quantile(oldx, type=5) #this should match qs
fivenum(oldx) #the 25% does not match
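
One way to explore when type=5 and fivenum() agree is to compare them over a range of sample sizes. The following is a sketch for illustration only: the helper name hinges_match_type5 and the test data 1:n are assumptions, not code from this thread.

```r
# Compare Tukey's hinges from fivenum() with type-5 quartiles for samples 1:n.
# hinges_match_type5() is a hypothetical helper, not a base R function.
hinges_match_type5 <- function(n) {
  x <- seq_len(n)
  hinges <- fivenum(x)[c(2, 4)]
  quartiles <- unname(quantile(x, probs = c(1, 3) / 4, type = 5))
  isTRUE(all.equal(hinges, quartiles))
}

n <- 2:41
agree <- vapply(n, hinges_match_type5, logical(1))

# Cross-tabulate agreement against the parity of n
table(n_is_even = n %% 2 == 0, agree)
```

With these inputs the two agree for even n and differ for odd n, which is consistent with the exact match reported for the rexp(300, .02) experiment in this thread.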



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread David Winsemius

On May 13, 2010, at 12:18 PM, Robert Baer wrote:

> And try this (which seems to leave us with type=2) and is listed in
> ?quantile as "Discontinuous sample quantile types 1, 2, and 3"
>
> quantile(1:101, c(1,3)/4, type=2)
>
> 25% 75%
>  26  76

I think Peter may be right. If I do it with the rnorm function, I
repeatedly get the same result for fivenum(x)[2] and the type 2 first
quartile. I did not test types 1 to 3 earlier because they were designed
for discrete-valued variables, but I suppose everything is really discrete
on computers, eh?

> fivenum(x <- rnorm(101) )
[1] -2.6224338 -0.9682586 -0.1897377  0.5999332  2.5409711
> quantile(x, c(1,3)/4, type=2)
-0.9682586  0.5999332

> fivenum(x <- rnorm(101) )
[1] -3.8251928 -0.6495966  0.1816233  0.7101774  2.3789054
> quantile(x, c(1,3)/4, type=2)
-0.6495966  0.7101774




David Winsemius, MD
West Hartford, CT


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Robert Baer
And try this (which seems to leave us with type=2) and is listed in 
?quantile as "Discontinuous sample quantile types 1, 2, and 3"

quantile(1:101, c(1,3)/4, type=2)

25% 75%
26  76



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Peter Ehlers


try this:

quantile(1:101, c(1,3)/4, type=5)



Peter Ehlers
University of Calgary


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread David Winsemius

On May 13, 2010, at 10:25 AM, Robert Baer wrote:

> > Hi Peter,
> >
> > You're absolutely correct!  The description for 'range' in
> > 'boxplot' help file is a little bit confusing by using the words
> > "interquartile range". I think it should be changed to the "length
> > of the box" to be exact and consistent with those in the help file
> > for "boxplot.stats".
>
> The issue is probably that there are multiple ways (9 to be exact)
> of defining quantiles in R.  See the 'type=' argument for ?quantile.
> The quantile function uses type=7 by default, which matches the
> quantile definition used by S-Plus(?), but differs from that used by
> SPSS.  Doesn't fivenum essentially use the equivalent of a different
> 'type=' argument (maybe 2 or 5) in constructing the hinges?
>
> It seems perfectly reasonable to talk about 'length of box' (or 'box
> height', depending on how you display the boxplot), but aren't the
> hinges simply Q1 and Q3 defined by one of the possible quartile
> definitions (as Peter points out, the one used by fivenum)?  The box
> height does not necessarily match the distance produced by IQR(),
> which also seems to use the equivalent of quantile(..., type=7), but
> it is still an IQR, is it not?
>
> Quantiles apparently can be defined in more than one "acceptable"
> way (sort of like dealing with ties in rank statistics).  The OP
> seemed to want an "exact" explanation of the whiskers, and I think
> Peter has pointed us at the definition of quartiles used by fivenum,
> as opposed to the default used with quantile(..., type=7).

Yes, and experimentation leads me to the conclusion that the only
possible candidate for matching up the results of fivenum(y)[c(2,4)] with
quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove
that to myself from mathematical arguments, since I do not quite
understand the formalism in the quantile page. If the match is not
exact, this would be a tenth definition of IQR.

> set.seed(123)
>  y <- rexp(300, .02)
> fivenum(y)
[1]   0.2183685  15.8740466  42.1147820  74.0362517 360.5503788
> for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
 25%  75%
15.82506 73.93080
 25%  75%
15.87405 74.03625
 25%  75%
15.84955 74.08898
 25%  75%
15.89854 73.98352
 25%  75%
15.86588 74.05383
 25%  75%
15.86792 74.04943


> All that said, I'm not convinced that it is wrong to speak of
> "interquartile range" in 'boxplot' help.



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Robert Baer

> Hi Peter,
>
> You're absolutely correct!  The description for 'range' in 'boxplot' help
> file is a little bit confusing by using the words "interquartile range".
> I think it should be changed to the "length of the box" to be exact and
> consistent with those in the help file for "boxplot.stats".

The issue is probably that there are multiple ways (9 to be exact) of
defining quantiles in R.  See the 'type=' argument for ?quantile.  The
quantile function uses type=7 by default, which matches the quantile
definition used by S-Plus(?), but differs from that used by SPSS.  Doesn't
fivenum essentially use the equivalent of a different 'type=' argument
(maybe 2 or 5) in constructing the hinges?

It seems perfectly reasonable to talk about 'length of box' (or 'box height',
depending on how you display the boxplot), but aren't the hinges simply Q1 and
Q3 defined by one of the possible quartile definitions (as Peter points out,
the one used by fivenum)?  The box height does not necessarily match the
distance produced by IQR(), which also seems to use the equivalent of
quantile(..., type=7), but it is still an IQR, is it not?

Quantiles apparently can be defined in more than one "acceptable" way (sort
of like dealing with ties in rank statistics).  The OP seemed to want an
"exact" explanation of the whiskers, and I think Peter has pointed us at the
definition of quartiles used by fivenum, as opposed to the default used
with quantile(..., type=7).

All that said, I'm not convinced that it is wrong to speak of "interquartile
range" in 'boxplot' help.



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Shi, Tao
Hi Peter,

You're absolutely correct!  The description for 'range' in 'boxplot' help file 
is a little bit confusing by using the words "interquartile range".  I think it 
should be changed to the "length of the box" to be exact and consistent with 
those in the help file for "boxplot.stats".


- Original Message 
> From: Peter Ehlers 
> To: "Shi, Tao" 
> Cc: Jason Rupert ; Dennis Murphy ; 
> R Project Help ;
> Sent: Wed, May 12, 2010 2:11:24 PM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
> On 2010-05-12 13:27, Shi, Tao wrote:
> > Jason,
> >
> > All these are clearly defined in the help file for 'boxplot' under
> > 'range'.  Don't understand how you missed that.
> >
> > ...Tao
>
> You've made me re-read the help page for boxplot. I notice that
> there's a difference in the description of 'range' on that page
> and the description of the equivalent 'coef' on the help page
> for boxplot.stats. boxplot.stats has it right.
>
> This should be made consistent.
>
> [previous posts snipped]
> Peter Ehlers
> University of Calgary


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Peter Ehlers

On 2010-05-12 13:27, Shi, Tao wrote:


> All these are clearly defined in the help file for 'boxplot' under 'range'.
> Don't understand how you missed that.


You've made me re-read the help page for boxplot. I notice that
there's a difference in the description of 'range' on that page
and the description of the equivalent 'coef' on the help page
for boxplot.stats. boxplot.stats has it right.

This should be made consistent.

[previous posts snipped]
Peter Ehlers
University of Calgary


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Shi, Tao

All these are clearly defined in the help file for 'boxplot' under 'range'.  
Don't understand how you missed that.


- Original Message 
> From: Jason Rupert 
> To: Dennis Murphy 
> Cc: R Project Help 
> Sent: Wed, May 12, 2010 3:40:12 AM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
> Fantastic!
>
> It would be great if the description could be modified to include the
> mysterious bit about the upper and lower bound whisker positions:
>
> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>
> Maybe that is clearly written in the description of boxplot.stats
> {grDevices}, but evidently I missed it numerous times and also did not
> pick up on this intent from the original description of boxplot {graphics}.
>
> Your type of descriptive answer and helpfulness is much appreciated and one
> of the reasons I continue to endorse the R tool over numerous others.
>
> More like you and the tool may be headed for domination in the market.
>
> Thanks again!
>
> From: Dennis Murphy
> Cc: R Project Help
> Sent: Wed, May 12, 2010 2:50:19 AM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
>
> Let's do some math :)
>
> >>Okay...Let me see if I've got it...
> >>I'm just trying to use the default boxplot {graphics} capability in R...
> >>So I call something like the following:
> >>> boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of
> >>> Cylinders", ylab="Miles Per Gallon")
> >>That produces something as shown in the following:
> >>When that default boxplot is called, i.e. boxplot {graphics}, as shown in
> >>the line of code above, it is actually calling into boxplot.stats
> >>{grDevices}.  When boxplot.stats {grDevices} is called it has a default
> >>value for "coef" of 1.5, i.e. coef = 1.5.
> >>If I understand the purpose of "coef" correctly, it means that the
> >>‘whiskers’ should extend out 1.5 times the length of the box away from
> >>the box.   Is that correct?
>
> If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1
> where Q refers to quartile), then assuming that x is the numeric vector of
> interest for a boxplot,
>
> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>
> So the upper whisker is located at the *smaller* of the maximum x value and
> Q_3 + 1.5 IQR, whereas the lower whisker is located at the *larger* of the
> smallest x value and Q_1 - 1.5 IQR.
>
> In your terms, the whiskers should extend out a *maximum* of "1.5 times the
> length of the box away from the box".
>
> Visually, this means that individual points more extreme in value than
> Q3 + 1.5 IQR are plotted separately at the high end, and those below
> Q1 - 1.5 IQR are plotted separately on the low end. Depending on the source,
> the separately plotted points are called 'outside values'. On the other
> hand, if the maximum or minimum values of x are closer than 1.5 IQR in
> distance from its nearest quartile, then that is where the whisker is
> positioned.
>
> Does that make sense?
>
> >>Now I look back at the plot, and I'm not sure how 1.5 times the length of
> >>the box corresponds with the whisker lengths shown in the image:
> >>Is it that the whisker length is a total of 1.5 the length of the box and
> >>centered about the median (2nd Quartile)?
> >>Just trying to get a handle on this, so thanks again for all the help in
> >>deciphering this.
> >>
> >>From: RJ Cunningham
> >>Cc: R Project Help
> >>Sent: Tue, May 11, 2010 9:57:48 PM
> >>Subject: Re: [R] Whiskers on the default boxplot {graphics}
> >>
> >>I think not. Isn't the "secret" here?
> >>
> >>x: a numeric vector fo
Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Peter Ehlers

On 2010-05-12 10:51, Robert Baer wrote:

> - Original Message -
> > Fantastic!
> >
> > It would be great if the description could be modified to include the
> > mysterious bit about the upper and lower bound whisker positions:
> >
> > upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> > lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>
> -- snip --
> NOT quite!
>
> The boxplot.stats help reads under the coef argument:
> "... the whiskers extend to the most extreme data point which is no more
> than coef times the length of the box away from the box."
>
> If there are outliers, the most extreme data point within 1.5 * IQR of
> Q1 or Q3 may lie less than 1.5 IQRs from the box, so the whisker may "end
> earlier" than 1.5 * IQR, at a data point that is NOT max(x) or min(x).

But even this is not quite correct.
The help page (quoted above) is, as is so often the case,
quite precise: the *length of the box* is multiplied by 1.5,
not the *IQR*. The difference is probably insignificant in most
applications, but then this question was about the precise
definition of the whiskers.

The box length is defined by the hinges, for whose definition
it's probably easiest to look at the code in fivenum() which
is used by boxplot.stats(). (The relevant code consists of three
short lines.) For the calculation of the whisker extremes, one
can peruse the boxplot.stats() code, which also is quite brief.
Essentially, it determines which observations lie outside the
boundaries established by (lower hinge - 1.5 * boxlength) and
(upper hinge + 1.5 * boxlength) and then uses the range of
the remaining data values to determine the whisker extremes.

(I've assumed the default value of coef=1.5).

Here's an example:

  y <- rexp(30, .02)
  y <- sort(round(y))

  fivenum(y)
#[1]   3  22  38  61 221

  boxplot.stats(y)$stats
#[1]   3  22  38  61 118

# The hinges are 22, 61;
# The whisker extremes are 3, 118;

  quantile(y, c(1,3)/4)
#  25%   75%
#23.25 60.50

# The hinges do not equal the quartiles.

# Upper cut-off ('fence'):
  61 + 1.5 * (61 - 22)
#[1] 119.5

  tail(y)
#[1]  70  94 118 145 198 221

# So 118 is the largest data value less than or equal to 119.5.

  60.5 + 1.5 * IQR(y)
#[1] 116.375

# Using quartiles and the IQR would take the upper whisker to 94.
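
The procedure described above can be written out as a short sketch. This is an illustration under stated assumptions, not the boxplot.stats() source: the function name whisker_ends, the seed, and the simulated data are hypothetical, missing values are simply dropped, and coef is assumed to be positive.

```r
# Whisker extremes computed from the hinges, following the description above.
# whisker_ends() is a hypothetical helper; it assumes coef > 0.
whisker_ends <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  five <- fivenum(x)                      # min, lower hinge, median, upper hinge, max
  box_length <- five[4] - five[2]         # length of the box, not IQR(x)
  lower_fence <- five[2] - coef * box_length
  upper_fence <- five[4] + coef * box_length
  inside <- x[x >= lower_fence & x <= upper_fence]
  range(inside)                           # most extreme data points within the fences
}

set.seed(42)                              # arbitrary seed, for reproducibility only
y <- sort(round(rexp(30, 0.02)))

whisker_ends(y)
boxplot.stats(y)$stats[c(1, 5)]           # the whisker extremes reported by R
```

The two final lines should print the same pair of values, since both are built from the hinges rather than from quantile(y, type=7).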

Peter Ehlers


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Dennis Murphy

Point well taken, Robert. This is a good example of the difference between
how something is defined mathematically and how it is applied.
Thank you for the clarification.


On Wed, May 12, 2010 at 9:51 AM, Robert Baer  wrote:

> - Original Message - Fantastic!
> It would be great if the description could be modified to include the
> mysterious bit about the upper and lower bound whisker positions:
> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
> -- snip --
> --
> NOT quite!
> The boxplot.stats help reads under the coef argument:
> "... the whiskers extend to the most extreme data point which is no more
> than coef times the length of the box away from the box."
> If there are outliers, the most extreme data point within 1.5 * IQR of
> Q1 or Q3 may lie less than 1.5 IQRs from the box, so the whisker may "end
> earlier" than 1.5 * IQR, at a data point that is NOT max(x) or min(x).


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Robert Baer

- Original Message - 

It would be great if the description could be modified to include the 
mysterious bit about the upper and lower bound whisker positions:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

-- snip --
NOT quite!

The boxplot.stats help reads under the coef argument:
"... the whiskers extend to the most extreme data point which is no more 
than coef times the length of the box away from the box."

If there are outliers, the most extreme data point within 1.5 * IQR of Q1
or Q3 may lie less than 1.5 IQRs from the box, so the whisker may "end
earlier" than 1.5 * IQR, at a data point that is NOT max(x) or min(x).
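
A small example makes the distinction concrete. It is a sketch with made-up data (the vector x below is an assumption chosen for illustration), comparing the formula quoted above with what boxplot.stats() reports.

```r
# Ten made-up observations with one clear outlier (illustrative data only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Upper whisker according to the quoted min/max formula,
# using the default quartiles and IQR()
formula_upper <- min(max(x), unname(quantile(x, 0.75)) + 1.5 * IQR(x))

# Upper whisker actually used for the plot
actual_upper <- boxplot.stats(x)$stats[5]

formula_upper
actual_upper
boxplot.stats(x)$out   # points drawn individually beyond the whisker
```

Here the formula gives 14.5, which is not a data value, while the whisker ends at 9, the most extreme observation inside the cut-off; the value 30 is plotted separately.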


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Jason Rupert

It would be great if the description could be modified to include the 
mysterious bit about the upper and lower bound whisker positions:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

Maybe that is clearly written in the description of boxplot.stats {grDevices}, 
but evidently I missed it numerous times and also did not pick up on this 
intent from the original description of boxplot {graphics}.  

Your type of descriptive answer and helpfulness is much appreciated and one of 
the reasons I continue to endorse the R tool over numerous others.   

More like you and the tool may be headed for domination in the market. 

Thanks again!


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Dennis Murphy

Let's do some math :)

On Tue, May 11, 2010 at 8:55 PM, Jason Rupert wrote:

> Okay...Let me see if I've got it...
> I'm just trying to use the default boxplot {graphics} capability in R...
> So I call something like the following:
> > boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of
> Cylinders", ylab="Miles Per Gallon")
> That produces something as shown in the following:
> When that default boxplot is called, i.e. boxplot {graphics}, as shown in
> the line of code above, it is actually calling into boxplot.stats
> {grDevices}.  When boxplot.stats {grDevices} is called it has a default
> value for "coef" of 1.5, i.e. coef = 1.5.
> If I understand the purpose of "coef" correctly, it means that the
> ‘whiskers’ should extend out 1.5 times the length of the box away from the
> box.   Is that correct?

If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1
where Q refers to quartile), then assuming that
x is the numeric vector of interest for a boxplot,

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

So the upper whisker is located at the *smaller* of the maximum x value and
Q_3 + 1.5 IQR,
whereas the lower whisker is located at the *larger* of the smallest x value
and Q_1 - 1.5 IQR.

In your terms, the whiskers should extend out a *maximum* of "1.5 times the
length of the box
away from the box".

Visually, this means that individual points more extreme in value than Q3 +
1.5 IQR are plotted
separately at the high end, and those below Q1 - 1.5 IQR are plotted
separately on the low
end. Depending on the source, the separately plotted points are called
'outside values'. On
the other hand, if the maximum or minimum values of x are closer than 1.5
IQR in distance from
its nearest quartile, then that is where the whisker is positioned.

Does that make sense?


> Now I look back at the plot, and I'm not sure how 1.5 times the length of
> the box corresponds with the whisker lengths shown in the image:
> Is it that the whisker length is a total of 1.5 the length of the box and
> centered about the median (2nd Quartile)?
> Just trying to get a handle on this, so thanks again for all the help in
> deciphering this.
> From: RJ Cunningham 
> Cc: R Project Help 
> Sent: Tue, May 11, 2010 9:57:48 PM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
> I think not. Isn't the "secret" here?
> Arguments:
> x: a numeric vector for which the boxplot will be constructed
> ('NA's and 'NaN's are allowed and omitted).
> coef: this determines how far the plot 'whiskers' extend out
> from the box.  If 'coef' is positive, the whiskers extend
> to the most extreme data point which is no more than
> 'coef' times the length of the box away from the box. A
> value of zero causes the whiskers to extend to the data
> extremes (and no outliers be returned).
> do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
> component respectively will be empty in the result.
> Details:
> The two 'hinges' are versions of the first and third quartile,...
> On Wed May 12 10:35 , Jason Rupert  sent:
> Humm... Maybe I need to look some place else than boxplot.stats {grDevices}
> for a definition of how the upper/lower whiskers are produced.
> >By any chance are they "the lowest datum still within 1.5 IQR of the lower
> quartile, and the highest datum still within 1.5 IQR of the upper quartile"?
> >None of the links from boxplot.stats {grDevices} seemed to reveal the
> secret definition of the R whiskers.
> >Thanks again.
> >- Original Message 
> >To: David Winsemius 
> >Cc: R Project Help 
> >Sent: Tue, May 11, 2010 9:26:25 PM
> >Subject: Re: [R] Whiskers on the default boxplot {graphics}
> >Wowzers...
> >From ?boxplot.stats:
> >Details
> >The two ‘hinges’ are versions of the first and third quartile, i.e., close
> to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n
> <- 

Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread David Winsemius

On May 11, 2010, at 11:55 PM, Jason Rupert wrote:

Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in  

So I call something like the following:
boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number
of Cylinders", ylab="Miles Per Gallon")

That produces something as shown in the following:

When that default boxplot is called, i.e. boxplot {graphics}, as  
shown in the line of code above, it is actually calling into  
boxplot.stats {grDevices}.  When boxplot.stats {grDevices} is called  
it has a default value for "coef" of 1.5, i.e. coef = 1.5.

If I understand the purpose of "coef" correctly, it means that the  
‘whiskers’ should extend out 1.5 times the length of the box away  
from the box.   Is that correct?

No. Read it again.


Now I look back at the plot, and I'm not sure how 1.5 times the  
length of the box corresponds with the whisker lengths shown in the  

Is it that the whisker length is a total of 1.5 the length of the  
box and centered about the median (2nd Quartile)?

Just trying to get a handle on this, so thanks again for all the  
help in deciphering this.

From: RJ Cunningham
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:57:48 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

I think not. Isn't the "secret" here?


x: a numeric vector for which the boxplot will be constructed
('NA's and 'NaN's are allowed and omitted).

coef: this determines how far the plot 'whiskers' extend out
from the box.  If 'coef' is positive, the whiskers extend
to the most extreme data point which is no more than
'coef' times the length of the box away from the box. A
value of zero causes the whiskers to extend to the data
extremes (and no outliers be returned).

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
component respectively will be empty in the result.


The two 'hinges' are versions of the first and third quartile,...

On Wed May 12 10:35 , Jason Rupert  sent:

Humm... Maybe I need to look some place else than boxplot.stats
{grDevices} for a definition of how the upper/lower whiskers are
produced.
By any chance are they "the lowest datum still within 1.5 IQR of  
the lower quartile, and the highest datum still within 1.5 IQR of  
the upper quartile"?

None of the links from boxplot.stats {grDevices} seemed to reveal  
the secret definition of the R whiskers.

Thanks again.

- Original Message 

To: David Winsemius 

Cc: R Project Help 

Sent: Tue, May 11, 2010 9:26:25 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}


From ?boxplot.stats:


The two ‘hinges’ are versions of the first and third quartile,  
i.e., close to quantile(x, c(1,3)/4). The hinges equal the  
quartiles for odd n (where n <- length(x)) and differ for even n.  
Whereas the quartiles only equal observations for n %% 4 == 1 (n =  
1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod  
4), and are in the middle of two observations otherwise.

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This  
seems to be based on the same calculations as the formula with 1.57  
in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p.  
16). They are based on asymptotic normality of the median and  
roughly equal sample sizes for the two medians being compared, and  
are said to be rather insensitive to the underlying distributions  
of the samples. The idea appears to be to give roughly a 95%  
confidence interval for the difference in two medians.

Is a notch equal to the upper/lower whisker?   Is this just a  
difference of terminology or something?

Thanks again for all the insights.

- Original Message 

From: David Winsemius 

Cc: R Project Help 

Sent: Tue, May 11, 2010 9:00:15 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}

On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

How are the lower/upper whiskers defined in the default version of  
boxplot {graphics}?

I tried help(boxplot) and searching, but I was  
unable to determine an absolute answer.

You need to follow the links from the help pages and in this case
it appears that you did not follow the one to


I checked out the definition of boxplot according to Wikipedia ( 
\), but it also had several approaches

listed for how the whiskers could be determined, so I'm just  
curious how the default

boxplot {graphics} does it.


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread David Winsemius

On May 11, 2010, at 10:35 PM, Jason Rupert wrote:

Humm... Maybe I need to look some place else than boxplot.stats
{grDevices} for a definition of how the upper/lower whiskers are
produced.
By any chance are they "the lowest datum still within 1.5 IQR of the  
lower quartile, and the highest datum still within 1.5 IQR of the  
upper quartile"?

None of the links from boxplot.stats {grDevices} seemed to reveal  
the secret definition of the R whiskers.

You didn't need to go to any other pages. You just needed to read  
boxplot.stats ... apparently more than once.


Thanks again.

- Original Message 
From: Jason Rupert 
To: David Winsemius 
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:26:25 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}


From ?boxplot.stats:


The two ‘hinges’ are versions of the first and third quartile, i.e.,  
close to quantile(x, c(1,3)/4). The hinges equal the quartiles for  
odd n (where n <- length(x)) and differ for even n. Whereas the  
quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the  
hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in  
the middle of two observations otherwise.

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems  
to be based on the same calculations as the formula with 1.57 in  
Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16).  
They are based on asymptotic normality of the median and roughly  
equal sample sizes for the two medians being compared, and are said  
to be rather insensitive to the underlying distributions of the  
samples. The idea appears to be to give roughly a 95% confidence  
interval for the difference in two medians.

Is a notch equal to the upper/lower whisker?   Is this just a  
difference of terminology or something?

Thanks again for all the insights.

- Original Message 
From: David Winsemius 
To: Jason Rupert 
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

How are the lower/upper whiskers defined in the default version of  
boxplot {graphics}?

I tried help(boxplot) and searching, but I was unable  
to determine an absolute answer.

You need to follow the links from the help pages and in this case
it appears that you did not follow the one to


I checked out the definition of boxplot according to Wikipedia ( 
), but it also had several approaches
listed for how the whiskers could be determined, so I'm just  
curious how the default

boxplot {graphics} does it.

Thanks for any feedback

Follow links with the R help system.

and insights.

David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in R...

So I call something like the following:
> boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of 
> Cylinders", ylab="Miles Per Gallon")

That produces something as shown in the following:

When that default boxplot is called, i.e. boxplot {graphics}, as shown in the 
line of code above, it is actually calling into boxplot.stats {grDevices}.  
When boxplot.stats {grDevices} is called it has a default value for "coef" of 
1.5, i.e. coef = 1.5.  
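
One way to check the relationship described above is to ask boxplot() for its numbers without drawing anything and compare them with boxplot.stats() run on each group. This is only a sketch: the names b, per_group and same are mine, and the call assumes the default range = 1.5 of boxplot() is what reaches boxplot.stats() as coef.

```r
# Statistics boxplot() computes for each cylinder group, without plotting
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# The same five numbers from boxplot.stats() applied group by group
per_group <- sapply(split(mtcars$mpg, mtcars$cyl),
                    function(v) boxplot.stats(v, coef = 1.5)$stats)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
same <- all(abs(unname(per_group) - b$stats) < 1e-12)

print(b$stats)
print(same)
```

If same prints TRUE, the whisker ends drawn by boxplot() are rows 1 and 5 of what boxplot.stats() returns for each group.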

If I understand the purpose of "coef" correctly, it means that the 
‘whiskers’ should extend out 1.5 times the length of the box away from the 
box.   Is that correct?  

Now I look back at the plot, and I'm not sure how 1.5 times the length of the 
box corresponds with the whisker lengths shown in the image:

Is it that the whisker length is a total of 1.5 the length of the box and 
centered about the median (2nd Quartile)?  

Just trying to get a handle on this, so thanks again for all the help in 
deciphering this. 

From: RJ Cunningham
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:57:48 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

I think not. Isn't the "secret" here?


x: a numeric vector for which the boxplot will be constructed 
('NA's and 'NaN's are allowed and omitted). 

coef: this determines how far the plot 'whiskers' extend out 
from the box.  If 'coef' is positive, the whiskers extend 
to the most extreme data point which is no more than 
'coef' times the length of the box away from the box. A 
value of zero causes the whiskers to extend to the data 
extremes (and no outliers be returned). 

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out' 
component respectively will be empty in the result. 


The two 'hinges' are versions of the first and third quartile,... 

On Wed May 12 10:35 , Jason Rupert  sent:

Humm... Maybe I need to look some place else than boxplot.stats {grDevices} for 
a definition of how the upper/lower whiskers are produced. 
>By any chance are they "the lowest datum still within 1.5 IQR of the lower 
>quartile, and the highest datum still within 1.5 IQR of the upper quartile"?
>None of the links from boxplot.stats {grDevices} seemed to reveal the secret 
>definition of the R whiskers. 
>Thanks again.
>- Original Message 

>To: David Winsemius 
>Cc: R Project Help 
>Sent: Tue, May 11, 2010 9:26:25 PM
>Subject: Re: [R] Whiskers on the default boxplot {graphics}
>From ?boxplot.stats:
>The two ‘hinges’ are versions of the first and third quartile, i.e., close 
>to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- 
>length(x)) and differ for even n. Whereas the quartiles only equal 
>observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for 
>n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.
>The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be 
>based on the same calculations as the formula with 1.57 in Chambers et al. 
>(1983, p. 62), given in McGill et al. (1978, p. 16). They are based on 
>asymptotic normality of the median and roughly equal sample sizes for the two 
>medians being compared, and are said to be rather insensitive to the 
>underlying distributions of the samples. The idea appears to be to give 
>roughly a 95% confidence interval for the difference in two medians.
>Is a notch equal to the upper/lower whisker?   Is this just a difference of 
>terminology or something? 
>Thanks again for all the insights. 
>- Original Message 
>From: David Winsemius 

>Cc: R Project Help 
>Sent: Tue, May 11, 2010 9:00:15 PM
>Subject: Re: [R] Whiskers on the default boxplot {graphics}
>On May 11, 2010, at 9:45 PM, Jason Rupert wrote:
>> How are the lower/upper whiskers defined in the default version of boxplot 
>> {graphics}?
>> I tried help(boxplot) and searching, but I was unable to 
>> determine an absolute answer.
>You need to f

Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread RJ Cunningham

   I think not. Isn't the "secret" here?
   x: a numeric vector for which the boxplot will be constructed
   ('NA's and 'NaN's are allowed and omitted).
   coef: this determines how far the plot 'whiskers' extend out
   from the box. If 'coef' is positive, the whiskers extend
   to the most extreme data point which is no more than
   'coef' times the length of the box away from the box. A
   value of zero causes the whiskers to extend to the data
   extremes (and no outliers be returned).
   do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
   component respectively will be empty in the result.
   The two 'hinges' are versions of the first and third quartile,...
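
The effect of 'coef' described in the argument list above can be seen on a toy vector. This is a sketch only; x, default_stats and zero_stats are illustrative names, not anything from the help page.

```r
# Toy data with one value far above the rest
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

default_stats <- boxplot.stats(x)             # coef = 1.5
zero_stats    <- boxplot.stats(x, coef = 0)   # whiskers go to the data extremes

print(default_stats$stats[c(1, 5)])  # whisker ends: 1 and 9
print(default_stats$out)             # 30 is returned as an outlier
print(zero_stats$stats[c(1, 5)])     # whisker ends: 1 and 30
print(length(zero_stats$out))        # 0, no outliers returned
```

With the default coef the upper whisker stops at 9, an actual observation, and 30 is flagged; with coef = 0 the whisker runs to 30 and nothing is flagged.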
   On Wed May 12 10:35 , Jason Rupert sent:

 Humm... Maybe I need to look some place else than boxplot.stats
 {grDevices} for a definition of how the upper/lower whiskers are produced.
 By any chance are they "the lowest datum still within 1.5 IQR of the lower
 quartile, and the highest datum still within 1.5 IQR of the upper
 quartile"?
 None of the links from boxplot.stats {grDevices} seemed to reveal the
 secret definition of the R whiskers.
 Thanks again.
 - Original Message 
 From: Jason Rupert
 To: David Winsemius
 Cc: R Project Help
 Sent: Tue, May 11, 2010 9:26:25 PM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 >From ?boxplot.stats:
 The two ‘hinges’ are versions of the first and third quartile, i.e., close
 to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where
 n <- length(x)) and differ for even n. Whereas the quartiles only equal
 observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally
 for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations
 otherwise.
 The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be
 based on the same calculations as the formula with 1.57 in Chambers et al.
 (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on
 asymptotic normality of the median and roughly equal sample sizes for the
 two medians being compared, and are said to be rather insensitive to the
 underlying distributions of the samples. The idea appears to be to give
 roughly a 95% confidence interval for the difference in two medians.
 Is a notch equal to the upper/lower whisker? Is this just a difference of
 terminology or something?
 Thanks again for all the insights.
 - Original Message 
 From: David Winsemius
 To: Jason Rupert
 Cc: R Project Help
 Sent: Tue, May 11, 2010 9:00:15 PM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 On May 11, 2010, at 9:45 PM, Jason Rupert wrote:
 > How are the lower/upper whiskers defined in the default version of
 boxplot {graphics}?
 > I tried help(boxplot) and searching, but I was unable
 to determine an absolute answer.
 You need to follow the links from the help pages and in this case it
 appears that you did not follow the one to
 > I checked out the definition of boxplot according to Wikipedia
 (), but it also had several approaches
 > listed for how the whiskers could be determined, so I'm just curious how
 the default
 > boxplot {graphics} does it.
 > Thanks for any feedback
 Follow links with the R help system.
 > and insights.
 David Winsemius, MD
 West Hartford, CT



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
Humm... Maybe I need to look some place else than boxplot.stats {grDevices} for 
a definition of how the upper/lower whiskers are produced.  

By any chance are they "the lowest datum still within 1.5 IQR of the lower 
quartile, and the highest datum still within 1.5 IQR of the upper quartile"?

None of the links from boxplot.stats {grDevices} seemed to reveal the secret 
definition of the R whiskers.  

Thanks again.

- Original Message 
From: Jason Rupert 
To: David Winsemius 
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:26:25 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}


From ?boxplot.stats:


The two ‘hinges’ are versions of the first and third quartile, i.e., close to 
quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- 
length(x)) and differ for even n. Whereas the quartiles only equal observations 
for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n 
= 2 mod 4), and are in the middle of two observations otherwise.

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be 
based on the same calculations as the formula with 1.57 in Chambers et al. 
(1983, p. 62), given in McGill et al. (1978, p. 16). They are based on 
asymptotic normality of the median and roughly equal sample sizes for the two 
medians being compared, and are said to be rather insensitive to the underlying 
distributions of the samples. The idea appears to be to give roughly a 95% 
confidence interval for the difference in two medians.

Is a notch equal to the upper/lower whisker?   Is this just a difference of 
terminology or something? 

Thanks again for all the insights. 

- Original Message 
From: David Winsemius 
To: Jason Rupert 
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

> How are the lower/upper whiskers defined in the default version of boxplot 
> {graphics}?
> I tried help(boxplot) and searching, but I was unable to 
> determine an absolute answer.

You need to follow the links from the help pages and in this case it appears 
that you did not follow the one to


> I checked out the definition of boxplot according to Wikipedia 
> (, but it also had several approaches
> listed for how the whiskers could be determined, so I'm just curious how the 
> default
> boxplot {graphics} does it.
> Thanks for any feedback

Follow links with the R help system.

> and insights.

David Winsemius, MD
West Hartford, CT


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert

From ?boxplot.stats:


The two ‘hinges’ are versions of the first and third quartile, i.e., close to 
quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- 
length(x)) and differ for even n. Whereas the quartiles only equal observations 
for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n 
= 2 mod 4), and are in the middle of two observations otherwise.
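
The relationship between hinges and quartiles quoted above can be tabulated for a few sample sizes. A sketch only: the vectors 1..n and the name compare are illustrative, and the quartiles use quantile()'s default type.

```r
# Hinges (from fivenum) next to default quartiles for n = 5, 6, 7, 8
compare <- t(sapply(5:8, function(n) {
  x <- seq_len(n)                      # the data are simply 1, 2, ..., n
  c(n           = n,
    lower_hinge = fivenum(x)[2],
    upper_hinge = fivenum(x)[4],
    q1          = unname(quantile(x, 0.25)),
    q3          = unname(quantile(x, 0.75)))
}))

print(compare)
```

For odd n (5 and 7) the hinge and quartile columns agree; for even n (6 and 8) they differ. The hinges are observations for n = 5 and n = 6 and midpoints of two observations for n = 7 and n = 8, which is the n %% 4 pattern the help text describes.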

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be 
based on the same calculations as the formula with 1.57 in Chambers et al. 
(1983, p. 62), given in McGill et al. (1978, p. 16). They are based on 
asymptotic normality of the median and roughly equal sample sizes for the two 
medians being compared, and are said to be rather insensitive to the underlying 
distributions of the samples. The idea appears to be to give roughly a 95% 
confidence interval for the difference in two medians.
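
The notch formula in the paragraph above can be reproduced by hand and compared with what boxplot.stats() returns in its conf component. This is a sketch: s, box, manual and matches are illustrative names, and "IQR" is taken here to mean the distance between the hinges.

```r
x <- mtcars$mpg
s <- boxplot.stats(x)

# Length of the box: upper hinge minus lower hinge
box <- s$stats[4] - s$stats[2]

# median +/- 1.58 * box / sqrt(n), as described in ?boxplot.stats
manual <- s$stats[3] + c(-1, 1) * 1.58 * box / sqrt(s$n)

matches <- isTRUE(all.equal(manual, s$conf))

print(manual)               # notch limits computed by hand
print(s$conf)               # notch limits from boxplot.stats()
print(s$stats[c(1, 5)])     # whisker ends, a separate pair of numbers
print(matches)
```

The notch limits sit in the conf component and are centred on the median, while the whisker ends are elements 1 and 5 of stats.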

Is a notch equal to the upper/lower whisker?   Is this just a difference of 
terminology or something? 

Thanks again for all the insights. 

- Original Message 
From: David Winsemius 
To: Jason Rupert 
Cc: R Project Help 
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

> How are the lower/upper whiskers defined in the default version of boxplot 
> {graphics}?
> I tried help(boxplot) and searching, but I was unable to 
> determine an absolute answer.

You need to follow the links from the help pages and in this case it appears 
that you did not follow the one to


> I checked out the definition of boxplot according to Wikipedia 
> (, but it also had several approaches
> listed for how the whiskers could be determined, so I'm just curious how the 
> default
> boxplot {graphics} does it.
> Thanks for any feedback

Follow links with the R help system.

> and insights.

David Winsemius, MD
West Hartford, CT


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread David Winsemius

On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

How are the lower/upper whiskers defined in the default version of  
boxplot {graphics}?

I tried help(boxplot) and searching, but I was unable  
to determine an absolute answer.

You need to follow the links from the help pages and in this case it  
appears that you did not follow the one to


I checked out the definition of boxplot according to Wikipedia ( 
), but it also had several approaches
listed for how the whiskers could be determined, so I'm just curious  
how the default

boxplot {graphics} does it.

Thanks for any feedback

Follow links with the R help system.

and insights.

David Winsemius, MD
West Hartford, CT


[R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
How are the lower/upper whiskers defined in the default version of boxplot 
{graphics}?

I tried help(boxplot) and searching, but I was unable to 
determine an absolute answer.  

I checked out the definition of boxplot according to Wikipedia 
(, but it also had several approaches 
listed for how the whiskers could be determined, so I'm just curious how the 
default boxplot {graphics} does it. 

Thanks for any feedback and insights.
