Re: [R] Whiskers on the default boxplot {graphics}

2010-05-14 Thread Peter Ehlers

Just to put this topic to rest:

The hinges match quantile(x, probs = c(1,3)/4, type = 2) except when n
= 3 mod 4.
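
The claim can be checked numerically by comparing the two definitions over a run of sample sizes. This is only a sketch: the helper name hinges_match_type2 and the test data (the integers 1..n) are assumptions made for illustration, not something posted in the thread.

```r
# Compare fivenum() hinges with type-2 quartiles for n = 1..40.
# hinges_match_type2 is a hypothetical helper introduced for illustration.
hinges_match_type2 <- function(n) {
  x <- seq_len(n)                        # distinct, sorted test data
  h <- fivenum(x)[c(2, 4)]               # lower and upper hinge
  q <- unname(quantile(x, probs = c(1, 3) / 4, type = 2))
  isTRUE(all.equal(h, q))
}

n <- 1:40
agree <- vapply(n, hinges_match_type2, logical(1))

# Tabulate agreement by n mod 4; disagreement should appear only for n mod 4 == 3
print(table(n_mod_4 = n %% 4, agree = agree))
```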

I no longer have Tukey's EDA book, but I think that his idea was that
hinges (aka quartiles) were defined as medians of the lower/upper
halves of the (sorted, of course) data, where a 'half' would include
the median for odd sample sizes. And that's how they are calculated
in fivenum().
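
The "medians of the halves" description can be written out directly and compared with fivenum(). The function name tukey_hinges is hypothetical and the random data are arbitrary; this is a sketch of the definition as described above, not the source of fivenum() itself.

```r
# Tukey-style hinges: medians of the lower and upper halves of the sorted
# data, where each half includes the overall median when n is odd.
# tukey_hinges is a hypothetical name used only for this sketch.
tukey_hinges <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- ceiling(n / 2)                 # size of each half
  lower <- x[seq_len(half)]
  upper <- x[(n - half + 1):n]
  c(median(lower), median(upper))
}

set.seed(1)
for (n in c(8, 9, 10, 11)) {             # one sample size per residue mod 4
  x <- rnorm(n)
  print(all.equal(tukey_hinges(x), fivenum(x)[c(2, 4)]))
}
```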

Thus hinges are a 10th definition of quartiles, but they don't lend
themselves to generalization to arbitrary quantiles other than, say,
octiles or other (1/2^k)-iles.

 -Peter

On 2010-05-13 11:47, David Winsemius wrote:

I agree. I was convinced by Ehlers' example that type =2 was a better
match to fivenum's result



--
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Robert Baer

Hi Peter,

You're absolutely correct!  The description of 'range' in the 'boxplot' help
file is a little confusing because it uses the words "interquartile range".
I think it should be changed to "the length of the box" to be exact and
consistent with the help file for boxplot.stats.


The issue is probably that there are multiple ways (9 to be exact) of
defining quantiles in R.  See the 'type=' argument in ?quantile.  The
quantile function uses type=7 by default, which matches the quantile
definition used by S-Plus(?) but differs from that used by SPSS.  Doesn't
fivenum essentially use the equivalent of a different 'type=' argument
(maybe 2 or 5) in constructing the hinges?


It seems perfectly reasonable to talk about 'length of box' (or 'box height',
depending on how you display the boxplot), but aren't the hinges simply Q1 and
Q3 as defined by one of the possible quartile definitions (as Peter points
out, the one used by fivenum)?  The box height does not necessarily match the
distance produced by IQR(), which also seems to use the equivalent of
quantile(..., type=7), but it is still an IQR, is it not?
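
A small case where the two distances differ may make this concrete. The data 1:10 are chosen only for illustration, and the variable names are assumptions for this sketch.

```r
x <- 1:10

box_length  <- diff(fivenum(x)[c(2, 4)])  # distance between the hinges
iqr_default <- IQR(x)                     # based on quantile(..., type = 7)

box_length                                # 5   (hinges are 3 and 8)
iqr_default                               # 4.5 (quartiles are 3.25 and 7.75)

# boxplot.stats() reports the hinges, not the type-7 quartiles
boxplot.stats(x)$stats[c(2, 4)]           # 3 8
```

So for this sample the box drawn by boxplot() is slightly longer than the value returned by IQR().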


Quantiles apparently can be defined in more than one acceptable way (sort
of like dealing with ties in rank statistics).  The OP seemed to want an
exact explanation of the whiskers, and I think Peter has pointed us at the
definition of quartiles used by fivenum, as opposed to the default used
with quantile(..., type=7).


All that said, I'm not convinced that it is wrong to speak of interquartile 
range in 'boxplot' help.


Rob



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread David Winsemius


On May 13, 2010, at 10:25 AM, Robert Baer wrote:


Quantiles apparently can be defined in more than one acceptable
way (sort of like dealing with ties in rank statistics).  The OP
seemed to want an exact explanation of the whiskers, and I think
Peter has pointed us at the definition of quartiles used by fivenum,
as opposed to the default used with quantile(..., type=7).


Yes, and experimentation leads me to the conclusion that the only
possible candidate for matching up the results of fivenum(y)[c(2,4)] with
quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove
that to myself from mathematical arguments, since I do not quite
understand the formalism in the quantile page. If the match is not
exact, this would be a tenth definition of IQR.


> set.seed(123)
> y <- rexp(300, .02)
> fivenum(y)
[1]   0.2183685  15.8740466  42.1147820  74.0362517 360.5503788
> for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
     25%      75%
15.82506 73.93080
     25%      75%
15.87405 74.03625
     25%      75%
15.84955 74.08898
     25%      75%
15.89854 73.98352
     25%      75%
15.86588 74.05383
     25%      75%
15.86792 74.04943
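
The same comparison can be automated over all nine types instead of reading the printout by eye. The helper matching_types is hypothetical and the samples are arbitrary; the point of the sketch is only that a match found at one sample size need not hold at another.

```r
# For each quantile type, test whether the quartiles equal the fivenum() hinges.
# matching_types is a hypothetical helper introduced for this sketch.
matching_types <- function(x) {
  hinges <- fivenum(x)[c(2, 4)]
  ok <- vapply(1:9, function(i) {
    q <- unname(quantile(x, c(1, 3) / 4, type = i))
    isTRUE(all.equal(q, hinges))
  }, logical(1))
  which(ok)
}

set.seed(123)
m300 <- matching_types(rexp(300, 0.02))   # n %% 4 == 0
m101 <- matching_types(rexp(101, 0.02))   # n %% 4 == 1

m300
m101
```

With continuous data like these, type 5 should appear in the first result but not the second, while type 2 should appear in both.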

--
David.





Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Peter Ehlers

David,

try this:

fivenum(1:101)
quantile(1:101, c(1,3)/4, type=5)

 -Peter

--
Peter Ehlers
University of Calgary



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Robert Baer
And try this, which seems to leave us with type=2. It is listed in
?quantile under "Discontinuous sample quantile types 1, 2, and 3":

quantile(1:101, c(1,3)/4, type=2)

25% 75%
26  76




Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread David Winsemius


On May 13, 2010, at 12:18 PM, Robert Baer wrote:

And try this (which seems to leave us with type=2) and is listed in
?quantile as Discontinuous sample quantile types 1, 2, and 3

quantile(1:101, c(1,3)/4, type=2)

25% 75%
26  76


I think Peter may be right. If I do it with the rnorm function I
repeatedly get the same result for fivenum(x)[2] and the type 2 first
quartile. I did not test those types because they were designed for
discrete-valued variables, but I suppose everything is really discrete
on computers, eh?


> fivenum(x <- rnorm(101) )
[1] -2.6224338 -0.9682586 -0.1897377  0.5999332  2.5409711
> quantile(x, c(1,3)/4, type=2)
       25%        75%
-0.9682586  0.5999332

> fivenum(x <- rnorm(101) )
[1] -3.8251928 -0.6495966  0.1816233  0.7101774  2.3789054
> quantile(x, c(1,3)/4, type=2)
       25%        75%
-0.6495966  0.7101774

--
David.





David Winsemius, MD
West Hartford, CT



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Joshua Wiley
On Thu, May 13, 2010 at 7:55 AM, David Winsemius dwinsem...@comcast.net wrote:
 Yes, and experimentation leads me to the conclusion that the only possible
 candidate for matching up the results of fivenum(y)[c(2,4)] with quantile(y,
 c(1,3)/4, type=i) is type=5. I'm not able to prove that to myself from
 mathematical arguments, since I do not quite understand the formalism in the
 quantile page. If the match is not exact, this would be a tenth definition
 of IQR.

David,

Here is some sample data, and the most parsimonious code I could come
up with for how quantile() computes the quartiles when using type=5.
The code for fivenum() seems simple enough, but I am not quite able to
make enough sense of the code for type=5 from quantile() to say
confidently why they are different.

I am open to the possibility that my attempts to extract relevant code
from quantile were flawed, but my tentative conclusion is that
quantile(x, type=5) != fivenum(x).

##
x <- c(0.643796386452606, -0.605277531056206, -0.339239367816402,
1.12408365699422, 0.615753476531243, -1.10545696568758,
0.666533406841698, 1.42794492209271, 0.624752921945051,
2.02317205214712, -0.365586657432646, 0.821742701084307,
-0.874753498321076, -0.0298783402061118, 1.18037670706428,
-0.178274986836195, 0.308703365439049, 0.619700844646392,
0.54977981430092, -1.82161514610448, -1.28413556650749,
-0.0443852992196351, 0.704196760556652, -1.88596816676741,
-0.420811351737096)
oldx <- x # this is just a backup because x will be transformed

## Start from quantile()
probs <- c(0, 0.25, 0.5, 0.75, 1)
type <- 5
n <- length(x)
switch(type - 3, {
  a <- 0
  b <- 1
}, a <- b <- 0.5, a <- b <- 0, a <- b <- 1, a <- b <- 1/3, a <- b <- 3/8)
fuzz <- 4 * .Machine$double.eps
nppm <- a + probs * (n + 1 - a - b)
j <- floor(nppm + fuzz)
h <- nppm - j
h <- ifelse(abs(h) < fuzz, 0, h)
x <- sort(x, partial = unique(c(1, j[j > 0L & j <= n], (j + 1)[j > 0L &
  j < n], n)))
x <- c(x[1L], x[1L], x, x[n], x[n])
qs <- x[j + 2L]
qs[h == 1] <- x[j + 3L][h == 1]
other <- (h > 0) & (h < 1)
if (any(other)) qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other]
## End from quantile

qs # from the calculations above
quantile(oldx, type=5) # this should match qs
fivenum(oldx) # the 25% does not match


everything else snipped

Josh



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread David Winsemius
I agree. I was convinced by Ehlers' example that type=2 was a better
match to fivenum's result.


--
David..



David Winsemius, MD
West Hartford, CT



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-13 Thread Shi, Tao
Hi Robert,

Your points are well taken.  However, I stand by mine, because I think that
without this detailed discussion an average R user would simply confuse the
"interquartile range" mentioned in the boxplot help file with the result of
IQR().  Changing it to "length of box" makes it more exact and consistent, as
I stated earlier.  That said, it is up to the R core team to decide.

...Tao







Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Dennis Murphy
Hi:

Let's do some math :)

On Tue, May 11, 2010 at 8:55 PM, Jason Rupert jasonkrup...@yahoo.com wrote:

 Okay...Let me see if I've got it...

 I'm just trying to use the default boxplot {graphics} capability in R...

 So I call something like the following:
  boxplot(mpg~cyl, data=mtcars, main="Car Milage Data",
          xlab="Number of Cylinders", ylab="Miles Per Gallon")

 That produces something as shown in the following:
 http://www.statmethods.net/graphs/images/boxplot1.jpg

 When that default boxplot is called, i.e. boxplot {graphics}, as shown in
 the line of code above, it is actually calling into boxplot.stats
 {grDevices}.  When boxplot.stats {grDevices} is called it has a default
 value for coef of 1.5, i.e. coef = 1.5.

 If I understand the purpose of coef correctly, it means that the
 ‘whiskers’ should extend out 1.5 times the length of the box away from the
 box.   Is that correct?


If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1,
where Q refers to quartile), then assuming that x is the numeric vector of
interest for a boxplot,

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

So the upper whisker is located at the *smaller* of the maximum x value and
Q_3 + 1.5 IQR, whereas the lower whisker is located at the *larger* of the
smallest x value and Q_1 - 1.5 IQR.

In your terms, the whiskers should extend out a *maximum* of 1.5 times the
length of the box away from the box.

Visually, this means that individual points more extreme in value than
Q_3 + 1.5 IQR are plotted separately at the high end, and those below
Q_1 - 1.5 IQR are plotted separately at the low end. Depending on the source,
the separately plotted points are called 'outside values'. On the other hand,
if the maximum or minimum value of x is closer than 1.5 IQR to its nearest
quartile, then that is where the whisker is positioned.

Does that make sense?

HTH,
Dennis


 Now I look back at the plot, and I'm not sure how 1.5 times the length of
 the box corresponds with the whisker lengths shown in the image:
 http://www.statmethods.net/graphs/images/boxplot1.jpg

 Is it that the whisker length is a total of 1.5 the length of the box and
 centered about the median (2nd Quartile)?

 Just trying to get a handle on this, so thanks again for all the help in
 deciphering this.







 
 From: RJ Cunningham ro...@iinet.net.au

 Cc: R Project Help R-help@r-project.org
 Sent: Tue, May 11, 2010 9:57:48 PM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}

 I think not. Isn't the secret here?


 Arguments:

 x: a numeric vector for which the boxplot will be constructed
 ('NA's and 'NaN's are allowed and omitted).

 coef: this determines how far the plot 'whiskers' extend out
 from the box.  If 'coef' is positive, the whiskers extend
 to the most extreme data point which is no more than
 'coef' times the length of the box away from the box. A
 value of zero causes the whiskers to extend to the data
 extremes (and no outliers be returned).

 do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
 component respectively will be empty in the result.
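
The effect of 'coef' described above can be seen on a small sample with one distant point. The data and variable names below are invented for illustration; only boxplot.stats() and its documented 'coef' argument are taken from the help text.

```r
x <- c(10, 12, 13, 15, 16, 18, 19, 21, 22, 60)   # 60 lies far above the rest

s15 <- boxplot.stats(x)             # default coef = 1.5
s0  <- boxplot.stats(x, coef = 0)   # whiskers run to the data extremes

s15$stats   # 10 13 17 21 22  (upper whisker stops at 22, a data point)
s15$out     # 60
s0$stats    # 10 13 17 21 60
s0$out      # numeric(0)
```

Here the box runs from 13 to 21, so with coef = 1.5 the upper limit is 21 + 1.5 * 8 = 33, and the whisker ends at the largest observation not above that limit.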

 Details:

 The two 'hinges' are versions of the first and third quartile,...


 On Wed May 12 10:35 , Jason Rupert  sent:


 Humm... Maybe I need to look somewhere other than boxplot.stats {grDevices}
 for a definition of how the upper/lower whiskers are produced.
 
 
 By any chance are they the lowest datum still within 1.5 IQR of the lower
 quartile, and the highest datum still within 1.5 IQR of the upper quartile?
 
 
 None of the links from boxplot.stats {grDevices} seemed to reveal the
 secret definition of the R whiskers.
 
 
 Thanks again.
 
 
 
 
 
 
 - Original Message 
 

 
 To: David Winsemius dwinsem...@comcast.net
 
 Cc: R Project Help R-help@r-project.org
 
 Sent: Tue, May 11, 2010 9:26:25 PM
 
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 
 
 Wowzers...
 
 
 From ?boxplot.stats:
 
 
 Details
 
 
 The two ‘hinges’ are versions of the first and third quartile, i.e., close
 to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n
 <- length(x)) and differ for even n. Whereas the quartiles only equal
 observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally
 for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations
 otherwise.
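
The odd/even statement in the help text can be checked with a short loop. This sketch assumes the simple test data x = 1..n, for which the hinges and the default (type 7) quartiles never coincide by accident; the variable names are illustrative.

```r
# Check, for x = 1..n, whether the hinges equal the default (type 7) quartiles.
n <- 1:20
same <- vapply(n, function(k) {
  x <- seq_len(k)
  isTRUE(all.equal(fivenum(x)[c(2, 4)],
                   unname(quantile(x, c(1, 3) / 4))))
}, logical(1))

# Sample sizes at which hinges and type-7 quartiles coincide (the odd ones)
n[same]
```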
 
 
 The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be
 based on the same calculations as the formula with 1.57 in Chambers et al.
 (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on
 asymptotic normality of the median and roughly equal sample sizes for the
 two medians being compared, and are said to be rather insensitive to the
 underlying distributions of the samples. The idea appears to be to give
 roughly a 95% confidence interval for the difference in two medians.
 
 
 
 
 Is a notch equal to the upper/lower whisker

Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Jason Rupert
Fantastic! 

It would be great if the description could be modified to include the 
mysterious bit about the upper and lower bound whisker positions:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

Maybe that is clearly written in the description of boxplot.stats {grDevices}, 
but evidently I missed it numerous times and also did not pick up on this 
intent from the original description of boxplot {graphics}.  

Your type of descriptive answer and helpfulness is much appreciated and one of 
the reasons I continue to endorse the R tool over numerous others.   

More like you and the tool may be headed for domination in the market. 

Thanks again!







From: Dennis Murphy djmu...@gmail.com

Cc: R Project Help R-help@r-project.org
Sent: Wed, May 12, 2010 2:50:19 AM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

Hi:

Let's do some math :)



e:

Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in R...

So I call something like the following:
 boxplot(mpg~cyl,data=mtcars, main=Car Milage Data, xlab=Number of 
 Cylinders, ylab=Miles Per Gallon) \

That produces something as shown in the following:
http://www.statmethods.net/graphs/images/boxplot1.jpg

When that default boxplot is called, i.e. boxplot {graphics}, as shown in the 
line of code above, it is actually calling into boxplot.stats {grDevices}.  
When boxplot.stats {grDevices} is called it has a default value for coef of 
1.5, i.e. coef = 1.5.

If I understand the purpose of coef correctly, it means that the 
‘whiskers’ should extend out 1.5 times the length of the box away from 
the box.   Is that correct?


If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1 
where Q refers to quartile), then assuming that
x is the numeric vector of interest for a boxplot,

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

So the upper whisker is located at the *smaller* of the maximum x value and Q_3 
+ 1.5 IQR,
whereas the lower whisker is located at the *larger* of the smallest x value 
and Q_1 - 1.5 IQR.

In your terms, the whiskers should extend out a *maximum* of 1.5 times the 
length of the box
away from the box. 

Visually, this means that individual points more extreme in value than Q3 + 1.5 
IQR are plotted
separately at the high end, and those below Q1 - 1.5 IQR are plotted separately 
on the low
end. Depending on the source, the separately plotted points are called 'outside 
values'. On
the other hand, if the maximum or minimum values of x are closer than 1.5 IQR 
in distance from
its nearest quartile, then that is where the whisker is positioned.

Does that make sense?

HTH,
Dennis


Now I look back at the plot, and I'm not sure how 1.5 times the length of the 
box corresponds with the whisker lengths shown in the image:
http://www.statmethods.net/graphs/images/boxplot1.jpg

Is it that the whisker length is a total of 1.5 the length of the box and 
centered about the median (2nd Quartile)?

Just trying to get a handle on this, so thanks again for all the help in 
deciphering this.








From: RJ Cunningham ro...@iinet.net.au

ast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:57:48 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}


I think not. Isn't the secret here?


Arguments:

x: a numeric vector for which the boxplot will be constructed
('NA's and 'NaN's are allowed and omitted).

coef: this determines how far the plot 'whiskers' extend out
from the box.  If 'coef' is positive, the whiskers extend
to the most extreme data point which is no more than
'coef' times the length of the box away from the box. A
value of zero causes the whiskers to extend to the data
extremes (and no outliers be returned).

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
component respectively will be empty in the result.

Details:

The two 'hinges' are versions of the first and third quartile,...


On Wed May 12 10:35 , Jason Rupert  sent:


HummMaybe I need to look some place else than boxplot.stats {grDevices} 
for a definition of how the upper/lower whiskers are produced.


By any chance are they the lowest datum still within 1.5 IQR of the lower 
quartile, and the highest datum still within 1.5 IQR of the upper quartile?


None of the links from boxplot.stats {grDevices} seemed to reveal the secret 
definition of the R whiskers.


Thanks again.






- Original Message 




To: David Winsemius dwinsem...@comcast.net

Cc: R Project Help R-help@r-project.org

Sent: Tue, May 11, 2010 9:26:25 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}


Wowzers...


From ?boxplot.stats:


Details


The two ‘hinges’ are versions of the first and third quartile, i.e., 
close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n 
(where n <- length(x

Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Robert Baer


- Original Message - 
Fantastic!


It would be great if the description could be modified to include the 
mysterious bit about the upper and lower bound whisker positions:


upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

-- snip --
--
NOT quite!

The boxplot.stats help reads under the coef argument:
... the whiskers extend to the most extreme data point which is no more 
than coef times the length of the box away from the box.



If there are outliers, the whisker ends at the most extreme data point that 
lies within 1.5 * IQR of Q1 or Q3. That point can be closer to the box than 
1.5 * IQR, so the whisker may end earlier than 1.5 * IQR, and the point at 
which it ends is then NOT max(x) or min(x).
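
The distinction can be checked on a small, made-up vector. The sketch below is only an illustration: the data and the variable names are assumptions, not something from the thread. With one high outlier, the min/max formula returns the fence itself, whereas boxplot.stats() stops at the last data point inside the fence.

```r
# Hypothetical data with a single high outlier (30).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

h <- fivenum(x)                  # min, lower hinge, median, upper hinge, max
box_length  <- h[4] - h[2]       # 8 - 3 = 5
upper_fence <- h[4] + 1.5 * box_length   # 8 + 7.5 = 15.5

# The formula proposed earlier: lands on the fence, which is not a data point.
naive_upper <- min(max(x), upper_fence)  # 15.5

# What the help page describes: the most extreme data point inside the fence.
actual_upper <- max(x[x <= upper_fence]) # 9

# boxplot.stats() agrees with the second version.
boxplot.stats(x)$stats[5]                # 9
boxplot.stats(x)$out                     # 30
```

So the whisker end is always an observed value; the fence only decides which observations are eligible.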




Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Dennis Murphy
Hi:

Point well taken, Robert. This is a good example of the difference between
how
something is defined mathematically as opposed to how it is applied
computationally.
Thank you for the clarification.

Regards,
Dennis

On Wed, May 12, 2010 at 9:51 AM, Robert Baer rb...@atsu.edu wrote:


 - Original Message - Fantastic!


 It would be great if the description could be modified to include the
 mysterious bit about the upper and lower bound whisker positions:

 upper whisker = min(max(x), Q_3 + 1.5 * IQR)
 lower whisker = max(min(x), Q_1 - 1.5 * IQR)

 -- snip --
 --
 NOT quite!

 The boxplot.stats help reads under the coef argument:
 ... the whiskers extend to the most extreme data point which is no more
 than coef times the length of the box away from the box.


 If there are outliers, the whisker ends at the most extreme data point that
 lies within 1.5 * IQR of Q1 or Q3. That point can be closer to the box than
 1.5 * IQR, so the whisker may end earlier than 1.5 * IQR, and the point at
 which it ends is then NOT max(x) or min(x).




Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Peter Ehlers

On 2010-05-12 10:51, Robert Baer wrote:


- Original Message - Fantastic!

It would be great if the description could be modified to include the
mysterious bit about the upper and lower bound whisker positions:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

-- snip --
--
NOT quite!

The boxplot.stats help reads under the coef argument:
... the whiskers extend to the most extreme data point which is no more
than coef times the length of the box away from the box.


If there are outliers, the whisker ends at the most extreme data point that
lies within 1.5 * IQR of Q1 or Q3. That point can be closer to the box than
1.5 * IQR, so the whisker may end earlier than 1.5 * IQR, and the point at
which it ends is then NOT max(x) or min(x).



But even this is not quite correct.
The help page (quoted above) is, as is so often the case,
quite precise: the *length of the box* is multiplied by 1.5,
not the *IQR*. The difference is probably insignificant in most
applications, but then this question was about the precise
definition of the whiskers.

The box length is defined by the hinges, for whose definition
it's probably easiest to look at the code in fivenum() which
is used by boxplot.stats(). (The relevant code consists of three
short lines.) For the calculation of the whisker extremes, one
can peruse the boxplot.stats() code, which also is quite brief.
Essentially, it determines which observations lie outside the
boundaries established by (lower hinge - 1.5 * boxlength) and
(upper hinge + 1.5 * boxlength) and then uses the range of
the remaining data values to determine the whisker extremes.

(I've assumed the default value of coef=1.5).

Here's an example:

  set.seed(1)
  y <- rexp(30, .02)
  y <- sort(round(y))

  fivenum(y)
#[1]   3  22  38  61 221

  boxplot.stats(y)$stats
#[1]   3  22  38  61 118

# The hinges are 22, 61;
# The whisker extremes are 3, 118;

  quantile(y, c(1,3)/4)
#  25%   75%
#23.25 60.50

# The hinges do not equal the quartiles.

# Upper cut-off ('fence'):
  61 + 1.5 * (61 - 22)
#[1] 119.5

  tail(y)
#[1]  70  94 118 145 198 221

# So 118 is the largest data value less than or equal to 119.5.

  60.5 + 1.5 * IQR(y)
#[1] 116.375

# Using quartiles and the IQR would take the upper whisker to 94.
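
The same contrast between box length and IQR can be reproduced on a vector small enough to check by hand. This is a minimal sketch under stated assumptions: the vector x and the variable names are invented for illustration, and coef is taken to be the default 1.5.

```r
# Hypothetical data, n = 8 (even, so hinges and default quartiles differ).
x <- c(2, 4, 6, 8, 10, 12, 14, 24)

# Hinge-based rule, as used by boxplot.stats():
h <- fivenum(x)                              # 2, 5, 9, 13, 24
hinge_fence   <- h[4] + 1.5 * (h[4] - h[2])  # 13 + 1.5 * 8 = 25
hinge_whisker <- max(x[x <= hinge_fence])    # 24

# Quartile-based rule, using quantile() with its default type = 7:
q <- unname(quantile(x, c(1, 3) / 4))        # 5.5, 12.5
quart_fence   <- q[2] + 1.5 * IQR(x)         # 12.5 + 1.5 * 7 = 23
quart_whisker <- max(x[x <= quart_fence])    # 14

# boxplot.stats() follows the hinge-based rule: 24 is not flagged.
boxplot.stats(x)$stats[5]                    # 24
boxplot.stats(x)$out                         # numeric(0)
```

Here the value 24 falls between the two fences, so it is a whisker end under the hinge rule but would be plotted as an outlier under the quartile rule.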

--
Peter Ehlers



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Shi, Tao
Jason, 

All these are clearly defined in the help file for 'boxplot' under 'range'.  
Don't understand how you missed that.

...Tao




- Original Message 
 From: Jason Rupert jasonkrup...@yahoo.com
 To: Dennis Murphy djmu...@gmail.com
 Cc: R Project Help R-help@r-project.org
 Sent: Wed, May 12, 2010 3:40:12 AM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 
 Fantastic! 

 It would be great if the description could be modified to include the 
 mysterious bit about the upper and lower bound whisker positions:

 upper whisker = min(max(x), Q_3 + 1.5 * IQR)
 lower whisker = max(min(x), Q_1 - 1.5 * IQR)

 Maybe that is clearly written in the description of boxplot.stats 
 {grDevices}, but evidently I missed it numerous times and also did not 
 pick up on this intent from the original description of boxplot 
 {graphics}.  

 Your type of descriptive answer and helpfulness is much appreciated and 
 one of the reasons I continue to endorse the R tool over numerous others.  

 More like you and the tool may be headed for domination in the market. 

 Thanks again!







From: Dennis Murphy djmu...@gmail.com

Cc: R Project Help R-help@r-project.org
Sent: Wed, May 12, 2010 2:50:19 AM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

Hi:

Let's do some math :)



e:

Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in R...

So I call something like the following:
 boxplot(mpg~cyl, data=mtcars, main="Car Milage Data", 
         xlab="Number of Cylinders", ylab="Miles Per Gallon")

That produces something as shown in the following:
http://www.statmethods.net/graphs/images/boxplot1.jpg

When that default boxplot is called, i.e. boxplot {graphics}, as shown in 
the line of code above, it is actually calling into boxplot.stats 
{grDevices}.  When boxplot.stats {grDevices} is called it has a default 
value for coef of 1.5, i.e. coef = 1.5.

If I understand the purpose of coef correctly, it means that the 
‘whiskers’ should extend out 1.5 times the length of the box away from the 
box.   Is that correct?


If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1 
where Q refers to quartile), then assuming that x is the numeric vector of 
interest for a boxplot,

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

So the upper whisker is located at the *smaller* of the maximum x value and 
Q_3 + 1.5 IQR, whereas the lower whisker is located at the *larger* of the 
smallest x value and Q_1 - 1.5 IQR.

In your terms, the whiskers should extend out a *maximum* of 1.5 times the 
length of the box away from the box. 

Visually, this means that individual points more extreme in value than 
Q3 + 1.5 IQR are plotted separately at the high end, and those below 
Q1 - 1.5 IQR are plotted separately on the low end. Depending on the 
source, the separately plotted points are called 'outside values'. On the 
other hand, if the maximum or minimum values of x are closer than 1.5 IQR 
in distance from its nearest quartile, then that is where the whisker is 
positioned.

Does that make sense?

HTH,
Dennis


Now I look back at the plot, and I'm not sure how 1.5 times the length of 
the box corresponds with the whisker lengths shown in the image:
http://www.statmethods.net/graphs/images/boxplot1.jpg

Is it that the whisker length is a total of 1.5 the length of the box and 
centered about the median (2nd Quartile)?

Just trying to get a handle on this, so thanks again for all the help in 
deciphering this.








From: RJ Cunningham ro...@iinet.net.au

ast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:57:48 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}


I think not. Isn't the secret here?


Arguments:

x: a numeric vector for which the boxplot will be constructed
('NA's and 'NaN's are allowed and omitted).

coef: this determines how far the plot 'whiskers' extend out
from the box.  If 'coef' is positive, the whiskers extend
to the most extreme data point which is no more than
'coef' times the length of the box away from the box. A
value of zero causes the whiskers to extend to the data
extremes (and no outliers be returned).

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
component respectively will be empty in the result.

Details:

The two 'hinges' are versions of the first and third quartile,...


On Wed May 12 10:35 , Jason Rupert  sent:


Humm... Maybe I need to look some place else than boxplot.stats {grDevices} 
for a definition of how

Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Peter Ehlers

On 2010-05-12 13:27, Shi, Tao wrote:

Jason,

All these are clearly defined in the help file for 'boxplot' under 'range'.  
Don't understand how you missed that.

...Tao



You've made me re-read the help page for boxplot. I notice that
there's a difference in the description of 'range' on that page
and the description of the equivalent 'coef' on the help page
for boxplot.stats. boxplot.stats has it right.

This should be made consistent.

[previous posts snipped]
--
Peter Ehlers
University of Calgary



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-12 Thread Shi, Tao
Hi Peter,

You're absolutely correct!  The description for 'range' in the 'boxplot' help 
file is a little confusing because it uses the words "interquartile range".  
I think it should be changed to "the length of the box" to be exact and 
consistent with the help file for boxplot.stats.

...Tao




- Original Message 
 From: Peter Ehlers ehl...@ucalgary.ca
 To: Shi, Tao shida...@yahoo.com
 Cc: Jason Rupert jasonkrup...@yahoo.com; Dennis Murphy djmu...@gmail.com; 
 R Project Help R-help@r-project.org; murdoch.dun...@gmail.com
 Sent: Wed, May 12, 2010 2:11:24 PM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 
 On 2010-05-12 13:27, Shi, Tao wrote:
 Jason,

 All these are clearly defined in the help file for 'boxplot' under 
 'range'.  Don't understand how you missed that.

 ...Tao


You've made me re-read the help page for boxplot. I notice that
there's a difference in the description of 'range' on that page
and the description of the equivalent 'coef' on the help page
for boxplot.stats. boxplot.stats has it right.

This should be made consistent.

[previous posts snipped]
-- 
Peter Ehlers
University of Calgary



[R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
How are the lower/upper whiskers defined in the default version of boxplot 
{graphics}?

I tried help(boxplot) and searching www.rseek.org, but I was unable to 
determine an absolute answer.  

I checked out the definition of boxplot according to Wikipedia 
(http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches 
listed for how the whiskers could be determined, so I'm just curious how the 
default 
boxplot {graphics} does it. 

Thanks for any feedback and insights.



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread David Winsemius


On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

How are the lower/upper whiskers defined in the default version of  
boxplot {graphics}?


I tried help(boxplot) and searching www.rseek.org, but I was unable  
to determine an absolute answer.


You need to follow the links from the help pages and in this case it  
appears that you did not follow the one to


?boxplot.stats



I checked out the definition of boxplot according to Wikipedia
(http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches
listed for how the whiskers could be determined, so I'm just curious  
how the default

boxplot {graphics} does it.

Thanks for any feedback


Follow links with the R help system.


and insights.




David Winsemius, MD
West Hartford, CT



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
Wowzers...

From ?boxplot.stats:

Details

The two ‘hinges’ are versions of the first and third quartile, i.e., close to 
quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- 
length(x)) and differ for even n. Whereas the quartiles only equal observations 
for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n 
= 2 mod 4), and are in the middle of two observations otherwise.
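
The odd/even statement in that paragraph is easy to check numerically. The following is a rough sketch, not part of the help page: the helper name hinges_equal_quartiles is made up, and the quartiles are those of quantile() with its default type = 7.

```r
# Hypothetical helper: do the hinges of 1..n coincide with the
# default (type = 7) quartiles?
hinges_equal_quartiles <- function(n) {
  x <- seq_len(n)
  hinges    <- fivenum(x)[c(2, 4)]               # lower and upper hinge
  quartiles <- unname(quantile(x, c(1, 3) / 4))  # default quartiles
  isTRUE(all.equal(hinges, quartiles))
}

agree <- sapply(5:8, hinges_equal_quartiles)
agree
# n = 5: TRUE, n = 6: FALSE, n = 7: TRUE, n = 8: FALSE
```

For example, with n = 6 the hinges are 2 and 5 while the default quartiles are 2.25 and 4.75.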

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be 
based on the same calculations as the formula with 1.57 in Chambers et al. 
(1983, p. 62), given in McGill et al. (1978, p. 16). They are based on 
asymptotic normality of the median and roughly equal sample sizes for the two 
medians being compared, and are said to be rather insensitive to the underlying 
distributions of the samples. The idea appears to be to give roughly a 95% 
confidence interval for the difference in two medians.
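
The notch formula quoted above can be compared directly with what boxplot.stats() returns. A small sketch, assuming an invented data vector: the notch limits are in the 'conf' component and are a separate quantity from the whisker ends stored in 'stats'.

```r
# Hypothetical data with one high outlier.
x <- c(2, 4, 6, 8, 10, 12, 14, 50)
s <- boxplot.stats(x)

# Whisker ends are the first and last entries of 'stats'.
whiskers <- s$stats[c(1, 5)]                 # 2 and 14

# Notch limits, rebuilt from the quoted formula:
# median +/- 1.58 * (box length) / sqrt(n)
box_length <- s$stats[4] - s$stats[2]        # 13 - 5 = 8
manual     <- s$stats[3] + c(-1.58, 1.58) * box_length / sqrt(s$n)

all.equal(s$conf, manual)                    # TRUE
```

The notches sit around the median and shrink as n grows, while the whiskers are data points near the fences, so the two are not the same thing under different names.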



Is a notch equal to the upper/lower whisker?   Is this just a difference of 
terminology or something? 

Thanks again for all the insights. 




- Original Message 
From: David Winsemius dwinsem...@comcast.net
To: Jason Rupert jasonkrup...@yahoo.com
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}


On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

 How are the lower/upper whiskers defined in the default version of boxplot 
 {graphics}?
 
 I tried help(boxplot) and searching www.rseek.org, but I was unable to 
 determine an absolute answer.

You need to follow the links from the help pages and in this case it appears 
that you did not follow the one to

?boxplot.stats

 
 I checked out the definition of boxplot according to Wikipedia 
 (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches
 listed for how the whiskers could be determined, so I'm just curious how the 
 default
 boxplot {graphics} does it.
 
 Thanks for any feedback

Follow links with the R help system.

 and insights.



David Winsemius, MD
West Hartford, CT






Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
Humm... Maybe I need to look some place else than boxplot.stats {grDevices} for 
a definition of how the upper/lower whiskers are produced.  

By any chance are they the lowest datum still within 1.5 IQR of the lower 
quartile, and the highest datum still within 1.5 IQR of the upper quartile?

None of the links from boxplot.stats {grDevices} seemed to reveal the secret 
definition of the R whiskers.  

Thanks again.





- Original Message 
From: Jason Rupert jasonkrup...@yahoo.com
To: David Winsemius dwinsem...@comcast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:26:25 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

Wowzers...

From ?boxplot.stats:

Details

The two ‘hinges’ are versions of the first and third quartile, i.e., close to 
quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- 
length(x)) and differ for even n. Whereas the quartiles only equal observations 
for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n 
= 2 mod 4), and are in the middle of two observations otherwise.

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be 
based on the same calculations as the formula with 1.57 in Chambers et al. 
(1983, p. 62), given in McGill et al. (1978, p. 16). They are based on 
asymptotic normality of the median and roughly equal sample sizes for the two 
medians being compared, and are said to be rather insensitive to the underlying 
distributions of the samples. The idea appears to be to give roughly a 95% 
confidence interval for the difference in two medians.



Is a notch equal to the upper/lower whisker?   Is this just a difference of 
terminology or something? 

Thanks again for all the insights. 




- Original Message 
From: David Winsemius dwinsem...@comcast.net
To: Jason Rupert jasonkrup...@yahoo.com
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}


On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

 How are the lower/upper whiskers defined in the default version of boxplot 
 {graphics}?
 
 I tried help(boxplot) and searching www.rseek.org, but I was unable to 
 determine an absolute answer.

You need to follow the links from the help pages and in this case it appears 
that you did not follow the one to

?boxplot.stats

 
 I checked out the definition of boxplot according to Wikipedia 
 (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches
 listed for how the whiskers could be determined, so I'm just curious how the 
 default
 boxplot {graphics} does it.
 
 Thanks for any feedback

Follow links with the R help system.

 and insights.



David Winsemius, MD
West Hartford, CT






Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread RJ Cunningham

   I think not. Isn't the secret here?
   Arguments:
   x: a numeric vector for which the boxplot will be constructed
   ('NA's and 'NaN's are allowed and omitted).
   coef: this determines how far the plot 'whiskers' extend out
   from the box. If 'coef' is positive, the whiskers extend
   to the most extreme data point which is no more than
   'coef' times the length of the box away from the box. A
   value of zero causes the whiskers to extend to the data
   extremes (and no outliers be returned).
   do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
   component respectively will be empty in the result.
   Details:
   The two 'hinges' are versions of the first and third quartile,...
   On Wed May 12 10:35 , Jason Rupert sent:

 Humm... Maybe I need to look some place else than boxplot.stats
 {grDevices} for a definition of how the upper/lower whiskers are produced.
 By any chance are they the lowest datum still within 1.5 IQR of the lower
 quartile, and the highest datum still within 1.5 IQR of the upper
 quartile?
 None of the links from boxplot.stats {grDevices} seemed to reveal the
 secret definition of the R whiskers.
 Thanks again.
 - Original Message 
 From: Jason Rupert jasonkrup...@yahoo.com
 To: David Winsemius dwinsem...@comcast.net
 Cc: R Project Help r-h...@r-project.org
 Sent: Tue, May 11, 2010 9:26:25 PM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 Wowzers...
 From ?boxplot.stats:
 Details
 The two ‘hinges’ are versions of the first and third quartile, i.e., close
 to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where
 n <- length(x)) and differ for even n. Whereas the quartiles only equal
 observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally
 for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations
 otherwise.
 The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be
 based on the same calculations as the formula with 1.57 in Chambers et al.
 (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on
 asymptotic normality of the median and roughly equal sample sizes for the
 two medians being compared, and are said to be rather insensitive to the
 underlying distributions of the samples. The idea appears to be to give
 roughly a 95% confidence interval for the difference in two medians.
 Is a notch equal to the upper/lower whisker? Is this just a difference of
 terminology or something?
 Thanks again for all the insights.
 - Original Message 
 From: David Winsemius dwinsem...@comcast.net
 To: Jason Rupert jasonkrup...@yahoo.com
 Cc: R Project Help r-h...@r-project.org
 Sent: Tue, May 11, 2010 9:00:15 PM
 Subject: Re: [R] Whiskers on the default boxplot {graphics}
 On May 11, 2010, at 9:45 PM, Jason Rupert wrote:
  How are the lower/upper whiskers defined in the default version of
 boxplot {graphics}?
 
  I tried help(boxplot) and searching www.rseek.org, but I was unable
 to determine an absolute answer.
 You need to follow the links from the help pages and in this case it
 appears that you did not follow the one to
 ?boxplot.stats
 
   I checked out the definition of boxplot according to Wikipedia
 (http://en.wikipedia.org/wiki/Box_plot), but it also had several
 approaches
  listed for how the whiskers could be determined, so I'm just curious how
 the default
  boxplot {graphics} does it.
 
  Thanks for any feedback
 Follow links with the R help system.
  and insights.
 David Winsemius, MD
 West Hartford, CT


Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread Jason Rupert
Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in R...

So I call something like the following:
 boxplot(mpg~cyl, data=mtcars, main="Car Milage Data", 
         xlab="Number of Cylinders", ylab="Miles Per Gallon")

That produces something as shown in the following:
http://www.statmethods.net/graphs/images/boxplot1.jpg

When that default boxplot is called, i.e. boxplot {graphics}, as shown in the 
line of code above, it is actually calling into boxplot.stats {grDevices}.  
When boxplot.stats {grDevices} is called it has a default value for coef of 
1.5, i.e. coef = 1.5.  

If I understand the purpose of coef correctly, it means that the 
‘whiskers’ should extend out 1.5 times the length of the box away from the 
box.   Is that correct?  

Now I look back at the plot, and I'm not sure how 1.5 times the length of the 
box corresponds with the whisker lengths shown in the image:
http://www.statmethods.net/graphs/images/boxplot1.jpg

Is it that the whisker length is a total of 1.5 the length of the box and 
centered about the median (2nd Quartile)?  

Just trying to get a handle on this, so thanks again for all the help in 
deciphering this. 
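
One way to see where the whiskers in that plot end, rather than reading them off the image, is to ask boxplot() for the numbers it would draw. This is only a sketch: the variable names are illustrative, and plot = FALSE is used so that nothing is drawn.

```r
# Same formula and data as the call above, but return the statistics
# instead of drawing the plot.
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group (4, 6, 8 cylinders); rows are
# lower whisker, lower hinge, median, upper hinge, upper whisker.
b$stats

# Points plotted individually, beyond the whiskers.
b$out

# Whisker lengths compared with the box length for each group.
box_len   <- b$stats[4, ] - b$stats[2, ]
upper_len <- b$stats[5, ] - b$stats[4, ]
lower_len <- b$stats[2, ] - b$stats[1, ]

# 1.5 * box length is an upper bound on each whisker, not its length.
all(upper_len <= 1.5 * box_len)
all(lower_len <= 1.5 * box_len)
```

Each whisker is measured from its own end of the box, not from the median, and it stops at an actual data value, which is why the two whiskers of one box usually have different lengths.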








From: RJ Cunningham ro...@iinet.net.au

ast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:57:48 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

I think not. Isn't the secret here?


Arguments: 

x: a numeric vector for which the boxplot will be constructed 
('NA's and 'NaN's are allowed and omitted). 

coef: this determines how far the plot 'whiskers' extend out 
from the box.  If 'coef' is positive, the whiskers extend 
to the most extreme data point which is no more than 
'coef' times the length of the box away from the box. A 
value of zero causes the whiskers to extend to the data 
extremes (and no outliers be returned). 

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out' 
component respectively will be empty in the result. 

Details: 

The two 'hinges' are versions of the first and third quartile,... 


On Wed May 12 10:35 , Jason Rupert  sent:


Humm... Maybe I need to look some place else than boxplot.stats {grDevices} for 
a definition of how the upper/lower whiskers are produced. 


By any chance are they the lowest datum still within 1.5 IQR of the lower 
quartile, and the highest datum still within 1.5 IQR of the upper quartile?


None of the links from boxplot.stats {grDevices} seemed to reveal the secret 
definition of the R whiskers. 


Thanks again.






- Original Message 



To: David Winsemius dwinsem...@comcast.net

Cc: R Project Help R-help@r-project.org

Sent: Tue, May 11, 2010 9:26:25 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}


Wowzers...


From ?boxplot.stats:


Details


The two ‘hinges’ are versions of the first and third quartile, i.e., close 
to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- 
length(x)) and differ for even n. Whereas the quartiles only equal 
observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for 
n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.


The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be 
based on the same calculations as the formula with 1.57 in Chambers et al. 
(1983, p. 62), given in McGill et al. (1978, p. 16). They are based on 
asymptotic normality of the median and roughly equal sample sizes for the two 
medians being compared, and are said to be rather insensitive to the 
underlying distributions of the samples. The idea appears to be to give 
roughly a 95% confidence interval for the difference in two medians.




Is a notch equal to the upper/lower whisker?   Is this just a difference of 
terminology or something? 


Thanks again for all the insights. 





- Original Message 

From: David Winsemius dwinsem...@comcast.net



Cc: R Project Help R-help@r-project.org

Sent: Tue, May 11, 2010 9:00:15 PM

Subject: Re: [R] Whiskers on the default boxplot {graphics}



On May 11, 2010, at 9:45 PM, Jason Rupert wrote:


 How are the lower/upper whiskers defined in the default version of boxplot 
 {graphics}?

 

 I tried help(boxplot) and searching www.rseek.org, but I was unable to 
 determine an absolute answer.


You need to follow the links from the help pages and in this case it appears 
that you did not follow the one to


?boxplot.stats


 

 I checked out the definition of boxplot according to Wikipedia 
 (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches

 listed for how the whiskers could be determined, so I'm just curious how the 
 default

 boxplot {graphics} does it.

 

 Thanks for any feedback


Follow links with the R help system.


 and insights.




David Winsemius, MD

West Hartford, CT






Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread David Winsemius


On May 11, 2010, at 10:35 PM, Jason Rupert wrote:

Humm... Maybe I need to look some place else than boxplot.stats  
{grDevices} for a definition of how the upper/lower whiskers are  
produced.


By any chance are they the lowest datum still within 1.5 IQR of the  
lower quartile, and the highest datum still within 1.5 IQR of the  
upper quartile?


None of the links from boxplot.stats {grDevices} seemed to reveal  
the secret definition of the R whiskers.


You didn't need to go to any other pages. You just needed to read  
boxplot.stats ... apparently more than once.


--
David.


Thanks again.





- Original Message 
From: Jason Rupert jasonkrup...@yahoo.com
To: David Winsemius dwinsem...@comcast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:26:25 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

Wowzers...

From ?boxplot.stats:

Details

The two ‘hinges’ are versions of the first and third quartile, i.e.,  
close to quantile(x, c(1,3)/4). The hinges equal the quartiles for  
odd n (where n <- length(x)) and differ for even n. Whereas the  
quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the  
hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in  
the middle of two observations otherwise.


The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems  
to be based on the same calculations as the formula with 1.57 in  
Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16).  
They are based on asymptotic normality of the median and roughly  
equal sample sizes for the two medians being compared, and are said  
to be rather insensitive to the underlying distributions of the  
samples. The idea appears to be to give roughly a 95% confidence  
interval for the difference in two medians.




Is a notch equal to the upper/lower whisker?   Is this just a  
difference of terminology or something?


Thanks again for all the insights.




----- Original Message -----
From: David Winsemius dwinsem...@comcast.net
To: Jason Rupert jasonkrup...@yahoo.com
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}


On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

How are the lower/upper whiskers defined in the default version of  
boxplot {graphics}?


I tried help(boxplot) and searching www.rseek.org, but I was unable  
to determine an absolute answer.


You need to follow the links from the help pages and in this case  
it appears that you did not follow the one to


?boxplot.stats



I checked out the definition of boxplot according to Wikipedia
(http://en.wikipedia.org/wiki/Box_plot), but it also had several
approaches listed for how the whiskers could be determined, so I'm
just curious how the default boxplot {graphics} does it.

Thanks for any feedback


Follow links with the R help system.


and insights.




David Winsemius, MD
West Hartford, CT




David Winsemius, MD
West Hartford, CT



Re: [R] Whiskers on the default boxplot {graphics}

2010-05-11 Thread David Winsemius


On May 11, 2010, at 11:55 PM, Jason Rupert wrote:


Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in  
R...


So I call something like the following:
boxplot(mpg~cyl, data=mtcars, main="Car Milage Data",
        xlab="Number of Cylinders", ylab="Miles Per Gallon")


That produces something as shown in the following:
http://www.statmethods.net/graphs/images/boxplot1.jpg

When that default boxplot is called, i.e. boxplot {graphics}, as  
shown in the line of code above, it is actually calling into  
boxplot.stats {grDevices}.  When boxplot.stats {grDevices} is called  
it has a default value for coef of 1.5, i.e. coef = 1.5.


If I understand the purpose of coef correctly, it means that the  
‘whiskers’ should extend out 1.5 times the length of the box away  
from the box.   Is that correct?


No. Read it again.

--
David.


Now I look back at the plot, and I'm not sure how 1.5 times the  
length of the box corresponds with the whisker lengths shown in the  
image:

http://www.statmethods.net/graphs/images/boxplot1.jpg

Is it that the whisker length is a total of 1.5 the length of the  
box and centered about the median (2nd Quartile)?
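
One way to compare the whisker lengths with the box length for this very plot is to ask boxplot for its numbers instead of a picture. A minimal sketch using the built-in mtcars data; the variable names are only illustrative.

```r
# Same call as above, but return the statistics without drawing.
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Rows of b$stats: lower whisker, lower hinge, median,
# upper hinge, upper whisker (one column per group).
box_len   <- b$stats[4, ] - b$stats[2, ]
lower_len <- b$stats[2, ] - b$stats[1, ]
upper_len <- b$stats[5, ] - b$stats[4, ]

# Each whisker is at most 1.5 box lengths long, usually shorter,
# because it stops at the most extreme data point inside that limit.
print(rbind(box_len, lower_len, upper_len, limit = 1.5 * box_len))
```

The whiskers are measured from the ends of the box, not from the median, and 1.5 times the box length is an upper bound on each whisker rather than its actual length.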


Just trying to get a handle on this, so thanks again for all the  
help in deciphering this.









From: RJ Cunningham ro...@iinet.net.au

ast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:57:48 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

I think not. Isn't the secret here?


Arguments:

x: a numeric vector for which the boxplot will be constructed
('NA's and 'NaN's are allowed and omitted).

coef: this determines how far the plot 'whiskers' extend out
from the box.  If 'coef' is positive, the whiskers extend
to the most extreme data point which is no more than
'coef' times the length of the box away from the box. A
value of zero causes the whiskers to extend to the data
extremes (and no outliers be returned).

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
component respectively will be empty in the result.

Details:

The two 'hinges' are versions of the first and third quartile,...
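
The description of 'coef' quoted above can be sketched by hand as follows. This is a minimal illustration on a made-up vector, not the actual source of boxplot.stats; the names h, fence and whiskers are hypothetical.

```r
# Made-up data with one point far from the rest.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20)
coef <- 1.5

# Hinges (ends of the box) from the five-number summary.
h <- fivenum(x)[c(2, 4)]

# Limits: 'coef' times the length of the box away from the box.
fence <- h + c(-1, 1) * coef * diff(h)

# Whiskers end at the most extreme data points inside the limits.
whiskers <- range(x[x >= fence[1] & x <= fence[2]])

bs <- boxplot.stats(x, coef = coef)
print(whiskers)             # whisker ends computed by hand
print(bs$stats[c(1, 5)])    # whisker ends from boxplot.stats
print(bs$out)               # points beyond the limits
```

So the whiskers end at actual data values, and anything beyond the limits is reported in the 'out' component instead.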


On Wed May 12 10:35 , Jason Rupert  sent:


Humm... Maybe I need to look some place other than boxplot.stats
{grDevices} for a definition of how the upper/lower whiskers are
produced.

By any chance are they the lowest datum still within 1.5 IQR of
the lower quartile, and the highest datum still within 1.5 IQR of
the upper quartile?

None of the links from boxplot.stats {grDevices} seemed to reveal
the secret definition of the R whiskers.

Thanks again.

----- Original Message -----
To: David Winsemius dwinsem...@comcast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:26:25 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

Wowzers...

From ?boxplot.stats:

Details

The two ‘hinges’ are versions of the first and third quartile,
i.e., close to quantile(x, c(1,3)/4). The hinges equal the
quartiles for odd n (where n <- length(x)) and differ for even n.
Whereas the quartiles only equal observations for n %% 4 == 1 (n =
1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod
4), and are in the middle of two observations otherwise.

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This
seems to be based on the same calculations as the formula with 1.57
in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p.
16). They are based on asymptotic normality of the median and
roughly equal sample sizes for the two medians being compared, and
are said to be rather insensitive to the underlying distributions
of the samples. The idea appears to be to give roughly a 95%
confidence interval for the difference in two medians.

Is a notch equal to the upper/lower whisker?   Is this just a
difference of terminology or something?

Thanks again for all the insights.

----- Original Message -----
From: David Winsemius dwinsem...@comcast.net
Cc: R Project Help R-help@r-project.org
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}

On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

How are the lower/upper whiskers defined in the default version of
boxplot {graphics}?

I tried help(boxplot) and searching www.rseek.org, but I was
unable to determine an absolute answer.

You need to follow the links from the help pages and in this case
it appears that you did not follow the one to

?boxplot.stats

I checked out the definition of boxplot according to Wikipedia
(http://en.wikipedia.org/wiki/Box_plot), but it also had several
approaches listed for how the whiskers could be determined, so I'm
just curious how the default boxplot {graphics} does it.

Thanks for any feedback

Follow links with the R help system.

and insights