Re: [R] Whiskers on the default boxplot {graphics}
Just to put this topic to rest: the hinges match quantile(x, probs = c(1,3)/4, type = 2) except when n = 3 mod 4.

I no longer have Tukey's EDA book, but I think that his idea was that hinges (aka quartiles) were defined as medians of the lower/upper halves of the (sorted, of course) data, where a 'half' would include the median for odd sample sizes. And that's how they are calculated in fivenum(). Thus hinges are a 10th definition of quartiles, but they don't lend themselves to generalization to arbitrary quantiles other than, say, octiles or other (1/2^k)-iles.

-Peter

On 2010-05-13 11:47, David Winsemius wrote:
> I agree. I was convinced by Ehlers' example that type=2 was a better
> match to fivenum's result.

-- 
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
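The statement about n = 3 mod 4 can be checked numerically. What follows is only a sketch under stated assumptions: the helper name hinges_match_type2 and the test data (1:n)^2 are illustrative choices that do not appear in the thread. The data are distinct and unevenly spaced, so an average of two neighbouring order statistics can never equal a single order statistic by accident.

```r
# Do the fivenum() hinges equal the type-2 quartiles for a sample of size n?
# (hypothetical helper, written only for this check)
hinges_match_type2 <- function(n) {
  x <- (1:n)^2                       # distinct, unevenly spaced values
  hinges <- fivenum(x)[c(2, 4)]      # lower and upper hinge
  quartiles <- quantile(x, probs = c(1, 3) / 4, type = 2, names = FALSE)
  isTRUE(all.equal(hinges, quartiles))
}

n <- 4:43
matches <- vapply(n, hinges_match_type2, logical(1))

# Sample sizes at which the two definitions disagree
n[!matches]

# Remainders mod 4 of those sample sizes
unique(n[!matches] %% 4)
```

If the statement holds, every sample size listed leaves remainder 3 when divided by 4, and no other size in the range is listed.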
Re: [R] Whiskers on the default boxplot {graphics}
Hi Robert,

Your points are well taken. However, I stand by mine, because I think that without this detailed discussion an average R user would simply confuse the "interquartile range" mentioned in the boxplot help file with the result of IQR(). Changing it to "length of box" makes it more exact and consistent, as I stated earlier.

With all that being said, this is up to the R core team to decide.

...Tao

----- Original Message -----
> From: Robert Baer
> To: "Shi, Tao"; Peter Ehlers
> Cc: R Project Help
> Sent: Thu, May 13, 2010 7:25:09 AM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
>
> [previous posts snipped]
Re: [R] Whiskers on the default boxplot {graphics}
I agree. I was convinced by Ehlers' example that type=2 was a better match to fivenum's result.

-- 
David.

On May 13, 2010, at 1:36 PM, Joshua Wiley wrote:
> On Thu, May 13, 2010 at 7:55 AM, David Winsemius wrote:
>> [previous posts snipped]
>
> [previous posts snipped]

David Winsemius, MD
West Hartford, CT
Re: [R] Whiskers on the default boxplot {graphics}
On Thu, May 13, 2010 at 7:55 AM, David Winsemius wrote:
> Yes, and experimentation leads me to the conclusion that the only possible
> candidate for matching up the results of fivenum(y)[c(2,4)] with
> quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove that to
> myself from mathematical arguments, since I do not quite understand the
> formalism in the quantile page. If the match is not exact, this would be a
> tenth definition of IQR.

David,

Here is some sample data, and the most parsimonious code I could come up with for how quantile() computes the quartiles when using type=5. The code for fivenum() seems simple enough, but I am not quite able to make enough sense of the code for type=5 from quantile() to say confidently why they are different. I am open to the possibility that my attempts to extract relevant code from quantile were flawed, but my tentative conclusion is that quantile(x, type=5) != fivenum(x).

##
x <- c(0.643796386452606, -0.605277531056206, -0.339239367816402,
       1.12408365699422, 0.615753476531243, -1.10545696568758,
       0.666533406841698, 1.42794492209271, 0.624752921945051,
       2.02317205214712, -0.365586657432646, 0.821742701084307,
       -0.874753498321076, -0.0298783402061118, 1.18037670706428,
       -0.178274986836195, 0.308703365439049, 0.619700844646392,
       0.54977981430092, -1.82161514610448, -1.28413556650749,
       -0.0443852992196351, 0.704196760556652, -1.88596816676741,
       -0.420811351737096)
oldx <- x  # this is just a backup because x will be transformed

## Start from quantile()
probs <- c(0, 0.25, 0.5, 0.75, 1)
type <- 5
n <- length(x)
switch(type - 3,
       { a <- 0; b <- 1 },
       a <- b <- 0.5,
       a <- b <- 0,
       a <- b <- 1,
       a <- b <- 1/3,
       a <- b <- 3/8)
fuzz <- 4 * .Machine$double.eps
nppm <- a + probs * (n + 1 - a - b)
j <- floor(nppm + fuzz)
h <- nppm - j
h <- ifelse(abs(h) < fuzz, 0, h)
x <- sort(x, partial = unique(c(1, j[j > 0L & j <= n],
                                (j + 1)[j > 0L & j < n], n)))
x <- c(x[1L], x[1L], x, x[n], x[n])
qs <- x[j + 2L]
qs[h == 1] <- x[j + 3L][h == 1]
other <- (h > 0) & (h < 1)  # elementwise &; && only accepts length-one conditions
if (any(other)) qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other]
## End from quantile

qs                        # from the calculations above
quantile(oldx, type=5)    # this should match qs
fivenum(oldx)             # the 25% does not match

Josh
Re: [R] Whiskers on the default boxplot {graphics}
On May 13, 2010, at 12:18 PM, Robert Baer wrote:
> And try this (which seems to leave us with type=2), which is listed in
> ?quantile as "Discontinuous sample quantile types 1, 2, and 3":
>
> quantile(1:101, c(1,3)/4, type=2)
> 25% 75%
>  26  76

I think Peter may be right. If I do it with the rnorm function I repeatedly get the same result for fivenum(x)[2] and the type 2 first quartile. I did not test those types because they were designed for discrete-valued variables, but I suppose everything is really discrete on computers, eh?

> fivenum(x <- rnorm(101))
[1] -2.6224338 -0.9682586 -0.1897377  0.5999332  2.5409711
> quantile(x, c(1,3)/4, type=2)
       25%        75%
-0.9682586  0.5999332
> fivenum(x <- rnorm(101))
[1] -3.8251928 -0.6495966  0.1816233  0.7101774  2.3789054
> quantile(x, c(1,3)/4, type=2)
       25%        75%
-0.6495966  0.7101774

-- 
David.

> David, try this:
>
> fivenum(1:101)
> quantile(1:101, c(1,3)/4, type=5)
>
> -Peter
>
> [previous posts snipped]

David Winsemius, MD
West Hartford, CT
Re: [R] Whiskers on the default boxplot {graphics}
And try this (which seems to leave us with type=2), which is listed in ?quantile as "Discontinuous sample quantile types 1, 2, and 3":

quantile(1:101, c(1,3)/4, type=2)
25% 75%
 26  76

> David, try this:
>
> fivenum(1:101)
> quantile(1:101, c(1,3)/4, type=5)
>
> -Peter
>
> On 2010-05-13 8:55, David Winsemius wrote:
>> [previous posts snipped]
Re: [R] Whiskers on the default boxplot {graphics}
David, try this:

fivenum(1:101)
quantile(1:101, c(1,3)/4, type=5)

-Peter

On 2010-05-13 8:55, David Winsemius wrote:
> [previous posts snipped]

-- 
Peter Ehlers
University of Calgary
Re: [R] Whiskers on the default boxplot {graphics}
On May 13, 2010, at 10:25 AM, Robert Baer wrote:
> -- snip --
>
> The OP seemed to want an "exact" explanation of the whiskers, and I think
> Peter has pointed us at the definition of quartiles used by fivenum, as
> opposed to the default used with quantile(..., type=7).

Yes, and experimentation leads me to the conclusion that the only possible candidate for matching up the results of fivenum(y)[c(2,4)] with quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove that to myself from mathematical arguments, since I do not quite understand the formalism in the quantile page. If the match is not exact, this would be a tenth definition of IQR.

> set.seed(123)
> y <- rexp(300, .02)
> fivenum(y)
[1]   0.2183685  15.8740466  42.1147820  74.0362517 360.5503788
> for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i))}
     25%      75%
15.82506 73.93080
     25%      75%
15.87405 74.03625
     25%      75%
15.84955 74.08898
     25%      75%
15.89854 73.98352
     25%      75%
15.86588 74.05383
     25%      75%
15.86792 74.04943

-- 
David.

> All that said, I'm not convinced that it is wrong to speak of
> "interquartile range" in 'boxplot' help.
>
> Rob
Re: [R] Whiskers on the default boxplot {graphics}
> Hi Peter,
>
> You're absolutely correct! The description for 'range' in the 'boxplot'
> help file is a little bit confusing in using the words "interquartile
> range". I think it should be changed to "length of the box" to be exact
> and consistent with that in the help file for "boxplot.stats".

The issue is probably that there are multiple ways (9 to be exact) of defining quantiles in R. See the 'type=' argument in ?quantile. The quantile function uses type=7 by default, which matches the quantile definition used by S-Plus(?), but differs from that used by SPSS. Doesn't fivenum essentially use the equivalent of a different 'type=' argument (maybe 2 or 5) in constructing the hinges?

It seems perfectly reasonable to talk about 'length of box' (or 'box height', depending on how you display the boxplot), but aren't the hinges simply Q1 and Q3 as defined by one of the possible quartile definitions (as Peter points out, the one used by fivenum)? The box height does not necessarily match the distance produced by IQR(), which also seems to use the equivalent of quantile(..., type=7), but it is still an IQR, is it not? Quantiles apparently can be defined in more than one "acceptable" way (sort of like dealing with ties in rank statistics).

The OP seemed to want an "exact" explanation of the whiskers, and I think Peter has pointed us at the definition of quartiles used by fivenum, as opposed to the default used with quantile(..., type=7).

All that said, I'm not convinced that it is wrong to speak of "interquartile range" in 'boxplot' help.

Rob
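The gap between the box height and IQR() that this message describes is easy to see on a small sample. This is a minimal sketch; the eight data values are made up for illustration and do not come from the thread.

```r
# Illustrative data, n = 8 (not from the thread)
x <- c(3, 5, 8, 12, 17, 23, 30, 38)

# IQR() relies on the default quantile definition (type = 7)
iqr_default <- IQR(x)
iqr_type7 <- diff(quantile(x, c(1, 3) / 4, type = 7, names = FALSE))

# Box length as drawn by boxplot(): distance between the fivenum() hinges
box_length <- diff(fivenum(x)[c(2, 4)])

c(IQR = iqr_default, type7 = iqr_type7, box = box_length)
```

On this sample the first two values agree with each other while the box length is larger, which is the distinction between "interquartile range" and "length of the box" argued over in the thread.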
Re: [R] Whiskers on the default boxplot {graphics}
Hi Peter,

You're absolutely correct! The description for 'range' in the 'boxplot' help file is a little bit confusing in using the words "interquartile range". I think it should be changed to "length of the box" to be exact and consistent with that in the help file for "boxplot.stats".

...Tao

----- Original Message -----
> From: Peter Ehlers
> To: "Shi, Tao"
> Cc: Jason Rupert; Dennis Murphy; R Project Help; murdoch.dun...@gmail.com
> Sent: Wed, May 12, 2010 2:11:24 PM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
>
> On 2010-05-12 13:27, Shi, Tao wrote:
>> Jason,
>>
>> All these are clearly defined in the help file for 'boxplot' under
>> 'range'. Don't understand how you missed that.
>>
>> ...Tao
>
> You've made me re-read the help page for boxplot. I notice that there's a
> difference in the description of 'range' on that page and the description
> of the equivalent 'coef' on the help page for boxplot.stats.
> boxplot.stats has it right. This should be made consistent.
>
> [previous posts snipped]
>
> -- 
> Peter Ehlers
> University of Calgary
Re: [R] Whiskers on the default boxplot {graphics}
On 2010-05-12 13:27, Shi, Tao wrote:
> Jason,
>
> All these are clearly defined in the help file for 'boxplot' under
> 'range'. Don't understand how you missed that.
>
> ...Tao

You've made me re-read the help page for boxplot. I notice that there's a difference in the description of 'range' on that page and the description of the equivalent 'coef' on the help page for boxplot.stats. boxplot.stats has it right. This should be made consistent.

[previous posts snipped]

-- 
Peter Ehlers
University of Calgary
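That 'range' in boxplot() and 'coef' in boxplot.stats() play the same role can be illustrated by comparing what the two functions return. This is a sketch only; the data are invented for the example, and plot = FALSE is used so that no graphics device is opened.

```r
# Illustrative data with one large value (not from the thread)
x <- c(10, 12, 13, 15, 16, 18, 19, 45)

b <- boxplot(x, range = 1.5, plot = FALSE)   # statistics only, no drawing
s <- boxplot.stats(x, coef = 1.5)

as.vector(b$stats)   # boxplot() stores the five numbers as a one-column matrix
s$stats
b$out
s$out
```

Both calls should report the same five statistics and the same outside value, whatever wording the two help pages use.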
Re: [R] Whiskers on the default boxplot {graphics}
Jason,

All these are clearly defined in the help file for 'boxplot' under 'range'. Don't understand how you missed that.

...Tao

----- Original Message -----
> From: Jason Rupert
> To: Dennis Murphy
> Cc: R Project Help
> Sent: Wed, May 12, 2010 3:40:12 AM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
>
> Fantastic!
>
> It would be great if the description could be modified to include the
> mysterious bit about the upper and lower bound whisker positions:
>
> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>
> Maybe that is clearly written in the description of boxplot.stats
> {grDevices}, but evidently I missed it numerous times and also did not
> pick up on this intent from the original description of boxplot
> {graphics}.
>
> Your type of descriptive answer and helpfulness is much appreciated and
> one of the reasons I continue to endorse the R tool over numerous others.
> More like you and the tool may be headed for domination in the market.
>
> Thanks again!
>
> From: Dennis Murphy <djmu...@gmail.com>
> Cc: R Project Help <R-help@r-project.org>
> Sent: Wed, May 12, 2010 2:50:19 AM
> Subject: Re: [R] Whiskers on the default boxplot {graphics}
>
> Hi:
>
> Let's do some math :)
>
>> Okay...Let me see if I've got it...
>>
>> I'm just trying to use the default boxplot {graphics} capability in R...
>>
>> So I call something like the following:
>> boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")
>>
>> That produces something as shown in the following:
>> http://www.statmethods.net/graphs/images/boxplot1.jpg
>>
>> When that default boxplot is called, i.e. boxplot {graphics}, as shown
>> in the line of code above, it is actually calling into boxplot.stats
>> {grDevices}. When boxplot.stats {grDevices} is called it has a default
>> value for "coef" of 1.5, i.e. coef = 1.5.
>>
>> If I understand the purpose of "coef" correctly, it means that the
>> ‘whiskers’ should extend out 1.5 times the length of the box away from
>> the box. Is that correct?
>
> If by 'length of the box' you mean the interquartile range (IQR = Q_3 -
> Q_1 where Q refers to quartile), then assuming that x is the numeric
> vector of interest for a boxplot,
>
> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>
> So the upper whisker is located at the *smaller* of the maximum x value
> and Q_3 + 1.5 IQR, whereas the lower whisker is located at the *larger*
> of the smallest x value and Q_1 - 1.5 IQR. In your terms, the whiskers
> should extend out a *maximum* of "1.5 times the length of the box away
> from the box".
>
> Visually, this means that individual points more extreme in value than
> Q3 + 1.5 IQR are plotted separately at the high end, and those below
> Q1 - 1.5 IQR are plotted separately on the low end. Depending on the
> source, the separately plotted points are called 'outside values'. On the
> other hand, if the maximum or minimum values of x are closer than 1.5 IQR
> in distance from its nearest quartile, then that is where the whisker is
> positioned.
>
> Does that make sense?
>
> HTH,
> Dennis
>
>> Now I look back at the plot, and I'm not sure how 1.5 times the length
>> of the box corresponds with the whisker lengths shown in the image:
>> http://www.statmethods.net/graphs/images/boxplot1.jpg
>>
>> Is it that the whisker length is a total of 1.5 the length of the box
>> and centered about the median (2nd Quartile)?
>>
>> Just trying to get a handle on this, so thanks again for all the help in
>> deciphering this.
>>
>> From: RJ Cunningham <ro...@iinet.net.au>
>> Cc: R Project Help <R-help@r-project.org>
>> Sent: Tue, May 11, 2010 9:57:48 PM
>> Subject: Re: [R] Whiskers on the default boxplot {graphics}
>>
>>> I think not. Isn't the "secret" here?
>>>
>>> Arguments:
>>> x: a numeric vector fo
Re: [R] Whiskers on the default boxplot {graphics}
On 2010-05-12 10:51, Robert Baer wrote:
> ----- Original Message -----
>> Fantastic!
>>
>> It would be great if the description could be modified to include the
>> mysterious bit about the upper and lower bound whisker positions:
>>
>> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
>> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>>
>> -- snip --
>
> NOT quite!
>
> The boxplot.stats help reads under the coef argument:
> "... the whiskers extend to the most extreme data point which is no more
> than coef times the length of the box away from the box."
>
> If there are outliers, the most extreme data point within 1.5 * IQR of Q1
> or Q3 may lie less than 1.5 IQRs from the box, so the whisker may "end
> earlier" than 1.5 * IQR, and the data point at which it ends may NOT be
> max(x) or min(x).

But even this is not quite correct. The help page (quoted above) is, as is so often the case, quite precise: the *length of the box* is multiplied by 1.5, not the *IQR*. The difference is probably insignificant in most applications, but then this question was about the precise definition of the whiskers.

The box length is defined by the hinges, for whose definition it's probably easiest to look at the code in fivenum(), which is used by boxplot.stats(). (The relevant code consists of three short lines.)

For the calculation of the whisker extremes, one can peruse the boxplot.stats() code, which also is quite brief. Essentially, it determines which observations lie outside the boundaries established by (lower hinge - 1.5 * boxlength) and (upper hinge + 1.5 * boxlength) and then uses the range of the remaining data values to determine the whisker extremes. (I've assumed the default value of coef=1.5.)

Here's an example:

set.seed(1)
y <- rexp(30, .02)
y <- sort(round(y))
fivenum(y)
#[1]   3  22  38  61 221
boxplot.stats(y)$stats
#[1]   3  22  38  61 118

# The hinges are 22, 61;
# The whisker extremes are 3, 118;

quantile(y, c(1,3)/4)
#  25%   75%
#23.25 60.50

# The hinges do not equal the quartiles.

# Upper cut-off ('fence'):
61 + 1.5 * (61 - 22)
#[1] 119.5
tail(y)
#[1]  70  94 118 145 198 221

# So 118 is the largest data value less than or equal to 119.5.

60.5 + 1.5 * IQR(y)
#[1] 116.375

# Using quartiles and the IQR would take the upper whisker to 94.

-- 
Peter Ehlers
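The steps described in this message (hinges, box length, cut-offs, range of what remains) can be written out by hand and compared with boxplot.stats(). The following is a minimal sketch, not the actual source of boxplot.stats(): the data vector and all variable names are illustrative assumptions, and missing values are not handled.

```r
# Illustrative sorted data with two large observations (not from the thread)
y <- c(2, 5, 6, 8, 9, 11, 12, 14, 15, 40, 55)
coef_val <- 1.5                      # plays the role of 'coef' / 'range'

hinges <- fivenum(y)[c(2, 4)]        # lower and upper hinge
box_length <- diff(hinges)

# Cut-offs ('fences') are measured from the hinges using the box length
lower_fence <- hinges[1] - coef_val * box_length
upper_fence <- hinges[2] + coef_val * box_length

# Observations beyond the fences are the ones plotted individually
outside <- y < lower_fence | y > upper_fence

# Whisker extremes: range of the observations that remain
whiskers <- range(y[!outside])

whiskers
y[outside]
boxplot.stats(y)$stats[c(1, 5)]      # for comparison
```

The whisker ends are always actual data values, because they come from range() applied to the retained observations rather than from the fences themselves.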
Re: [R] Whiskers on the default boxplot {graphics}
Hi:

Point well taken, Robert. This is a good example of the difference between how something is defined mathematically as opposed to how it is applied computationally. Thank you for the clarification.

Regards,
Dennis

On Wed, May 12, 2010 at 9:51 AM, Robert Baer wrote:
> [previous posts snipped]
Re: [R] Whiskers on the default boxplot {graphics}
> Fantastic! It would be great if the description could be modified to
> include the mysterious bit about the upper and lower bound whisker
> positions:
>
> upper whisker = min(max(x), Q_3 + 1.5 * IQR)
> lower whisker = max(min(x), Q_1 - 1.5 * IQR)
>
> -- snip --

NOT quite!

The boxplot.stats help reads under the coef argument:
"... the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box."

If there are outliers, the most extreme data point within 1.5 * IQR of Q1 or Q3 may lie less than 1.5 IQRs from the box, so the whisker may "end earlier" than 1.5 * IQR, and the data point at which it ends may NOT be max(x) or min(x).
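A small made-up example may help show the point being made here: the min/max formula returns the fence, while the whisker actually drawn stops at an observed value. This is only a sketch; the vector x is invented, and the quartiles come from quantile() with its default type.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)   # made-up data with one outside value

# Upper whisker according to the min/max formula quoted above.
q <- quantile(x, c(1, 3) / 4, names = FALSE)     # default quartiles
formula_upper <- min(max(x), q[2] + 1.5 * IQR(x))

# Upper whisker as reported by boxplot.stats().
actual_upper <- boxplot.stats(x)$stats[5]

print(formula_upper)          # the fence, which need not be an observed value
print(actual_upper)           # the largest observation at or below the fence
print(formula_upper %in% x)   # is the formula's answer a data point?
```

With these numbers the formula gives a value that does not occur in x, whereas boxplot.stats() reports an actual observation.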
Re: [R] Whiskers on the default boxplot {graphics}
Fantastic!

It would be great if the description could be modified to include the mysterious bit about the upper and lower bound whisker positions:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

Maybe that is clearly written in the description of boxplot.stats {grDevices}, but evidently I missed it numerous times and also did not pick up on this intent from the original description of boxplot {graphics}.

Your type of descriptive answer and helpfulness is much appreciated and one of the reasons I continue to endorse the R tool over numerous others. More like you and the tool may be headed for domination in the market.

Thanks again!
Re: [R] Whiskers on the default boxplot {graphics}
Hi:

Let's do some math :)

On Tue, May 11, 2010 at 8:55 PM, Jason Rupert wrote:

> Okay...Let me see if I've got it...
>
> I'm just trying to use the default boxplot {graphics} capability in R...
>
> So I call something like the following:
> boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of
> Cylinders", ylab="Miles Per Gallon")
>
> That produces something as shown in the following:
> http://www.statmethods.net/graphs/images/boxplot1.jpg
>
> When that default boxplot is called, i.e. boxplot {graphics}, as shown in
> the line of code above, it is actually calling into boxplot.stats
> {grDevices}. When boxplot.stats {grDevices} is called it has a default
> value for "coef" of 1.5, i.e. coef = 1.5.
>
> If I understand the purpose of "coef" correctly, it means that the
> 'whiskers' should extend out 1.5 times the length of the box away from the
> box. Is that correct?

If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1, where Q refers to quartile), then assuming that x is the numeric vector of interest for a boxplot,

upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)

So the upper whisker is located at the *smaller* of the maximum x value and Q_3 + 1.5 IQR, whereas the lower whisker is located at the *larger* of the smallest x value and Q_1 - 1.5 IQR. In your terms, the whiskers should extend out a *maximum* of "1.5 times the length of the box away from the box".

Visually, this means that individual points more extreme in value than Q_3 + 1.5 IQR are plotted separately at the high end, and those below Q_1 - 1.5 IQR are plotted separately at the low end. Depending on the source, the separately plotted points are called 'outside values'. On the other hand, if the maximum or minimum value of x is closer than 1.5 IQR to its nearest quartile, then that is where the whisker is positioned.

Does that make sense?

HTH,
Dennis

> Now I look back at the plot, and I'm not sure how 1.5 times the length of
> the box corresponds with the whisker lengths shown in the image:
> http://www.statmethods.net/graphs/images/boxplot1.jpg
>
> Is it that the whisker length is a total of 1.5 times the length of the
> box and centered about the median (2nd quartile)?
>
> Just trying to get a handle on this, so thanks again for all the help in
> deciphering this.
Re: [R] Whiskers on the default boxplot {graphics}
On May 11, 2010, at 11:55 PM, Jason Rupert wrote:

> Okay...Let me see if I've got it...
>
> I'm just trying to use the default boxplot {graphics} capability in R...
>
> So I call something like the following:
> boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of
> Cylinders", ylab="Miles Per Gallon")
>
> That produces something as shown in the following:
> http://www.statmethods.net/graphs/images/boxplot1.jpg
>
> When that default boxplot is called, i.e. boxplot {graphics}, as shown in
> the line of code above, it is actually calling into boxplot.stats
> {grDevices}. When boxplot.stats {grDevices} is called it has a default
> value for "coef" of 1.5, i.e. coef = 1.5.
>
> If I understand the purpose of "coef" correctly, it means that the
> 'whiskers' should extend out 1.5 times the length of the box away from the
> box. Is that correct?

No. Read it again.

-- David.

> Now I look back at the plot, and I'm not sure how 1.5 times the length of
> the box corresponds with the whisker lengths shown in the image:
> http://www.statmethods.net/graphs/images/boxplot1.jpg
>
> Is it that the whisker length is a total of 1.5 times the length of the
> box and centered about the median (2nd quartile)?
>
> Just trying to get a handle on this, so thanks again for all the help in
> deciphering this.
Re: [R] Whiskers on the default boxplot {graphics}
On May 11, 2010, at 10:35 PM, Jason Rupert wrote:

> Humm... Maybe I need to look some place else than boxplot.stats
> {grDevices} for a definition of how the upper/lower whiskers are produced.
>
> By any chance are they "the lowest datum still within 1.5 IQR of the lower
> quartile, and the highest datum still within 1.5 IQR of the upper
> quartile"?
>
> None of the links from boxplot.stats {grDevices} seemed to reveal the
> secret definition of the R whiskers.

You didn't need to go to any other pages. You just needed to read boxplot.stats ... apparently more than once.

-- David.

> Thanks again.

David Winsemius, MD
West Hartford, CT
Re: [R] Whiskers on the default boxplot {graphics}
Okay...Let me see if I've got it...

I'm just trying to use the default boxplot {graphics} capability in R...

So I call something like the following:

> boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of
> Cylinders", ylab="Miles Per Gallon")

That produces something as shown in the following:
http://www.statmethods.net/graphs/images/boxplot1.jpg

When that default boxplot is called, i.e. boxplot {graphics}, as shown in the line of code above, it is actually calling into boxplot.stats {grDevices}. When boxplot.stats {grDevices} is called it has a default value for "coef" of 1.5, i.e. coef = 1.5.

If I understand the purpose of "coef" correctly, it means that the 'whiskers' should extend out 1.5 times the length of the box away from the box. Is that correct?

Now I look back at the plot, and I'm not sure how 1.5 times the length of the box corresponds with the whisker lengths shown in the image:
http://www.statmethods.net/graphs/images/boxplot1.jpg

Is it that the whisker length is a total of 1.5 times the length of the box and centered about the median (2nd quartile)?

Just trying to get a handle on this, so thanks again for all the help in deciphering this.
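One way to see the connection between boxplot() and boxplot.stats() for this particular call is to ask boxplot() for its numbers without drawing anything and compare them with boxplot.stats() applied to each group. The sketch below uses the built-in mtcars data from the call above; the variable names bp and by_hand are just illustrative. Note that in boxplot() the multiplier is called range (default 1.5), which plays the role of coef in boxplot.stats().

```r
# Statistics that boxplot() would draw; plot = FALSE suppresses the figure.
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# The same five numbers computed group by group with boxplot.stats().
by_hand <- sapply(split(mtcars$mpg, mtcars$cyl),
                  function(v) boxplot.stats(v, coef = 1.5)$stats)

print(bp$stats)   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
print(by_hand)    # one column per cylinder group
print(bp$out)     # observations drawn as separate points, if any
```

Each column holds the whisker ends, hinges and median for one value of cyl, so the whisker lengths in the picture can be read off directly rather than inferred from the box.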
Re: [R] Whiskers on the default boxplot {graphics}
I think not. Isn't the "secret" here?

Arguments:

x: a numeric vector for which the boxplot will be constructed ('NA's and 'NaN's are allowed and omitted).

coef: this determines how far the plot 'whiskers' extend out from the box. If 'coef' is positive, the whiskers extend to the most extreme data point which is no more than 'coef' times the length of the box away from the box. A value of zero causes the whiskers to extend to the data extremes (and no outliers be returned).

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out' component respectively will be empty in the result.

Details:

The two 'hinges' are versions of the first and third quartile,...

On Wed May 12 10:35 , Jason Rupert sent:

> Humm... Maybe I need to look some place else than boxplot.stats
> {grDevices} for a definition of how the upper/lower whiskers are produced.
>
> By any chance are they "the lowest datum still within 1.5 IQR of the lower
> quartile, and the highest datum still within 1.5 IQR of the upper
> quartile"?
>
> None of the links from boxplot.stats {grDevices} seemed to reveal the
> secret definition of the R whiskers.
>
> Thanks again.
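The remark that the hinges are "versions of" the quartiles can be checked on two tiny made-up vectors, one of odd and one of even length. This is only a sketch: hinges and quartiles are hypothetical helper names, and the quartiles use the default type of quantile().

```r
# Hypothetical helpers for the comparison (not part of R).
hinges    <- function(x) fivenum(x)[c(2, 4)]
quartiles <- function(x) quantile(x, c(1, 3) / 4, names = FALSE)

odd  <- 1:9    # n = 9
even <- 1:10   # n = 10

print(rbind(hinges = hinges(odd),  quartiles = quartiles(odd)))
print(rbind(hinges = hinges(even), quartiles = quartiles(even)))
```

For the odd-length vector the two rows coincide; for the even-length vector they differ slightly, which is why the box length need not equal IQR(x).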
Re: [R] Whiskers on the default boxplot {graphics}
Humm... Maybe I need to look some place else than boxplot.stats {grDevices} for a definition of how the upper/lower whiskers are produced.

By any chance are they "the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile"?

None of the links from boxplot.stats {grDevices} seemed to reveal the secret definition of the R whiskers.

Thanks again.
Re: [R] Whiskers on the default boxplot {graphics}
Wowzers...

From ?boxplot.stats:

Details

The two ‘hinges’ are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be based on the same calculations as the formula with 1.57 in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on asymptotic normality of the median and roughly equal sample sizes for the two medians being compared, and are said to be rather insensitive to the underlying distributions of the samples. The idea appears to be to give roughly a 95% confidence interval for the difference in two medians.

Is a notch equal to the upper/lower whisker? Is this just a difference of terminology or something?

Thanks again for all the insights.
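On the notch question: going by the help text quoted above, the notches are limits around the median, while the whiskers are the ends of the lines extending from the box. The sketch below, on made-up data, rebuilds the notch limits from the +/-1.58 formula and prints the whisker ends next to them. It assumes that the "IQR" in that formula is the hinge spread (the length of the box); the comparison against the conf component of boxplot.stats() is there to check that assumption.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)   # made-up data
bs <- boxplot.stats(x)

# Notch limits rebuilt by hand: median +/- 1.58 * box length / sqrt(n).
hinges <- bs$stats[c(2, 4)]
notch_by_hand <- bs$stats[3] + c(-1.58, 1.58) * diff(hinges) / sqrt(bs$n)

print(bs$conf)             # notch limits reported by boxplot.stats()
print(notch_by_hand)
print(bs$stats[c(1, 5)])   # whisker ends, a different quantity
```

The notch limits sit close to the median and depend on n, whereas the whisker ends are observed data values.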
Re: [R] Whiskers on the default boxplot {graphics}
On May 11, 2010, at 9:45 PM, Jason Rupert wrote:

> How are the lower/upper whiskers defined in the default version of boxplot
> {graphics}?
>
> I tried help(boxplot) and searching www.rseek.org, but I was unable to
> determine an absolute answer.

You need to follow the links from the help pages, and in this case it appears that you did not follow the one to ?boxplot.stats

> I checked out the definition of boxplot according to Wikipedia
> (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches
> listed for how the whiskers could be determined, so I'm just curious how the
> default boxplot {graphics} does it.
>
> Thanks for any feedback

Follow links with the R help system.

> and insights.

David Winsemius, MD
West Hartford, CT
[R] Whiskers on the default boxplot {graphics}
How are the lower/upper whiskers defined in the default version of boxplot {graphics}?

I tried help(boxplot) and searching www.rseek.org, but I was unable to determine an absolute answer.

I checked out the definition of boxplot according to Wikipedia (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches listed for how the whiskers could be determined, so I'm just curious how the default boxplot {graphics} does it.

Thanks for any feedback and insights.