This enables the user to produce quantiles that are equivalent to those in various statistics package. There is a type argument that allows one to choose between the various sample quantile methods. type=7 gives identical results to the current R function quantile.default(). In our revised function, type=8 is the default following the recommendation of Hyndman and Fan (American Statistician, 1996).
We suggest the attached function replaces the current quantile.default() function in R as it provides additional functionality without increasing computation time or affecting ease-of-use. If backwards-compatibility is important, we are happy to set type=7 as the default. However, we prefer type=8. For moderate to large sample sizes, the difference is negligble.
The code and associated documentation is as close as possible to what already exists.
Regards, Rob Hyndman
__________________________________________________ Professor Rob J Hyndman Department of Econometrics & Business Statistics Monash University, VIC 3800, Australia. http://www-personal.buseco.monash.edu.au/~hyndman/
quantile.default <- function(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
names = TRUE, type = 8, ...)
{
if (na.rm)
x <- x[!is.na(x)]
else if (any(is.na(x)))
stop("Missing values and NaN's not allowed if `na.rm' is FALSE")
if (any((p.ok <- !is.na(probs)) & (probs < 0 | probs > 1)))
stop("probs outside [0,1]")
if (na.p <- any(!p.ok)) {
o.pr <- probs
probs <- probs[p.ok]
}
type <- as.integer(type)
if (is.na(type) || (type < 1 | type > 9))
stop("type outside range [1,9]")
np <- length(probs)
n <- length(x)
if (n > 0 && np > 0) {
if (type <= 3) {
## Types 1, 2 and 3 are discontinuous sample qs.
if (type == 3)
nppm <- n * probs - .5 # n * probs + m; m = -0.5
else
nppm <- n * probs # m = 0
j <- floor(nppm)
switch(type,
h <- ifelse(nppm > j, 1, 0), # type 1
h <- ifelse(nppm > j, 1, 0.5), # type 2
h <- ifelse((nppm == j) &&
((j %% 2) == 0), 0, 1)) # type 3
}
else {
## Types 4 through 9 are continuous sample qs.
switch(type - 3,
{a <- 0; b <- 1}, # type 4
a <- b <- 0.5, # type 5
a <- b <- 0, # type 6
a <- b <- 1, # type 7
a <- b <- 1 / 3, # type 8
a <- b <- 3 / 8) # type 9
nppm <- a + probs * (n + 1 - a - b) # n*probs + m
j <- floor(nppm) # m = a + probs*(1 - a - b)
h <- nppm - j
}
x <- sort(x, partial = unique(c(1,j[j>0 & j<=n],(j+1)[j>0 & j<n],n)))
x <- c(x[1], x[1], x, x[n], x[n])
qs <- (1 - h) * x[j + 2] + h * x[j + 3]
}
else {
qs <- rep(as.numeric(NA), np)
}
if (names && np > 0) {
dig <- max(2, getOption("digits"))
names(qs) <- paste(if (np < 100)
formatC(100 * probs, format = "fg", wid = 1, digits = dig)
else format(100 * probs, trim = TRUE, digits = dig),
"%", sep = "")
}
if (na.p) {
o.pr[p.ok] <- qs
names(o.pr) <- rep("", length(o.pr))
names(o.pr)[p.ok] <- names(qs)
o.pr
}
else qs
}
\name{quantile}
\title{Sample Quantiles}
\alias{quantile}
\alias{quantile.default}
\description{
The generic function \code{quantile} produces sample quantiles corresponding to the
given probabilities. The smallest observation corresponds to a probability of
0 and the largest to a probability of 1.
}
\usage{
quantile(x, \dots)
\method{quantile}{default}(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
names = TRUE, type = 8, ...)
}
\arguments{
\item{x}{numeric vector whose sample quantiles are wanted.}
\item{probs}{numeric vector of probabilities with values in \eqn{[0,1]}{[0,1]}.}
\item{na.rm}{logical; if \code{TRUE} any \code{NA} or \code{NaN}
is removed from \code{x} before the quantiles are computed. If
\code{FALSE} the presence of \code{NA} or \code{NaN} in \code{x}
aborts the function.}
\item{names}{logical; if \code{TRUE}, the result has a \code{\link[base]{names}}
attribute.
Set to FALSE for efficiency when \code{probs} is long.}
\item{type}{an integer between 1 and 9 selecting one of the
nine quantile algorithms detailed below to be used.}
\item{\dots}{further arguments passed to or from other methods}
}
\value{
A vector of the same length as \code{probs} is returned;
if \code{names = TRUE}, it has a \code{\link[base]{names}} attribute.
\code{\link[base]{NA}} and \code{\link[base]{NaN}} values in \code{probs} are
propagated to the result.
}
\details{
\code{quantile} returns estimates of underlying distribution quantiles
based on one or two order statistics from the supplied elements in \code{x} at
probabilities in \code{probs}. One of the nine quantile algorithms discussed in
Hyndman and Fan (1996), selected by \code{type}, is employed.
Sample quantiles of type \eqn{i} are defined by
\deqn{Q_{i}(p) = (1 - \gamma)x_{j} + \gamma x_{j+1}}{Q[i](p) = (1 - gamma) x[j] +
gamma x[j+1],} where
\eqn{1 \le i \le 9}{1 <= i <= 9}, \eqn{ }
\eqn{\frac{j - m}{n} \le p < \frac{j - m + 1}{n}}{(j - m) / n <= p < (j - m + 1) / n},
\eqn{ }
\eqn{x_{j}}{x[j]} is the \eqn{j}th order statistic,
\eqn{n} is the sample size, and \eqn{m} is a constant determined by
the sample quantile type.
Here \eqn{\gamma}{gamma} depends on
\eqn{j = \lfloor np + m\rfloor}{j = floor(np + m),} and
\eqn{g = np + m - j.}
For the continuous sample quantile types (4 through 9), the sample quantiles can be
obtained
by linear interpolation between the \eqn{k}th order statistic and \eqn{p(k)}:
\deqn{p(k) = \frac{k - \alpha} {n - \alpha - \beta + 1}}{p(k) = (k - alpha) / (n -
alpha - beta + 1),}
where \eqn{\alpha}{alpha} and \eqn{\beta}{beta} are constants determined by the type.
Further,
\eqn{m = \alpha + p \left( 1 - \alpha - \beta \right)}{m = alpha + p(1 - alpha -
beta),}
and \eqn{\gamma = g = np + m - j}{gamma = g = np + m - j.}
\strong{Discontinuous sample quantile types 1, 2, and 3}
\describe{
\item{Type 1}{Inverse of empirical distribution function.}
\item{Type 2}{Similar to type 1 but with averaging at discontinuities.}
\item{Type 3}{SAS definition: nearest even order statistic.}
}
\strong{Continuous sample quantile types 4 through 9}
\describe{
\item{Type 4}{\eqn{p(k) = \frac{k}{n}}{p(k) = k / n}.
That is, linear interpolation of the empirical cdf.}
\item{Type 5}{\eqn{p(k) = \frac{k - 0.5}{n}}{p(k) = (k - 0.5) / n}.
That is a piecewise linear function where the knots are the values
midway through the steps of the empirical cdf. This is popular amongst
hydrologists.}
\item{Type 6}{\eqn{p(k) = \frac{k}{n + 1}}{p(k) = k / (n + 1)}.
Thus \eqn{p(k) = \mbox{E}[F(x_{k})]}{p(k) = E[F(x[k])]}. This is used by Minitab
and by SPSS.}
\item{Type 7}{\eqn{p(k) = \frac{k - 1}{n - 1}}{p(k) = (k - 1) / (n - 1)}.
In this case, \eqn{p(k) = \mbox{mode}[F(x_{k})]}{p(k) = mode[F(x[k])]}. This is
used by S-Plus.}
\item{Type 8}{\eqn{p(k) = \frac{k - \frac{1}{3}}{n + \frac{1}{3}}}{p(k) = (k - 1/3) /
(n + 1/3)}.
Then \eqn{p(k) \approx \mbox{median}[F(x_{k})]}{p(k) =~ median[F(x[k])]}.
The resulting quantile estimates are approximately median-unbiased regardless of
the
distribution of \code{x}.}
\item{Type 9}{\eqn{p(k) = \frac{k - \frac{3}{8}}{n + \frac{1}{4}}}{p(k) = (k - 3/8) /
(n + 1/4)}.
The resulting quantile estimates are approximately unbiased if \code{x} is
normally distributed.}
}
Hyndman and Fan (1996) recommend type 8 which is the default method for this function.
}
\references{
Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in
statistical packages, \emph{American Statistician}, \bold{50}, 361-365.
}
\seealso{
\code{\link[stats]{ecdf}} for empirical distributions of which quantile is
the \dQuote{inverse}; \code{\link[graphics]{boxplot.stats}} and
\code{\link[stats]{fivenum}}
for computing \dQuote{versions} of quartiles, etc.
}
\examples{
quantile(x <- rnorm(1001))# Extremes & Quartiles by default
### Compare different methods
p <- c(0.1,0.5,1,2,5,10,50)/100
quantile(x, p, type=1)
quantile(x, p, type=2)
quantile(x, p, type=3)
quantile(x, p, type=4)
quantile(x, p, type=5)
quantile(x, p, type=6)
quantile(x, p, type=7)
quantile(x, p, type=8)
quantile(x, p, type=9)
}
\keyword{univar}
\author{Ivan Frohne and Rob J Hyndman}
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
