Excel 2003's help for percentile just says it interpolates between the quantiles in the data: Array is the array or range of data that defines relative standing. K is the percentile value in the range 0..1, inclusive. If array is empty or contains more than 8,191 data points, PERCENTILE returns the #NUM! error value. If k is nonnumeric, PERCENTILE returns the #VALUE! error value. If k is < 0 or if k > 1, PERCENTILE returns the #NUM! error value. If k is not a multiple of 1/(n - 1), PERCENTILE interpolates to determine the value at the k-th percentile. so some experimenation is on order.
I found that the call to R's quantile gives a different result for each of the 9 documented values of the type argument: x<-c(1,1,2,3,3,5,8,8,9,10) quantile(x, probs=(0:8)/8, type=types[i]) E.g., sapply(1:9,function(type)quantile(x=x,probs=(0:8)/8,type=type)) type=1 type=2 type=3 type=4 type=5 type=6 type=7 type=8 type=9 0% 1 1 1 1.00 1.00 1.000 1.000 1.000000 1.00000 12.5% 1 1 1 1.00 1.00 1.000 1.125 1.000000 1.00000 25% 2 2 1 1.50 2.00 1.750 2.250 1.916667 1.93750 37.5% 3 3 3 2.75 3.00 3.000 3.000 3.000000 3.00000 50% 3 4 3 3.00 4.00 4.000 4.000 4.000000 4.00000 62.5% 8 8 5 5.75 7.25 7.625 6.875 7.375000 7.34375 75% 8 8 8 8.00 8.00 8.250 8.000 8.083333 8.06250 87.5% 9 9 9 8.75 9.25 9.625 8.875 9.375000 9.34375 100% 10 10 10 10.00 10.00 10.000 10.000 10.000000 10.00000 I entered the same x into Excel 2003 and used the formulae =percentile(A1:10,0), =percentile(A1:A10,.125), ..., =percentile(A1:A10,1) and got the results 1, 1.125, 2.25, 3, 4, 6.875, 8, 8.875, 10 This matches only R's type 7, the default. They also match S+'s default quantile calculation. Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com ---------------------------- Ted Harding wrote: > On 04-Mar-09 16:56:14, Wacek Kusnierczyk wrote: > (Ted Harding) wrote: > <snip> >> So, with reference to your original question >> "Excel has percentile() function. R function quantile() does the >> same thing. Is there any significant difference btw percentile >> and quantile?" >> the answer is that they in effect give the same results, though >> differ with respect to how they are to be fed (quantile eats >> probabilities, percentile eats percentages). [Though (since I am >> not familiar with Excel) I cannot rule out that Excel's percentile() >> function also eats probabilities; in which case its name would be >> an example of sloppy nomenclature on Excel's part; which I cannot >> rule out on general grounds either]. > > i am not familiar enough with excel to prove or disprove what you say > above, but in general such claims should be grounded in the respective > documentations. > > there are a number of ways to compute empirical quantiles (see, e.g., > [1]), and it's possible that the one used by r's quantile by default > (see ?quantile) is not the one used by excel (where you probably have > no choice; help in oocalc does not specify the method, and i guess > that excel's does not either). > > have you actually confirmed that excel's percentile() does the same as > r's quantile() (modulo the scaling)? > vQ I have now googled around a bit. All references to the Excel percentile() function say that you feed it the fractional value corresponding to the percentage. So, for example, to get the 80-th percentile you would give it 0.8. Hence Excel should call it "quantile"! As to the algorithm, Wikipedia states the following (translated into R syntax): Many software packages, such as Microsoft Excel, use the following method recommended by NIST[4] to estimate the value, vp, of the pth percentile of an ascending ordered dataset containing N elements with values v[1],v[2],...,v[N]: n = (p/100)*(N-1) + 1 n is then split into its integer component, k and decimal component, d, such that n = k + d. If k = 1, then the value for that percentile, vp, is the first member of the ordered dataset, v[1]. If k = N, then the value for that percentile, vp, is the Nth member of the ordered dataset, v[N]. Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]). Note that the Wikipedia article uses the "%" interpretation of "p-th percentile", i.e. the point which is (p/100) of the way along the distribution. It looks as though R's quantile with type=4 might be the same, since it is explained as "linear interpolation of the empirical cdf", which is what the above description of Excel's method does. However, R's default type is 7, which is different. Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 17:29:50 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.