[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venkatesha Murthy TS updated MATH-1120:
---------------------------------------
Attachment: percentile-with-estimation-patch

As per the earlier discussion, I was advised to look at the references for the different ways of computing percentiles and come up with a draft. Here is what I have been thinking.

There are at least 9-10 documented approaches to computing the percentile (see http://en.wikipedia.org/wiki/Quantile), and the R statistical tool also has a reference implementation of these. Each of these strategies provides a formula for choosing the array index and an estimation technique for computing the estimate. These estimation techniques map naturally onto an enum EstimationTechnique (R1, R2, etc., where R1, R2 are the estimation types as described on Wikipedia) with the following functions:

  int index(double pthQuantile, int N);
  double estimate(double[] values, int[] pivotsHeap, double pos, int length);

In addition, the Percentile class already does median-of-3 pivoting for kth selection. Since pivoting is also a strategy, we could introduce a pivoting strategy enum that defaults to median of 3. The kth selection logic can then be subsumed inside the EstimationTechnique as its estimate method.

Changes to Percentile:
-----------------------------
Percentile gets one or two more constructors that accept an EstimationTechnique. The default estimation technique is the existing Percentile computation logic, which need not be specified; the existing constructors will keep working exactly as they did before.

Remove the kth selection private methods and move them into a KthSelector class (a separate nested class). However, medianOf3 is exposed with package-level access and hence needs to be refactored to use the KthSelector class.
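The pivoting-strategy and KthSelector idea described above might look roughly like the following minimal sketch. The names `KthSelector`, `PivotingStrategy`, `CENTRAL` and `MEDIAN_OF_3` are hypothetical and not taken from the attached patch; it is a plain quickselect with a Lomuto partition, and it omits the pivots-heap caching and NaN handling that the existing Percentile has.

```java
/**
 * Sketch only: KthSelector and PivotingStrategy are hypothetical names, not the
 * classes in the attached patch. No pivots-heap caching and no NaN handling.
 */
public class KthSelectorSketch {

    /** Hypothetical pluggable pivoting strategies. */
    enum PivotingStrategy {
        /** Always pick the middle element of the current range. */
        CENTRAL {
            @Override
            int pivotIndex(double[] work, int begin, int end) {
                return begin + (end - begin) / 2;
            }
        },
        /** Pick the median of the first, middle and last elements of the range. */
        MEDIAN_OF_3 {
            @Override
            int pivotIndex(double[] work, int begin, int end) {
                int last = end - 1;
                int middle = begin + (last - begin) / 2;
                double a = work[begin];
                double b = work[middle];
                double c = work[last];
                if (a < b) {
                    if (b < c) {
                        return middle;           // a < b < c
                    }
                    return a < c ? last : begin; // c <= b, so the median is max(a, c)
                }
                if (a < c) {
                    return begin;                // b <= a < c
                }
                return b < c ? last : middle;    // c <= a, so the median is max(b, c)
            }
        };

        /** Returns a pivot index in the half-open range [begin, end). */
        abstract int pivotIndex(double[] work, int begin, int end);
    }

    /** Quickselect over a work array, using the configured pivoting strategy. */
    static final class KthSelector {
        private final PivotingStrategy strategy;

        KthSelector(PivotingStrategy strategy) {
            this.strategy = strategy;
        }

        /** Returns the k-th smallest value (0-based); reorders work in place. */
        double select(double[] work, int k) {
            int begin = 0;
            int end = work.length;
            // Invariant: begin <= k < end.
            while (end - begin > 1) {
                int p = partition(work, begin, end, strategy.pivotIndex(work, begin, end));
                if (k == p) {
                    return work[k];
                } else if (k < p) {
                    end = p;
                } else {
                    begin = p + 1;
                }
            }
            return work[k];
        }

        /** Lomuto partition around work[pivot]; returns the pivot's final index. */
        private static int partition(double[] work, int begin, int end, int pivot) {
            double value = work[pivot];
            swap(work, pivot, end - 1);
            int store = begin;
            for (int i = begin; i < end - 1; i++) {
                if (work[i] < value) {
                    swap(work, i, store++);
                }
            }
            swap(work, store, end - 1);
            return store;
        }

        private static void swap(double[] work, int i, int j) {
            double tmp = work[i];
            work[i] = work[j];
            work[j] = tmp;
        }
    }

    public static void main(String[] args) {
        double[] data = {9, 1, 8, 2, 7, 3, 6, 4, 5};
        for (PivotingStrategy s : PivotingStrategy.values()) {
            double median = new KthSelector(s).select(data.clone(), data.length / 2);
            System.out.println(s + " -> " + median); // 5.0 for both strategies
        }
    }
}
```

Keeping the pivot choice behind an enum means a different strategy can be plugged in without touching the selection loop, which is the separation the proposal is after.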
medianOf3 could also be deprecated, as the method is not strictly part of the percentile logic (it belongs with kth selection).

Add two small methods, getWorkArray and one for the cached pivots, to obtain the values that need to be passed along to the estimation technique.

I agree with dropping my earlier suggestion of ExcelPercentile{Test} and look forward to opinions on the new approach. Please let me know what you think of the attached percentile-with-estimation-patch.

> Need Percentile computations that can be matched with standard spreadsheet
> formula
> ----------------------------------------------------------------------------------
>
>                 Key: MATH-1120
>                 URL: https://issues.apache.org/jira/browse/MATH-1120
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>
>         Attachments: excel-percentile-patch, percentile-with-estimation-patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The current Percentile implementation assumes and hard-codes the quantile pth
> position as p * (N+1)/100 and provides a kth selected value.
> However, if we need to verify or compare/contrast with standard statistical tools
> such as MS Excel, it would be good to provide an extensible way of
> changing this selection of position rather than hard-coding it.
> For example, in order to generate a percentile closely matching that of MS
> Excel, the position required may be [p*(N-1)/100]+1.
> I do have a patch ready with a small change needed in the Percentile class and a new
> ExcelPercentile class, written with tests closely matching those of the
> PercentileTest class.
> Please let me know if I could submit this as a patch.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
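The two position formulas in the issue description can be compared with a small sketch. It assumes a hypothetical `Technique` enum with only two constants, named after Wikipedia's R-6 type (the existing default, pos = p(N+1)/100) and R-7 type (the Excel-style pos = p(N-1)/100 + 1, reading the square brackets as grouping rather than floor). Unlike the proposed `int index(...)` / `estimate(..., pivotsHeap, ...)` pair, it returns a fractional 1-based position and sorts a copy of the data instead of using kth selection; clamping to the minimum/maximum outside [1, N] and linear interpolation are assumed to mirror the current Percentile behaviour.

```java
import java.util.Arrays;

public class PositionFormulaSketch {

    /** Hypothetical two-constant subset of the proposed EstimationTechnique enum. */
    enum Technique {
        /** Existing Percentile default: pos = p * (N + 1) / 100 (Wikipedia's R-6). */
        R6 {
            @Override
            double position(double p, int n) {
                return p * (n + 1) / 100.0;
            }
        },
        /** Excel-style position: pos = p * (N - 1) / 100 + 1 (Wikipedia's R-7). */
        R7 {
            @Override
            double position(double p, int n) {
                return p * (n - 1) / 100.0 + 1;
            }
        };

        /** 1-based, possibly fractional, position of the p-th percentile (0 < p <= 100). */
        abstract double position(double p, int n);

        /** Clamps to min/max outside [1, N], else interpolates between neighbours. */
        double estimate(double[] values, double p) {
            double[] sorted = values.clone();
            Arrays.sort(sorted);
            int n = sorted.length;
            double pos = position(p, n);
            if (pos < 1) {
                return sorted[0];
            }
            if (pos >= n) {
                return sorted[n - 1];
            }
            int lowerPos = (int) Math.floor(pos); // 1-based position of the lower neighbour
            double fraction = pos - lowerPos;
            double lower = sorted[lowerPos - 1];
            double upper = sorted[lowerPos];
            return lower + fraction * (upper - lower);
        }
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        for (Technique t : Technique.values()) {
            System.out.println(t + ": 25th percentile = " + t.estimate(data, 25));
        }
    }
}
```

For {1, 2, 3, 4} and p = 25 this prints 1.25 for R6 and 1.75 for R7 (the latter matching Excel's PERCENTILE), while at p = 50 both give 2.5; that kind of divergence away from the median is what the issue asks to make selectable.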