[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula

Venkatesha Murthy TS (JIRA) Sun, 01 Jun 2014 13:22:23 -0700

     [ 
https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Venkatesha Murthy TS updated MATH-1120:
---------------------------------------

    Attachment: percentile-with-estimation-patch

As per the earlier discussion, I was advised to look at the references for the
different possible types of computation and come up with a draft.

Here is what I have been thinking:

There are at least 9-10 documented approaches to computing the percentile (see
http://en.wikipedia.org/wiki/Quantile ), and the R statistical tool also has a
reference implementation of them. Each of these strategies provides a formula
for choosing the index into the sorted array and an estimation technique for
computing the percentile value from that index.
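For reference, here are two of those definitions in the notation of the Wikipedia table (sorted sample x_1 ≤ ... ≤ x_N, 1-based, p in [0, 1]). R-6 uses the same (N+1)p position as the existing Percentile, and R-7 is the (N-1)p+1 position used by Excel's PERCENTILE. Both interpolate linearly between neighbouring order statistics. For R-6, positions below 1, or at or above N, are clamped to x_1 or x_N. This is only a summary; the other types differ in the index formula, the estimate step, or both.

```latex
\begin{aligned}
\text{R-6: } & h = (N+1)\,p \\
\text{R-7: } & h = (N-1)\,p + 1 \\
\text{both: } & Q(p) = x_{\lfloor h \rfloor} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{\lfloor h \rfloor + 1} - x_{\lfloor h \rfloor}\bigr)
\end{aligned}
```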

These estimation techniques map naturally onto an enum
EstimationTechnique (R1, R2, etc., where R1, R2 are the estimation types
described in Wikipedia) with the functions below:
    int index(double pthQuantile, int N);
    double estimate(double[] values, int[] pivotsHeap, double pos, int length);
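Below is a minimal sketch of what such an enum could look like, covering only R1, R6 and R7. It simplifies the proposed signatures, and these changes are assumptions. First, index returns a double position so that the interpolating types keep the fractional part. Second, estimate takes an already sorted array instead of values, pivotsHeap and length. The evaluate helper is hypothetical and sorts a copy instead of using kth selection.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified sketch of the proposed EstimationTechnique enum.
 * Assumptions: p is a quantile in [0, 1], index() returns a 1-based (possibly
 * fractional) position, and estimate() receives an already sorted array.
 */
enum EstimationTechnique {
    /** R-1: inverse of the empirical CDF, h = N*p + 1/2, no interpolation. */
    R1 {
        @Override
        double index(double p, int n) {
            return n * p + 0.5;
        }

        @Override
        double estimate(double[] sorted, double pos) {
            // Q = x[ceil(h - 1/2)] (1-based), clamped to [1, N]
            int k = (int) Math.ceil(pos - 0.5);
            k = Math.max(1, Math.min(sorted.length, k));
            return sorted[k - 1];
        }
    },
    /** R-6: h = (N + 1) * p, the position the existing Percentile uses. */
    R6 {
        @Override
        double index(double p, int n) {
            return (n + 1) * p;
        }
    },
    /** R-7: h = (N - 1) * p + 1, the position Excel's PERCENTILE uses. */
    R7 {
        @Override
        double index(double p, int n) {
            return (n - 1) * p + 1;
        }
    };

    /** Position (1-based) of the p-th quantile in a sample of size n. */
    abstract double index(double p, int n);

    /** Default estimate: linear interpolation between x[floor(h)] and x[floor(h)+1]. */
    double estimate(double[] sorted, double pos) {
        int n = sorted.length;
        if (pos < 1) {
            return sorted[0];          // clamp below the first order statistic
        }
        if (pos >= n) {
            return sorted[n - 1];      // clamp at or above the last one
        }
        int lo = (int) Math.floor(pos);
        double frac = pos - lo;
        return sorted[lo - 1] + frac * (sorted[lo] - sorted[lo - 1]);
    }

    /** Hypothetical convenience: sort a copy, then apply index and estimate. */
    double evaluate(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return estimate(sorted, index(p, sorted.length));
    }
}
```

For example, on {4, 1, 3, 2} with p = 0.25, R6 gives 1.25 and R7 gives 1.75, while R1 returns the order statistic 1.0.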

In addition, the Percentile class already uses median-of-3 pivoting for kth
selection. Since pivoting is itself a strategy, we could introduce a pivoting
strategy enum that defaults to median of 3. Furthermore, the kth selection logic
can then be subsumed inside EstimationTechnique as the estimate method.
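Here is a sketch of such a pivoting strategy enum, assuming a (work, begin, end) signature with end exclusive. MEDIAN_OF_3 picks the median of the first, middle and last elements. CENTRAL is an illustrative second strategy that the proposal does not name.

```java
/**
 * Hypothetical sketch of a pivoting strategy enum for kth selection.
 * Each constant returns the index in work[begin, end) to partition around.
 */
enum PivotingStrategy {
    /** Illustrative alternative: pivot on the middle element. */
    CENTRAL {
        @Override
        int pivotIndex(double[] work, int begin, int end) {
            return begin + (end - begin) / 2;
        }
    },
    /** Pivot on the median of the first, middle and last elements. */
    MEDIAN_OF_3 {
        @Override
        int pivotIndex(double[] work, int begin, int end) {
            int inclusiveEnd = end - 1;
            int middle = begin + (inclusiveEnd - begin) / 2;
            double a = work[begin];
            double b = work[middle];
            double c = work[inclusiveEnd];
            if (a < b) {
                if (b < c) {
                    return middle;                          // a < b < c
                }
                return (a < c) ? inclusiveEnd : begin;      // b is the largest
            } else {
                if (a < c) {
                    return begin;                           // b <= a < c
                }
                return (b < c) ? inclusiveEnd : middle;     // a is the largest
            }
        }
    };

    /** Returns the pivot index within work[begin, end), end exclusive. */
    abstract int pivotIndex(double[] work, int begin, int end);
}
```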

Changes to Percentile:
-----------------------------
Percentile gets one or two more constructors so that an EstimationTechnique
can be specified during construction. The default estimation technique is the
existing Percentile computation logic, which need not be specified; the existing
constructors will work exactly as they did before.
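Here is a sketch of the constructor chaining this implies. The names SketchPercentile, Technique, LEGACY and EXCEL are hypothetical. The existing-style constructors delegate to the new one with the legacy technique, so their behaviour is unchanged. In this sketch quantiles are percentages in (0, 100], and evaluate sorts a copy rather than using kth selection.

```java
import java.util.Arrays;

/** Hypothetical sketch: extra constructor with a technique, legacy default elsewhere. */
class SketchPercentile {
    enum Technique {
        /** Existing position: p * (N + 1) / 100. */
        LEGACY,
        /** Spreadsheet-style position: p * (N - 1) / 100 + 1. */
        EXCEL;

        double position(double p, int n) {
            return this == LEGACY ? p * (n + 1) / 100 : p * (n - 1) / 100 + 1;
        }
    }

    private final double quantile;      // percentage in (0, 100]
    private final Technique technique;

    /** Existing-style constructor: median with the default technique. */
    SketchPercentile() {
        this(50.0);
    }

    /** Existing-style constructor: behaviour unchanged (legacy technique). */
    SketchPercentile(double quantile) {
        this(quantile, Technique.LEGACY);
    }

    /** New constructor: the caller chooses the technique explicitly. */
    SketchPercentile(double quantile, Technique technique) {
        if (quantile <= 0 || quantile > 100) {
            throw new IllegalArgumentException("quantile out of range: " + quantile);
        }
        this.quantile = quantile;
        this.technique = technique;
    }

    /** Sorts a copy and interpolates linearly at the technique's position. */
    double evaluate(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double pos = technique.position(quantile, n);
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int lo = (int) Math.floor(pos);
        return sorted[lo - 1] + (pos - lo) * (sorted[lo] - sorted[lo - 1]);
    }
}
```

With this sketch, new SketchPercentile(25).evaluate(new double[] {1, 2, 3, 4}) returns 1.25. On the same data, new SketchPercentile(25, SketchPercentile.Technique.EXCEL) returns 1.75.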

Remove the kth selection private methods and move them into a KthSelector class
(a separate nested class). However, medianOf3 has package-level access and
hence needs to be refactored to use the KthSelector class. It could also be
deprecated, since the method belongs to kth selection rather than strictly to
the percentile logic.
Add two small methods to get the work array (getWorkArray) and the cached
pivots, both of which need to be passed along to the estimation technique.
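Below is a minimal sketch of a standalone KthSelector, assuming a select(work, k) entry point with a 0-based k. It is a plain quickselect with a median-of-3 pivot and Lomuto partitioning. Unlike the proposal, it keeps no cache of pivots, so each call starts from the full array.

```java
/**
 * Hypothetical sketch of a KthSelector: quickselect with a median-of-3 pivot.
 * It reorders the work array in place and caches no pivots between calls.
 */
final class KthSelector {

    /** Returns the k-th smallest value (0-based) of work, partially reordering it. */
    double select(double[] work, int k) {
        if (k < 0 || k >= work.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int begin = 0;
        int end = work.length;                  // exclusive; k stays in [begin, end)
        while (end - begin > 1) {
            int pivot = partition(work, begin, end, medianOf3(work, begin, end));
            if (k == pivot) {
                return work[k];
            } else if (k < pivot) {
                end = pivot;                    // continue in the left part
            } else {
                begin = pivot + 1;              // continue in the right part
            }
        }
        return work[begin];
    }

    /** Index of the median of the first, middle and last elements of [begin, end). */
    private static int medianOf3(double[] w, int begin, int end) {
        int lo = begin;
        int hi = end - 1;
        int mid = begin + (hi - begin) / 2;
        if (w[lo] > w[hi]) {                    // ensure w[lo] <= w[hi]
            int t = lo;
            lo = hi;
            hi = t;
        }
        if (w[mid] < w[lo]) {
            return lo;                          // mid is the smallest
        }
        if (w[mid] > w[hi]) {
            return hi;                          // mid is the largest
        }
        return mid;
    }

    /** Lomuto partition around work[pivotIndex]; returns the pivot's final index. */
    private static int partition(double[] work, int begin, int end, int pivotIndex) {
        int last = end - 1;
        double pivotValue = work[pivotIndex];
        swap(work, pivotIndex, last);
        int store = begin;
        for (int i = begin; i < last; i++) {
            if (work[i] < pivotValue) {
                swap(work, i, store++);
            }
        }
        swap(work, store, last);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```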

I agree with dropping my earlier suggestion of ExcelPercentile{Test} and look
forward to opinions on the new approach.

Please let me know your thoughts on the attached percentile-with-estimation-patch.




> Need Percentile computations that can be matched with standard spreadsheet 
> formula
> ----------------------------------------------------------------------------------
>
>                 Key: MATH-1120
>                 URL: https://issues.apache.org/jira/browse/MATH-1120
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>
>         Attachments: excel-percentile-patch, percentile-with-estimation-patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The current Percentile implementation assumes and hard-codes the pth quantile
> position as p * (N+1)/100 and provides a kth selected value.
> However, if we need to verify against or compare with standard statistical
> tools such as MS Excel, it would be good to provide an extensible way of
> varying this position selection rather than hard-coding it.
> For example, to generate a percentile closely matching MS Excel, the required
> position may be [p*(N-1)/100]+1.
> I have a patch ready with a small change needed in the Percentile class and a
> new ExcelPercentile class, with tests closely matching those of the
> PercentileTest class.
> Please let me know if I could submit this as a patch.
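The following worked example makes the difference between the two positions in the quoted description concrete. It assumes linear interpolation between neighbouring order statistics, as the existing Percentile does, and uses the illustrative sample 1, 2, 3, 4 (N = 4) with p = 25:

```latex
\begin{aligned}
\text{existing: } & \mathrm{pos} = 25 \cdot (4+1)/100 = 1.25, & Q &= x_1 + 0.25\,(x_2 - x_1) = 1.25 \\
\text{Excel: } & \mathrm{pos} = 25 \cdot (4-1)/100 + 1 = 1.75, & Q &= x_1 + 0.75\,(x_2 - x_1) = 1.75
\end{aligned}
```

The second value agrees with what Excel's PERCENTILE returns for {1, 2, 3, 4} at 0.25.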



--
This message was sent by Atlassian JIRA
(v6.2#6252)
