[ 
https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029642#comment-14029642
 ] 

Venkatesha Murthy TS edited comment on MATH-1120 at 6/12/14 7:33 PM:
---------------------------------------------------------------------

Attached is the new patch which has the following changes

Firstly, i have verified in my cygwin environment that the following command 
for patching works. (Did svn revert first and then tried this command)
$ patch -p0 -i ../../vmurthy/patch
patching file 
src/test/java/org/apache/commons/math3/stat/descriptive/rank/PercentileTest.java
patching file 
src/test/java/org/apache/commons/math3/stat/descriptive/rank/MedianTest.java
patching file 
src/main/java/org/apache/commons/math3/stat/descriptive/rank/Percentile.java
patching file 
src/main/java/org/apache/commons/math3/stat/descriptive/rank/Median.java

Next, the following are the responses for clarifications asked in the email.

> IIUC, in this reference
>   http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
> what you called "EstimationTechnique" is referred to as "Type".
>
> Then the R manual uses a numbering: 1 to 9.
>

Done and re-named the EstimationTechnique as Type.

>
> Was Commons Math's implementation none of those nine types?
>
Commons Math implementation comes very close to R_6(infact the index 
calculation is same) however it is the max and min limits
as to when x1 and xN needs to be considered that would differ  between CM and 
R_6..(I have put this in java doc of R_6)

> I wouldn't name the CM's implementation DEFAULT (and the R's manual
> refers to a paper that recommends "type 8").
>
Renamed DEFAULT as CM and all others are named as R_1,R_2, etc..
However, By default the type i have set is CM due to the fact the existing 
behaviour should be provided witout setting any new configuration. I understand 
R_8 is recommended; however it may be too much to disrupt users 
expectaion/experience by setting R_8 to default than CM. Please let me know 
what you think.

> If it's OK to keep a tight link to the R's description of the variants,
> I'd suggest
>
> public enum Type {
>   CM,  // instead of DEFAULT
>   R_1,
>   R_2,
>   R_3,
>   R_4,
>   R_5,
>   R_6,
>   R_7,
>   R_8,
>   R_9,
>   // TYPE_TEN ?
> }
>
Agreed taken. Also not implented the type 10 as i didnt yet get a bench mark 
such as R for comparison.


> R_9 is not implemented in the patch. Is it intended?
> Then on the Wikipedia page there is an unnamed 10th variant, also
> not implemented.
>
Well yes i didnt go about implementing all of them however initially. But; now,
i have added all of them except 10th type (as per wikipedia) for which i dont 
have benchmark to compare

> People knowledgeable in what should be expected from such a
> functionality are most welcome to provide feedback...
Gilles:Thanks so much for the comments. Every one Please let know

I have added AtomicInteger (not b'cos of Threads)but as a mutable holder for 
Int (akin to INOUT parameter). Ididnt add the comment explicitly because i felt 
the variable name lengthHolder suggests this reason contextually; given that no 
other mutable holder convenience is available in jdk. Incrementor may be an 
option but its usage is for different purpose and felt atomicint was abetter 
choice than counter variable.

In addition i have added in the javadoc about mathjax formulae on max and min 
limits along with index and estimate.






was (Author: vmurthy):
Attached is the new patch which has the following changes

Firstly, i have verified in my cygwin environment that the following command 
for patching works. (Did svn revert first and then tried this command)
$ patch -p0 -i ../../vmurthy/patch
patching file 
src/test/java/org/apache/commons/math3/stat/descriptive/rank/PercentileTest.java
patching file 
src/test/java/org/apache/commons/math3/stat/descriptive/rank/MedianTest.java
patching file 
src/main/java/org/apache/commons/math3/stat/descriptive/rank/Percentile.java
patching file 
src/main/java/org/apache/commons/math3/stat/descriptive/rank/Median.java

Next, the following are the responses for clarifications asked in the email.

> IIUC, in this reference
>   http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
> what you called "EstimationTechnique" is referred to as "Type".
>
> Then the R manual uses a numbering: 1 to 9.
>

Done and re-named the EstimationTechnique as Type.

>
> Was Commons Math's implementation none of those nine types?
>
Commons Math implementation comes very close to R_6(infact the index 
calculation is same) however it is the max and min limits
as to when x1 and xN needs to be considered that would differ  between CM and 
R_6..(I have put this in java doc of R_6)

> I wouldn't name the CM's implementation DEFAULT (and the R's manual
> refers to a paper that recommends "type 8").
>
Renamed DEFAULT as CM and all others are named as R_1,R_2, etc..
However, By default the type i have set is CM due to the fact the existing 
behaviour should be provided witout setting any new configuration. I understand 
R_8 is recommended; however it may be too much to disrupt users 
expectaion/experience by setting R_8 to default than CM. Please let me know 
what you think.

> If it's OK to keep a tight link to the R's description of the variants,
> I'd suggest
>
> public enum Type {
>   CM,  // instead of DEFAULT
>   R_1,
>   R_2,
>   R_3,
>   R_4,
>   R_5,
>   R_6,
>   R_7,
>   R_8,
>   R_9,
>   // TYPE_TEN ?
> }
>
Agreed taken. Also not implented the type 10 as i didnt yet get a bench mark 
such as R for comparison.


> R_9 is not implemented in the patch. Is it intended?
> Then on the Wikipedia page there is an unnamed 10th variant, also
> not implemented.
>
Well yes i didnt go about implementing all of them however initially. But;
i have added all of them except 10th type.

> People knowledgeable in what should be expected from such a
> functionality are most welcome to provide feedback...
Gilles:Thanks so much for the comments. Every one Please let know

I have added AtomicInteger (not b'cos of Threads)but as a mutable holder for 
Int (akin to INOUT parameter). Ididnt add the comment explicitly because i felt 
the variable name lengthHolder suggests this reason contextually; given that no 
other mutable holder convenience is available in jdk. Incrementor may be an 
option but its usage is for different purpose and felt atomicint was abetter 
choice than counter variable.

In addition i have added in the javadoc about mathjax formulae on max and min 
limits along with index and estimate.





> Need Percentile computations that can be matched with standard spreadsheet 
> formula
> ----------------------------------------------------------------------------------
>
>                 Key: MATH-1120
>                 URL: https://issues.apache.org/jira/browse/MATH-1120
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>
>         Attachments: excel-percentile-patch, 
> percentile-with-estimation-patch, r-output.txt
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The current Percentile implementation assumes and hard-codes the quantile pth 
> position as 
> p * (N+1)/100 and provides a kth selected value.
> However if we need to verify compare/contrast with standard statistical tools 
> such as say MS Excel; it would be good to provide an extensible way of 
> morphing this selection of position than hard code.
> For example in order to generate the percentile closely matching with MS 
> Excel the position required may be [p*(N-1)/100]+1.
> Please let me know if i could submit this as a patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to