[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029642#comment-14029642 ]
Venkatesha Murthy TS edited comment on MATH-1120 at 6/12/14 7:33 PM: --------------------------------------------------------------------- Attached is the new patch which has the following changes Firstly, i have verified in my cygwin environment that the following command for patching works. (Did svn revert first and then tried this command) $ patch -p0 -i ../../vmurthy/patch patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/PercentileTest.java patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/MedianTest.java patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Percentile.java patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Median.java Next, the following are the responses for clarifications asked in the email. > IIUC, in this reference > http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html > what you called "EstimationTechnique" is referred to as "Type". > > Then the R manual uses a numbering: 1 to 9. > Done and re-named the EstimationTechnique as Type. > > Was Commons Math's implementation none of those nine types? > Commons Math implementation comes very close to R_6(infact the index calculation is same) however it is the max and min limits as to when x1 and xN needs to be considered that would differ between CM and R_6..(I have put this in java doc of R_6) > I wouldn't name the CM's implementation DEFAULT (and the R's manual > refers to a paper that recommends "type 8"). > Renamed DEFAULT as CM and all others are named as R_1,R_2, etc.. However, By default the type i have set is CM due to the fact the existing behaviour should be provided witout setting any new configuration. I understand R_8 is recommended; however it may be too much to disrupt users expectaion/experience by setting R_8 to default than CM. Please let me know what you think. > If it's OK to keep a tight link to the R's description of the variants, > I'd suggest > > public enum Type { > CM, // instead of DEFAULT > R_1, > R_2, > R_3, > R_4, > R_5, > R_6, > R_7, > R_8, > R_9, > // TYPE_TEN ? > } > Agreed taken. Also not implented the type 10 as i didnt yet get a bench mark such as R for comparison. > R_9 is not implemented in the patch. Is it intended? > Then on the Wikipedia page there is an unnamed 10th variant, also > not implemented. > Well yes i didnt go about implementing all of them however initially. But; now, i have added all of them except 10th type (as per wikipedia) for which i dont have benchmark to compare > People knowledgeable in what should be expected from such a > functionality are most welcome to provide feedback... Gilles:Thanks so much for the comments. Every one Please let know I have added AtomicInteger (not b'cos of Threads)but as a mutable holder for Int (akin to INOUT parameter). Ididnt add the comment explicitly because i felt the variable name lengthHolder suggests this reason contextually; given that no other mutable holder convenience is available in jdk. Incrementor may be an option but its usage is for different purpose and felt atomicint was abetter choice than counter variable. In addition i have added in the javadoc about mathjax formulae on max and min limits along with index and estimate. was (Author: vmurthy): Attached is the new patch which has the following changes Firstly, i have verified in my cygwin environment that the following command for patching works. (Did svn revert first and then tried this command) $ patch -p0 -i ../../vmurthy/patch patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/PercentileTest.java patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/MedianTest.java patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Percentile.java patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Median.java Next, the following are the responses for clarifications asked in the email. > IIUC, in this reference > http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html > what you called "EstimationTechnique" is referred to as "Type". > > Then the R manual uses a numbering: 1 to 9. > Done and re-named the EstimationTechnique as Type. > > Was Commons Math's implementation none of those nine types? > Commons Math implementation comes very close to R_6(infact the index calculation is same) however it is the max and min limits as to when x1 and xN needs to be considered that would differ between CM and R_6..(I have put this in java doc of R_6) > I wouldn't name the CM's implementation DEFAULT (and the R's manual > refers to a paper that recommends "type 8"). > Renamed DEFAULT as CM and all others are named as R_1,R_2, etc.. However, By default the type i have set is CM due to the fact the existing behaviour should be provided witout setting any new configuration. I understand R_8 is recommended; however it may be too much to disrupt users expectaion/experience by setting R_8 to default than CM. Please let me know what you think. > If it's OK to keep a tight link to the R's description of the variants, > I'd suggest > > public enum Type { > CM, // instead of DEFAULT > R_1, > R_2, > R_3, > R_4, > R_5, > R_6, > R_7, > R_8, > R_9, > // TYPE_TEN ? > } > Agreed taken. Also not implented the type 10 as i didnt yet get a bench mark such as R for comparison. > R_9 is not implemented in the patch. Is it intended? > Then on the Wikipedia page there is an unnamed 10th variant, also > not implemented. > Well yes i didnt go about implementing all of them however initially. But; i have added all of them except 10th type. > People knowledgeable in what should be expected from such a > functionality are most welcome to provide feedback... Gilles:Thanks so much for the comments. Every one Please let know I have added AtomicInteger (not b'cos of Threads)but as a mutable holder for Int (akin to INOUT parameter). Ididnt add the comment explicitly because i felt the variable name lengthHolder suggests this reason contextually; given that no other mutable holder convenience is available in jdk. Incrementor may be an option but its usage is for different purpose and felt atomicint was abetter choice than counter variable. In addition i have added in the javadoc about mathjax formulae on max and min limits along with index and estimate. > Need Percentile computations that can be matched with standard spreadsheet > formula > ---------------------------------------------------------------------------------- > > Key: MATH-1120 > URL: https://issues.apache.org/jira/browse/MATH-1120 > Project: Commons Math > Issue Type: Improvement > Affects Versions: 3.2 > Reporter: Venkatesha Murthy TS > Labels: Percentile > Fix For: 4.0 > > Attachments: excel-percentile-patch, > percentile-with-estimation-patch, r-output.txt > > Original Estimate: 504h > Remaining Estimate: 504h > > The current Percentile implementation assumes and hard-codes the quantile pth > position as > p * (N+1)/100 and provides a kth selected value. > However if we need to verify compare/contrast with standard statistical tools > such as say MS Excel; it would be good to provide an extensible way of > morphing this selection of position than hard code. > For example in order to generate the percentile closely matching with MS > Excel the position required may be [p*(N-1)/100]+1. > Please let me know if i could submit this as a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)