[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luc Maisonobe updated MATH-1120:
    Fix Version/s: (was: 4.0)
                   3.4

Need Percentile computations that can be matched with standard spreadsheet formula

    Key: MATH-1120
    URL: https://issues.apache.org/jira/browse/MATH-1120
    Project: Commons Math
    Issue Type: Improvement
    Affects Versions: 3.2
    Reporter: Venkatesha Murthy TS
    Labels: Percentile
    Fix For: 3.4
    Attachments: 18-jun-percentile-with-estimation-patch, 27-jun-refactored-kth-pivoting.patch, excel-percentile-patch, math-1120-removeAndSlice.patch, percentile-with-estimation-patch, r-output.txt
    Original Estimate: 504h
    Remaining Estimate: 504h

The current Percentile implementation hard-codes the position of the pth quantile as p * (N+1)/100 and returns the kth selected value. However, to compare results against standard statistical tools such as MS Excel, it would be good to provide an extensible way of choosing this position rather than hard-coding it. For example, to generate a percentile that closely matches MS Excel, the required position may be [p*(N-1)/100]+1.

Please let me know if I could submit this as a patch.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
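To make the difference between the two position formulas concrete, here is a minimal sketch. It is not the Commons Math code: the class and method names are hypothetical, the input is assumed to be already sorted and free of NaNs, and linear interpolation between neighbouring values is an assumption about how the fractional part of the position is used.

```java
/** Hypothetical illustration of the two position formulas quoted in the issue. */
final class PercentilePositions {

    /** 1-based position currently hard-coded: p * (N + 1) / 100. */
    static double legacyPosition(double p, int n) {
        return p * (n + 1) / 100.0;
    }

    /** 1-based position suggested for matching a spreadsheet: p * (N - 1) / 100 + 1. */
    static double spreadsheetPosition(double p, int n) {
        return p * (n - 1) / 100.0 + 1.0;
    }

    /** Linearly interpolates in a sorted array at a 1-based position, clamping at both ends. */
    static double valueAt(double[] sorted, double pos) {
        int n = sorted.length;
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int lower = (int) Math.floor(pos);   // 1-based index of the lower neighbour
        double fraction = pos - lower;
        return sorted[lower - 1] + fraction * (sorted[lower] - sorted[lower - 1]);
    }

    public static void main(String[] args) {
        double[] sorted = {1, 2, 3, 4};
        double p = 25;
        // Position 1.25 under the current formula, so the estimate is 1.25.
        System.out.println(valueAt(sorted, legacyPosition(p, sorted.length)));
        // Position 1.75 under the spreadsheet formula, so the estimate is 1.75.
        System.out.println(valueAt(sorted, spreadsheetPosition(p, sorted.length)));
    }
}
```

For this four-element sample the two formulas give the same median (position 2.5 in both cases) but different quartiles, which is the kind of mismatch the issue describes.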
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: math-1120-removeAndSlice.patch

Hi,

This patch has the following changes:

a) A small refactoring in replaceAndSlice, which now calls Precision.equalsIncludingNaN to handle the NaN check.
b) removeAndSlice is slightly rewritten to improve two things:
   i) Calls to System.arraycopy: each call now copies in bulk the whole run of elements between two occurrences of the removable item (instead of copying one element after another).
   ii) A correction: the last leg of the copy now checks correctly up to begin + length (instead of checking up to length).

Please let me know.

    Attachments: 18-jun-percentile-with-estimation-patch, 27-jun-refactored-kth-pivoting.patch, excel-percentile-patch, math-1120-removeAndSlice.patch, percentile-with-estimation-patch, r-output.txt
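The two removeAndSlice points above can be sketched as follows. This is not the attached patch: the class name and the NaN-aware comparison helper are hypothetical stand-ins (the patch itself uses Precision.equalsIncludingNaN), and argument validation is left out for brevity.

```java
/** Hypothetical sketch of slicing an array while removing one value, copying in bulk runs. */
final class SliceSketch {

    /** Stand-in for a NaN-aware equality check: two NaNs are considered equal. */
    static boolean sameIncludingNaN(double a, double b) {
        return (Double.isNaN(a) && Double.isNaN(b)) || a == b;
    }

    /** Returns values[begin, begin + length) with every occurrence of removed dropped. */
    static double[] removeAndSlice(double[] values, int begin, int length, double removed) {
        final int end = begin + length;

        // First pass: count the survivors so the result can be allocated once.
        int kept = 0;
        for (int i = begin; i < end; i++) {
            if (!sameIncludingNaN(values[i], removed)) {
                kept++;
            }
        }

        // Second pass: copy each run between two removable items with one arraycopy call.
        final double[] out = new double[kept];
        int dest = 0;
        int runStart = begin;
        for (int i = begin; i < end; i++) {
            if (sameIncludingNaN(values[i], removed)) {
                final int runLength = i - runStart;
                System.arraycopy(values, runStart, out, dest, runLength);
                dest += runLength;
                runStart = i + 1;
            }
        }

        // Last leg: the bound is begin + length, not length.
        System.arraycopy(values, runStart, out, dest, end - runStart);
        return out;
    }
}
```

With a non-zero begin, bounding the final copy by length alone would stop short of (or overrun) the requested window, which is the correction described in item ii).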
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: (was: 27-jun-refactored-pivot+nanchanges.patch)

    Attachments: 18-jun-percentile-with-estimation-patch, excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: 27-jun-refactored-kth-pivoting.patch

    Attachments: 18-jun-percentile-with-estimation-patch, 27-jun-refactored-kth-pivoting.patch, excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: 27-jun-refactored-pivot+nanchanges.patch

Please find attached the changes for the refactoring of:

a) PivotingStrategyInterface and PivotingStrategy
b) KthSelector
c) Percentile, adapted to the pivoting and KthSelector refactoring

In addition, please make sure to apply math-1132.patch before applying this patch, as it depends on the NaN changes made in MATH-1132.

These are the latest changes as per the discussion on the dev mailing list about MATH-1120. Please let me know.

    Attachments: 18-jun-percentile-with-estimation-patch, 27-jun-refactored-pivot+nanchanges.patch, excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
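The separation described above, where selecting the kth value is decoupled from how the pivot is chosen, can be sketched as below. This is an illustration only, not the code in the patch: PivotChooser and KthSelectSketch are hypothetical names, the partition scheme is a plain Lomuto partition chosen for brevity, and the input is assumed to contain no NaNs.

```java
/** Hypothetical pivot-choosing strategy: returns an index in [begin, end). */
interface PivotChooser {
    int pivotIndex(double[] work, int begin, int end);
}

/** Hypothetical quickselect with a pluggable pivot strategy. */
final class KthSelectSketch {

    /** Picks the middle element of the range. */
    static final PivotChooser CENTRAL = (w, b, e) -> b + (e - b) / 2;

    /** Picks the index holding the median of the first, middle and last elements. */
    static final PivotChooser MEDIAN_OF_3 = (w, b, e) -> {
        int last = e - 1;
        int mid = b + (last - b) / 2;
        double x = w[b], y = w[mid], z = w[last];
        if (x < y) {
            if (y < z) return mid;
            return x < z ? last : b;
        }
        if (x < z) return b;
        return y < z ? last : mid;
    };

    /** Returns the kth smallest value (k is 0-based); reorders work in place. */
    static double select(double[] work, int k, PivotChooser chooser) {
        int begin = 0;
        int end = work.length;
        while (end - begin > 1) {
            int p = partition(work, begin, end, chooser.pivotIndex(work, begin, end));
            if (k == p) {
                return work[k];
            }
            if (k < p) {
                end = p;        // kth value lies left of the pivot
            } else {
                begin = p + 1;  // kth value lies right of the pivot
            }
        }
        return work[k];
    }

    /** Lomuto partition of [begin, end); returns the final index of the pivot value. */
    private static int partition(double[] w, int begin, int end, int pivot) {
        double value = w[pivot];
        swap(w, pivot, end - 1);
        int store = begin;
        for (int i = begin; i < end - 1; i++) {
            if (w[i] < value) {
                swap(w, i, store++);
            }
        }
        swap(w, store, end - 1);
        return store;
    }

    private static void swap(double[] w, int i, int j) {
        double t = w[i];
        w[i] = w[j];
        w[j] = t;
    }
}
```

Because the selector only depends on the small interface, a different pivot choice (for example a random one) can be plugged in without touching the selection loop.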
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: 18-jun-percentile-with-estimation-patch

Hi Luc, Gilles,

First of all, I am immensely thankful for all your comments on this patch. I am attaching my new patch with today's date (18-jun). Please advise if I need to remove the old patch file in case it causes confusion. Please find my response below.

The new patch has the suggested changes in the switch case for NaN handling. However, I have my own views on the different default NaN strategies associated with the types. Please permit me to explain (sorry for the long summary).

First, I would like to leave the default implementation of Percentile as is (meaning that in my MATH-1120 patch it is mapped to Type.CM), since otherwise we would break users' existing expectations even for non-NaN and non-infinite entries. The existing tests do fail if we change the default type (please refer to the PercentileTest.java code as well to see the finer variations being checked for the different types).

Secondly, the Percentile.java header comment states, in effect, that NaNs are left as is and handled by Java's default sort behavior, with no removal being done. To map this behavior to the new implementation, NaNStrategy.FIXED was the closest match, and it did not require any of the existing test cases for the existing Percentile behavior to change. What I am reiterating here is that the existing behavior tests pass completely with the new Type.CM and FIXED. (I have now added several more tests, including tests for the different types.)

Thirdly, all the R_x types (where x is in [1-9]), as run and verified with the R tool, clearly indicated that NaNs need to be removed; hence I have used a different strategy, NaNStrategy#REMOVED, for them.

I agree that having multiple defaults is not wise. However, if we have to keep Apache CM as a supported type (which is not one of the R_x types) and we need to support multiple variants (R1-R9), then a type-specific NaNStrategy is inevitable. I also feel that NaN handling should be overridable, at least in a controlled manner, as different use cases may need this variation. Therefore, IMO, while we could avoid public access for changing these defaults, it is relevant to support these variations of NaN handling per type and to allow at least subclasses to override them if a rare need arises. The very name NaNStrategy suggests that there are different ways of handling NaNs; I feel we would be much restricted if we stuck to one way of NaN handling for all types. Please let me know your thoughts.

Next, regarding the PivotingStrategy: first, I wanted to convey that the intent is to have all the partitioning, pivoting and selection in separate classes/enums rather than inside the main class. I made it static because it is more of a non-functional requirement, and I felt that it need not be set for every instance (it is more of a global setting that does not vary across types). Please correct me here and let me know if it still needs to be per instance. I also made it package-accessible/settable solely because the medianOf3 method had been package-level for the sole purpose of allowing it to be overridden within that package. Meaning: if someone really needs to tinker with pivoting, they need a way to do it, which I have provided using a strategy.

Next, in the current patch, which I am attaching as a new dated patch (since you have already started looking at the old one, which I will leave as is), I find many utility-type methods: replaceNaN, removeNaN (predicated lists), copyOf(values, begin, length), as well as KthSelector with PivotingStrategy, etc., all of which could perhaps make their way into MathArrays and MathUtils. Please let me know. If so, I will refactor these changes once again and submit the patch.

Thanks for reading this through and for your time in reviewing. Please let me know your opinion on all of these.

Thanks,
Venkat

    Attachments: 18-jun-percentile-with-estimation-patch, excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
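The idea of a per-type default for NaN handling that can still be overridden can be sketched as follows. Everything here is hypothetical: NaNPolicy is a two-value stand-in for the FIXED and REMOVED strategies mentioned above, only three of the types are listed, and FIXED is modelled simply as "leave the NaNs in the copied array".

```java
import java.util.Arrays;

/** Hypothetical sketch of a default NaN policy per estimation type, with an override. */
final class NaNHandlingSketch {

    /** Stand-in for the two strategies discussed: keep NaNs in place, or drop them. */
    enum NaNPolicy { FIXED, REMOVED }

    /** Each type carries its own default policy, as argued in the comment above. */
    enum Type {
        CM(NaNPolicy.FIXED),      // existing behavior: NaNs are left for the sort to handle
        R_1(NaNPolicy.REMOVED),   // R-style types drop NaNs first
        R_7(NaNPolicy.REMOVED);

        private final NaNPolicy defaultPolicy;

        Type(NaNPolicy defaultPolicy) {
            this.defaultPolicy = defaultPolicy;
        }

        NaNPolicy defaultPolicy() {
            return defaultPolicy;
        }
    }

    /** Builds the working copy using an explicitly chosen policy (the override path). */
    static double[] workArray(double[] values, NaNPolicy policy) {
        if (policy == NaNPolicy.REMOVED) {
            return Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
        }
        return values.clone();
    }

    /** Builds the working copy using the default policy of the given type. */
    static double[] workArray(double[] values, Type type) {
        return workArray(values, type.defaultPolicy());
    }
}
```

Keeping the default on the type and the override as a separate entry point is one way to get "controlled" overriding: callers that do nothing get the behavior tied to the type, and the rare caller with a different need has to ask for it explicitly.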
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: (was: percentile-with-estimation-patch)

    Attachments: excel-percentile-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: percentile-with-estimation-patch

The preProcess method has been removed; its pre-processing is now rolled into the getWorkArray method. NaNs in the input array are now handled by the NaNStrategy set when the Percentile is constructed.

Please let me know.

    Attachments: excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: (was: r-output.txt)

    Attachments: excel-percentile-patch
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: r-output.txt
                percentile-with-estimation-patch

Attached is the new patch, which has the following changes.

Firstly, I have verified in my Cygwin environment that the following command for patching works (I did an svn revert first and then tried this command):

    $ patch -p0 -i ../../vmurthy/patch
    patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/PercentileTest.java
    patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/MedianTest.java
    patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Percentile.java
    patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Median.java

Next, the following are my responses to the clarifications asked for in the email.

> IIUC, in this reference
> http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
> what you called EstimationTechnique is referred to as Type. Then the R manual uses a numbering: 1 to 9.

Done; I renamed EstimationTechnique to Type.

> Was Commons Math's implementation none of those nine types?

The Commons Math implementation comes very close to R_6 (in fact the index calculation is the same); however, the max and min limits that decide when x1 and xN are to be used differ between CM and R_6. (I have put this in the Javadoc of R_6.)

> I wouldn't name the CM's implementation DEFAULT (and the R's manual refers to a paper that recommends type 8).

I renamed DEFAULT to CM, and all the others are named R_1, R_2, etc. However, the type I have set by default is CM, because the existing behaviour should be provided without setting any new configuration. I understand R_8 is recommended; however, changing the default from CM to R_8 may disrupt users' expectations and experience too much. Please let me know what you think.

> If it's OK to keep a tight link to the R's description of the variants, I'd suggest
>
>     public enum Type {
>         CM, // instead of DEFAULT
>         R_1, R_2, R_3, R_4, R_5, R_6, R_7, R_8, R_9,
>         // TYPE_TEN ?
>     }

Agreed and taken. I have also not implemented type 10, as I have not yet found a benchmark such as R to compare it against.

> R_9 is not implemented in the patch. Is it intended? Then on the Wikipedia page there is an unnamed 10th variant, also not implemented.

Well, yes, I did not implement all of them initially. But I have now added all of them except the 10th type.

> People knowledgeable in what should be expected from such a functionality are most welcome to provide feedback...

Gilles: thanks so much for the comments. Everyone, please let me know.

I have added an AtomicInteger, not because of threads but as a holder for an int (akin to an INOUT parameter). I did not state this explicitly, as I felt that the variable name lengthHolder suggests the reason from context.

    Attachments: excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
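The use of AtomicInteger as an in/out holder, rather than for thread safety, can be illustrated with a short sketch. The method below is hypothetical and not taken from the patch; only the idea of a holder named lengthHolder comes from the comment above. It assumes the caller wants NaNs compacted out of the front of a work array and the new effective length reported back through the holder.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: AtomicInteger used purely as a mutable int holder. */
final class LengthHolderSketch {

    /**
     * Moves the non-NaN values of work[0, lengthHolder) to the front of the array,
     * then stores the number of values kept back into lengthHolder.
     */
    static void compactNaNs(double[] work, AtomicInteger lengthHolder) {
        final int length = lengthHolder.get();   // "in" part of the parameter
        int kept = 0;
        for (int i = 0; i < length; i++) {
            if (!Double.isNaN(work[i])) {
                work[kept++] = work[i];
            }
        }
        lengthHolder.set(kept);                  // "out" part of the parameter
    }
}
```

Java has no pass-by-reference for primitives, so a mutable holder (or a one-element int array) is a common workaround when a method must both modify an array in place and report a changed length.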
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: (was: percentile-with-estimation-patch)

    Attachments: excel-percentile-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: percentile-with-estimation-patch

Here is the latest patch, with the review comments incorporated:

a) Removed the alternateName.
b) Removed the comment; each enum has a MathJax comment about its formula.
c) Corrected typos to the best extent I could.
d) Removed all references to the R script.
e) No mixing of HTML and MathJax within a single formula, though I would be interested to know the reasons here.
f) The variable named N is replaced with length, for a more descriptive meaning.
g) Added some final keywords.
h) Put href attribute values within double quotes and tried to keep them on one line.
i) I will send an email to the dev ML on this:

> EstimationTechnique the best possible name? As it will be part of the public API, perhaps you could ask this question on the dev ML.

j) Done: created a new constant for (0x1 << MAX_CACHED_LEVELS) - 1.
k) The medianOf3 method now carries a deprecation message pointing to an estimation strategy setting, and the method now throws an unsupported operation exception, since it is of no consequence if someone tries to overload the method.

> Otherwise, the list of alternate percentile definitions seems a nice addition to the CM stat functionality.

Thanks for the inputs.

So now, in summary:

a) As medianOf3 was exposed as a package-level access method, with the given change I am proposing to deprecate it by throwing an unsupported operation exception. The reason is that pivoting has been added as a strategy, which can now be set if it is really warranted. Again, its access level is kept at package level.
b) Added PivotingStrategy enums such as random and central pivoting, along with the median-of-3 approach.

Many thanks for all the comments. Please let me know.

    Attachments: excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: r-output.txt

Just attaching the R output that I used for verification.

Basically, I started with the R1, R2, R3, R4, R7, R8 and DEFAULT estimations. However, one of the test asserts, the one with multiple positive infinities, does not match for R1, R2, R3 and R4, whereas it matches for R7, R8 and DEFAULT (which is Apache Commons). I am still not clear on that and am looking into it. I may need some help there.

    Attachments: excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: (was: percentile-with-estimation-patch)

    Attachments: excel-percentile-patch
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: percentile-with-estimation-patch

Sorry for the inconvenience caused by the change in formatting. I have modified my IDE so that it does not change the format of pre-existing code. Here is the latest drop under the same patch name (I replaced the earlier one with the new one).

I also verified (in Cygwin on Windows) that patch -p0 c:/workspaces/vmurthy/percentile-with-estimation-patch works, reporting that it is patching the files Percentile and PercentileTest.

    Attachments: excel-percentile-patch, percentile-with-estimation-patch
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: (was: percentile-with-estimation-patch)

    Attachments: excel-percentile-patch
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesha Murthy TS updated MATH-1120:
    Attachment: percentile-with-estimation-patch

Cleared the Checkstyle, PMD and FindBugs warnings, and improved code coverage for the changed portions.

    Attachments: excel-percentile-patch, percentile-with-estimation-patch
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatesha Murthy TS updated MATH-1120: --- Attachment: (was: percentile-with-estimation-patch) Need Percentile computations that can be matched with standard spreadsheet formula -- Key: MATH-1120 URL: https://issues.apache.org/jira/browse/MATH-1120 Project: Commons Math Issue Type: Improvement Affects Versions: 3.2 Reporter: Venkatesha Murthy TS Labels: Percentile Fix For: 4.0 Attachments: excel-percentile-patch Original Estimate: 504h Remaining Estimate: 504h The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. Please let me know if i could submit this as a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatesha Murthy TS updated MATH-1120: --- Attachment: percentile-with-estimation-patch Need Percentile computations that can be matched with standard spreadsheet formula -- Key: MATH-1120 URL: https://issues.apache.org/jira/browse/MATH-1120 Project: Commons Math Issue Type: Improvement Affects Versions: 3.2 Reporter: Venkatesha Murthy TS Labels: Percentile Fix For: 4.0 Attachments: excel-percentile-patch, percentile-with-estimation-patch Original Estimate: 504h Remaining Estimate: 504h The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. Please let me know if i could submit this as a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatesha Murthy TS updated MATH-1120: --- Attachment: percentile-with-estimation-patch

As per the earlier discussion, I was advised to look at the references for the different possible types of computation and come up with a draft. Here is what I have been thinking.

There are at least 9-10 documented approaches to computing the percentile (from http://en.wikipedia.org/wiki/Quantile ), and the R statistical tool has a reference implementation of them. Each of these strategies provides a formula for choosing the index into the array and an estimation technique for computing the estimate. These estimation techniques map naturally onto an enum EstimationTechnique (R1, R2, etc., where R1 and R2 are the estimation types described on Wikipedia) with the functions below:

int index(double pthQuantile, int N);
double estimate(double[] values, int[] pivotsHeap, double pos, int length)

In addition, the Percentile class already does median-of-3 pivoting for the kth selection. Since pivoting is also a strategy, we could introduce a pivoting strategy enum that defaults to median of 3. The kth selection logic can then be subsumed inside EstimationTechnique as the estimate method.

Changes to Percentile:
- Percentile gets one or two more constructors to allow an EstimationTechnique to be specified during construction. The default estimation technique is the existing Percentile computation logic, so it need not be specified and the existing constructors will work the same way as before.
- Remove the kth selection private methods and move them into a KthSelector class (a separate nested class). However, medianOf3 is exposed with package-level access and hence needs to be refactored to use the KthSelector class. It could also be deprecated, as the method belongs to kth selection rather than strictly to the percentile logic.
- Add two small methods to get the work array and the cached pivots, which need to be passed along to the estimation technique.

I agree with removing my earlier suggestion of ExcelPercentile{Test} and look forward to opinions on the new approach. Please let me know what you think of the attached percentile-with-estimation-patch.

Need Percentile computations that can be matched with standard spreadsheet formula -- Key: MATH-1120 URL: https://issues.apache.org/jira/browse/MATH-1120 Project: Commons Math Issue Type: Improvement Affects Versions: 3.2 Reporter: Venkatesha Murthy TS Labels: Percentile Fix For: 4.0 Attachments: excel-percentile-patch, percentile-with-estimation-patch Original Estimate: 504h Remaining Estimate: 504h The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. I do have patch ready with small change needed in Percentile class and a new ExcelPercentile class written with tests closely matching with that of PercentileTest class. Please let me know if i could submit this as a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
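The enum-based design proposed in the comment above could look roughly like the following. This is only a minimal sketch and not the attached patch: the constant names LEGACY and R7, the position and estimate signatures, and the SimplePercentile wrapper are all hypothetical, and the sketch sorts a copy of the data instead of using the pivot-based kth selection the comment describes.

```java
import java.util.Arrays;

/** Hypothetical sketch of a strategy enum; names and signatures are illustrative only. */
enum EstimationTechnique {

    /** The existing hard-coded rule: 1-based position p * (N + 1) / 100. */
    LEGACY {
        @Override
        double position(double p, int n) {
            return p * (n + 1) / 100.0;
        }
    },

    /** R7 in the Wikipedia numbering: 1-based position p * (N - 1) / 100 + 1,
     *  the rule the issue associates with MS Excel. */
    R7 {
        @Override
        double position(double p, int n) {
            return p * (n - 1) / 100.0 + 1.0;
        }
    };

    /** Fractional 1-based position of the p-th percentile (0 < p <= 100) among n values. */
    abstract double position(double p, int n);

    /** Estimates the percentile by linear interpolation around the position.
     *  Assumption: a full sort replaces the kth-selection logic for brevity. */
    double estimate(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double pos = position(p, n);
        if (pos < 1) {
            return sorted[0];          // below the first element: clamp to the minimum
        }
        if (pos >= n) {
            return sorted[n - 1];      // at or beyond the last element: clamp to the maximum
        }
        int lower = (int) Math.floor(pos);
        double fraction = pos - lower;
        return sorted[lower - 1] + fraction * (sorted[lower] - sorted[lower - 1]);
    }
}

/** Hypothetical stand-in for Percentile showing the extra constructor. */
final class SimplePercentile {
    private final EstimationTechnique technique;

    /** Existing-style constructor: keeps the legacy behaviour by default. */
    SimplePercentile() {
        this(EstimationTechnique.LEGACY);
    }

    /** New-style constructor: the caller picks the estimation technique. */
    SimplePercentile(EstimationTechnique technique) {
        this.technique = technique;
    }

    double evaluate(double[] values, double p) {
        return technique.estimate(values, p);
    }
}
```

With the data {10, 20, 30, 40} and p = 25, the default constructor yields 12.5 while `new SimplePercentile(EstimationTechnique.R7)` yields 17.5, so existing callers are unaffected and only callers who opt in see the spreadsheet-style result.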
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatesha Murthy TS updated MATH-1120: --- Description: The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. Please let me know if i could submit this as a patch. was: The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. I do have patch ready with small change needed in Percentile class and a new ExcelPercentile class written with tests closely matching with that of PercentileTest class. Please let me know if i could submit this as a patch. Need Percentile computations that can be matched with standard spreadsheet formula -- Key: MATH-1120 URL: https://issues.apache.org/jira/browse/MATH-1120 Project: Commons Math Issue Type: Improvement Affects Versions: 3.2 Reporter: Venkatesha Murthy TS Labels: Percentile Fix For: 4.0 Attachments: excel-percentile-patch, percentile-with-estimation-patch Original Estimate: 504h Remaining Estimate: 504h The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. 
However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. Please let me know if i could submit this as a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
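To make the difference between the two position formulas in the description concrete, here is a small sketch that evaluates both for N = 4. The class and method names are hypothetical and are not part of Commons Math; only the two formulas come from the issue text.

```java
/** Hypothetical helper comparing the two 1-based position formulas from the issue. */
final class PercentilePositionDemo {

    /** Position hard-coded by the current implementation: p * (N + 1) / 100. */
    static double legacyPosition(double p, int n) {
        return p * (n + 1) / 100.0;
    }

    /** Position suggested for matching MS Excel: p * (N - 1) / 100 + 1. */
    static double excelPosition(double p, int n) {
        return p * (n - 1) / 100.0 + 1.0;
    }

    public static void main(String[] args) {
        int n = 4;
        System.out.println(legacyPosition(25, n));   // 1.25
        System.out.println(excelPosition(25, n));    // 1.75
        System.out.println(legacyPosition(50, n));   // 2.5
        System.out.println(excelPosition(50, n));    // 2.5
        System.out.println(legacyPosition(100, n));  // 5.0 (beyond the last element)
        System.out.println(excelPosition(100, n));   // 4.0 (exactly the last element)
    }
}
```

The two rules agree at the median and diverge towards the tails, which is why a result computed with one rule cannot simply be converted into the other by adjusting p before the call.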
[jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
[ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatesha Murthy TS updated MATH-1120: --- Attachment: excel-percentile-patch Added the first draft patch for review. Now that I have removed ExcelPtile as a new main class, and since converting the position prior to calling the percentile is not possible, this extension may be necessary. Please let me know. Need Percentile computations that can be matched with standard spreadsheet formula -- Key: MATH-1120 URL: https://issues.apache.org/jira/browse/MATH-1120 Project: Commons Math Issue Type: Improvement Affects Versions: 3.2 Reporter: Venkatesha Murthy TS Labels: Percentile Fix For: 4.0 Attachments: excel-percentile-patch Original Estimate: 504h Remaining Estimate: 504h The current Percentile implementation assumes and hard-codes the quantile pth position as p * (N+1)/100 and provides a kth selected value. However if we need to verify compare/contrast with standard statistical tools such as say MS Excel; it would be good to provide an extensible way of morphing this selection of position than hard code. For example in order to generate the percentile closely matching with MS Excel the position required may be [p*(N-1)/100]+1. I do have patch ready with small change needed in Percentile class and a new ExcelPercentile class written with tests closely matching with that of PercentileTest class. Please let me know if i could submit this as a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)