[jira] [Updated] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Schalk W. Cronjé updated MATH-1131: --- Attachment: 1.txt Dataset that was used. Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset --- Key: MATH-1131 URL: https://issues.apache.org/jira/browse/MATH-1131 Project: Commons Math Issue Type: Bug Affects Versions: 3.3 Environment: Java 8 Reporter: Schalk W. Cronjé Attachments: 1.txt I have code simplified to the following:
KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
NormalDistribution nd = new NormalDistribution(mean, stddev);
kst.kolmogorovSmirnovTest(nd, dataset);
I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest takes 'forever'. It has not returned after nearly 15 minutes, and in one of my tests it has gone over 150MB in memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
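For context, the one-sample statistic itself is cheap to compute. The sketch below is a minimal, self-contained illustration (the class and method names are hypothetical, not the Commons Math API) that computes D_n = sup_x |F_n(x) - F(x)| for a sample against any continuous CDF passed as a DoubleUnaryOperator. Sorting dominates, so the cost is O(n log n) and a 10,000-item dataset is no problem at this stage; the later comments in this thread locate the slowness in turning D_n into a p-value (roundedK/createH).

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper: one-sample Kolmogorov-Smirnov statistic (not the Commons Math API). */
public final class KsStatistic {
    private KsStatistic() {}

    /**
     * Returns D_n = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of data
     * and F is a continuous CDF. Sorting dominates, so the cost is O(n log n).
     */
    static double statistic(double[] data, DoubleUnaryOperator cdf) {
        double[] x = data.clone();      // do not reorder the caller's array
        Arrays.sort(x);
        int n = x.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(x[i]);
            // F_n jumps from i/n to (i+1)/n at x[i]: check the gap on both sides of the jump.
            double above = (i + 1.0) / n - f;
            double below = f - (double) i / n;
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }
}
```

For example, statistic(new double[] {0.1, 0.5, 0.9}, x -> x) tests the sample against the uniform(0, 1) CDF and gives 1/3 - 0.1, about 0.2333.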
[jira] [Created] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
Schalk W. Cronjé created MATH-1131: -- Summary: Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset Key: MATH-1131 URL: https://issues.apache.org/jira/browse/MATH-1131 Project: Commons Math Issue Type: Bug Affects Versions: 3.3 Environment: Java 8 Reporter: Schalk W. Cronjé Attachments: 1.txt
[jira] [Updated] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Schalk W. Cronjé updated MATH-1131: --- Attachment: ReproduceKsIssue.java Example Java code to reproduce issue
[jira] [Updated] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Schalk W. Cronjé updated MATH-1131: --- Attachment: ReproduceKsIssue.groovy Example Groovy code that also reproduces the issue
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043247#comment-14043247 ] Schalk W. Cronjé commented on MATH-1131: See the code examples for the specific _mean_ and _stddev_ values that were used.
[jira] [Commented] (CODEC-187) Beider Morse Phonetic Matching producing incorrect tokens
[ https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043304#comment-14043304 ] michael tobias commented on CODEC-187: -- As I think I said previously, I suspect the most used settings for BMPM are GENERIC, APPROX and auto-language, so I am concentrating on testing those. Another small bug found: the name Bendzin should produce the tokens binzn bindzn vindzn bintsn vintsn, but the actual output is missing the final one (vintsn): bindzn bintsn binzn vindzn Can this be investigated? Beider Morse Phonetic Matching producing incorrect tokens - Key: CODEC-187 URL: https://issues.apache.org/jira/browse/CODEC-187 Project: Commons Codec Issue Type: Bug Affects Versions: 1.9 Reporter: michael tobias Priority: Minor Fix For: 1.10 Attachments: CODEC-187.patch, CODEC-187_ashkenazi_approx_any.patch, CODEC-187_ashkenazi_approx_any_v2.patch I believe the Beider Morse Phonetic Matching algorithm was added in Commons Codec 1.6. The BMPM algorithm is an EVOLVING algorithm that is currently on version 3.02, though it had been static since version 3.01 dated 19 Dec 2011 (it was first available as open source as version 1.00 on 6 May 2009). I can see nothing in the Commons Codec docs to say which version of BMPM was implemented, so I am not sure whether the problem with the algorithm as coded in the Codec is simply an old version or whether there are more basic problems with the implementation. How do I determine the version of the algorithm that was implemented in the Commons Codec? How do we ensure that the algorithm is updated if/when the BMPM algorithm changes? How do we ensure that the algorithm as coded in the Commons Codec is accurate and working as expected?
[jira] [Updated] (MATH-1130) A new set of functions for copyof, remove and replace a given value on a slice of array
[ https://issues.apache.org/jira/browse/MATH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilles updated MATH-1130: - Attachment: equalsIncludingNaN.dat I've attached the results of a micro-benchmark of equalsIncludingNaN. Your proposed change performs better up to about 1.64e7 calls; beyond that, the current code becomes more efficient. So we have opposite arguments depending on usage. In the light-usage range, where your code is indeed faster, the absolute time difference varies from 0.8 to 11 milliseconds (on my machine). A new set of functions for copyof, remove and replace a given value on a slice of array --- Key: MATH-1130 URL: https://issues.apache.org/jira/browse/MATH-1130 Project: Commons Math Issue Type: New Feature Affects Versions: 3.4 Reporter: Venkatesha Murthy TS Attachments: equalsIncludingNaN.dat, math-1130-checknotnan.patch, math-1130-precision-equals.patch, math-1130-remove.patch, math-1130-replace.patch, math-1130.patch These are utility functions mostly required as part of MathArrays.
MathArrays:
The requirement is as follows:
a) double[] copyOf(double[] values, int begin, int length); Similar to most other functions that support a slice defined by the array part [begin, begin+length), it is a requirement to copy a slice, which is not currently available (the closest is copyOf(array, int len), which misses the begin index).
b) double[] removeAll(double[] values, int begin, int length, double removable); Need a function to remove a value from the array slice defined by [begin, begin+length) and return the filtered version.
c) double[] replaceAll(double[] values, int begin, int length, double oldValue, double newValue); Need a function to replace oldValue with newValue, in place, in the array slice defined by [begin, begin+length) and return the original complete array, with values replaced only in the segment [begin, begin+length).
MathUtils:
boolean canEqual(double d1, double d2); Provide a canEqual function that is slightly better than the existing MathUtils.equals. (We could also improve the existing equals method instead.) The change is that the new canEqual does a quick check for NaNs and only then moves on to the detailed Double.compare(..) method. This avoids the Double.compare call when either argument is NaN.
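The three proposed slice helpers can be sketched as plain static methods. This is a minimal illustration in a hypothetical SliceUtils class, not the Commons Math API, and its semantics are assumptions read from the description: removeAll returns only the filtered slice, replaceAll modifies the input in place and returns the whole array, and a NaN argument is treated as matching NaN elements (plain == never would).

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposed slice helpers; not the Commons Math API. */
public final class SliceUtils {
    private SliceUtils() {}

    /** Returns a copy of values[begin, begin + length). */
    static double[] copyOf(double[] values, int begin, int length) {
        checkSlice(values, begin, length);
        return Arrays.copyOfRange(values, begin, begin + length);
    }

    /** Returns the elements of values[begin, begin + length) that do not match removable. */
    static double[] removeAll(double[] values, int begin, int length, double removable) {
        checkSlice(values, begin, length);
        double[] kept = new double[length];
        int count = 0;
        for (int i = begin; i < begin + length; i++) {
            if (!matches(values[i], removable)) {
                kept[count++] = values[i];
            }
        }
        return Arrays.copyOf(kept, count);
    }

    /** Replaces oldValue with newValue in place, only inside the slice, and returns values itself. */
    static double[] replaceAll(double[] values, int begin, int length, double oldValue, double newValue) {
        checkSlice(values, begin, length);
        for (int i = begin; i < begin + length; i++) {
            if (matches(values[i], oldValue)) {
                values[i] = newValue;
            }
        }
        return values;
    }

    // Assumption: a NaN argument matches NaN elements.
    private static boolean matches(double element, double target) {
        return element == target || (Double.isNaN(element) && Double.isNaN(target));
    }

    private static void checkSlice(double[] values, int begin, int length) {
        if (values == null) {
            throw new NullPointerException("values");
        }
        // Written as values.length - begin to avoid int overflow in begin + length.
        if (begin < 0 || length < 0 || length > values.length - begin) {
            throw new IllegalArgumentException("invalid slice: begin=" + begin + ", length=" + length);
        }
    }
}
```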
[jira] [Commented] (LOGGING-37) [logging] LogFactory#getLogFactory should not look for method every time
[ https://issues.apache.org/jira/browse/LOGGING-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043663#comment-14043663 ] Archie Cobbs commented on LOGGING-37: - I noticed this method consuming inordinate CPU on my system as well. Can the next release of commons-logging drop support for Java 1.1? If so, then this issue can go away... simply invoke {{Thread.getContextClassLoader()}} directly. [logging] LogFactory#getLogFactory should not look for method every time Key: LOGGING-37 URL: https://issues.apache.org/jira/browse/LOGGING-37 Project: Commons Logging Issue Type: Bug Affects Versions: 1.0.4 Environment: Operating System: other Platform: Other Reporter: Matthias Ernst LogFactory checks for the existence of Thread#getContextClassLoader every time #getLogFactory is invoked and does a reflective invocation. This is unnecessarily expensive if many Log objects are created. An easy patch is to remember the Method object: the lookup happens only once, and the call will profit massively from reflection optimizations after a number of calls (a Java code stub is generated by the reflection package). Patch:
419a420,426
> private static Method GET_CONTEXT_CLASS_LOADER = null;
> static {
>   try {
>     GET_CONTEXT_CLASS_LOADER = Thread.class.getMethod("getContextClassLoader", null);
>   } catch (NoSuchMethodException e) {
>   }
> }
436,439c443
<   try {
<     // Are we running on a JDK 1.2 or later system?
<     Method method = Thread.class.getMethod("getContextClassLoader", null);
---
>   if(GET_CONTEXT_CLASS_LOADER != null) {
442c446
<     classLoader = (ClassLoader)method.invoke(Thread.currentThread(), null);
---
>     classLoader = (ClassLoader)GET_CONTEXT_CLASS_LOADER.invoke(Thread.currentThread(), null);
472c476
<   } catch (NoSuchMethodException e) {
---
>   } else {
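The patch's idea (do the reflective lookup once and reuse the Method object) and the alternative suggested in the comment (call the method directly once Java 1.1 support is dropped) can be sketched side by side in modern Java. The class and method names are hypothetical and this is not the commons-logging code; it only illustrates the two call paths.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical illustration of the two call paths discussed in LOGGING-37. */
public final class ContextClassLoaderLookup {
    // Looked up once, during class initialisation, instead of on every call.
    private static final Method GET_CONTEXT_CLASS_LOADER = lookup();

    private ContextClassLoaderLookup() {}

    private static Method lookup() {
        try {
            return Thread.class.getMethod("getContextClassLoader");
        } catch (NoSuchMethodException e) {
            return null; // only possible on a pre-1.2 JVM
        }
    }

    /** Reflective path, reusing the cached Method object as the patch proposes. */
    static ClassLoader viaCachedReflection() {
        if (GET_CONTEXT_CLASS_LOADER == null) {
            return null; // hypothetical fallback; LogFactory has its own handling here
        }
        try {
            return (ClassLoader) GET_CONTEXT_CLASS_LOADER.invoke(Thread.currentThread());
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Direct path, available once Java 1.1 support is dropped. */
    static ClassLoader direct() {
        return Thread.currentThread().getContextClassLoader();
    }
}
```

On any Java 1.2+ runtime the lookup always succeeds, which is why dropping Java 1.1 support makes the reflective path unnecessary.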
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043712#comment-14043712 ] Phil Steitz commented on MATH-1131: --- Thanks for reporting this and providing the code and data. I suspect the problem is in the matrix exponentiation done in the roundedK method. Anyone interested in patching this should start by looking at the reference in the class javadoc (and other sources) to identify optimizations that can be done for large samples.
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043857#comment-14043857 ] Thomas Neidhart commented on MATH-1131: --- I briefly debugged the example, and indeed the calculation hangs when calling roundedK, or more precisely in createH. There, powers of BigFraction objects are created with very large numerators and denominators. Because of this, some of the later calculations take forever, e.g. the internal gcd computations. In the implementation from the referenced paper, the H values are computed in double precision. Was there a specific reason to use BigFraction in our implementation? Is there a specific need for that level of accuracy for the Kolmogorov-Smirnov Test? The other inference tests do not seem to be so stringent. It looks like there is no easy way to limit the maxDenominator when calling multiply(), as is possible when creating a BigFraction object.
[jira] [Commented] (LOGGING-37) [logging] LogFactory#getLogFactory should not look for method every time
[ https://issues.apache.org/jira/browse/LOGGING-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043881#comment-14043881 ] Thomas Neidhart commented on LOGGING-37: I think it is reasonable to make a 1.2 release that drops Java 1.1 support.
[jira] [Reopened] (LOGGING-37) [logging] LogFactory#getLogFactory should not look for method every time
[ https://issues.apache.org/jira/browse/LOGGING-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Neidhart reopened LOGGING-37:
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043902#comment-14043902 ] Schalk W. Cronjé commented on MATH-1131: This section of code in createH might be part of the problem. A quick test on my MacBook shows that most of the 36 minutes are spent inside it for d=0.029357223978016822, n=9999 (I specifically tried 9,999 as it is one less than 10,000).
{code:java}
for (int i = 0; i < m; ++i) {
    for (int j = 0; j < i + 1; ++j) {
        if (i - j + 1 > 0) {
            for (int g = 2; g <= i - j + 1; ++g) {
                Hdata[i][j] = Hdata[i][j].divide(g);
            }
        }
    }
}
{code}
[jira] [Commented] (MATH-1130) A new set of functions for copyof, remove and replace a given value on a slice of array
[ https://issues.apache.org/jira/browse/MATH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043913#comment-14043913 ] Venkatesha Murthy TS commented on MATH-1130: Regarding equalsIncludingNaN: What is the test code used or the micro benchmark? Can you please share it? What does N correspond to? Here are my measurements on my machine, using the test testMath1130ForDoubleEqual in the attached PrecisionTest.java: i j d time in seconds (old code) time in seconds (with Venkat's code) 10 10 1000.027 00.003 10 10 100 00.020 00.011 10 10 100000.038 00.023 10 100100000.066 00.073 10 1000 100000.287 00.127 10 1100002.280 00.947 10 11 22.217 09.275 10 110 224.454 91.918 10 10 1224.673 91.319 I am trying to understand this a bit more. Most of the time I see timings like the above, so I am wondering where the gap you describe comes from, since at every level (i, j, d) there is a clear difference, ranging from tens of milliseconds to hundreds of seconds and growing as the iterations go up. Basically, if only one of them is NaN, it does not make sense to get to a detailed compare, which is what I have eliminated. I am just following the convention already used in, say, the MathArrays.equals(final float[] x, final float[] y) method, where null checks are done early in this fashion. Also, if x != x seems cryptic, I could replace it with Double.isNaN() to make it a bit more obvious (is this the concern?). Please help me understand what is unclear.
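The early NaN exit discussed here can be illustrated without Commons Math. In this sketch, equalsWithinUlps is a simplified, hypothetical stand-in for an ulps-based comparison such as Precision.equals(x, y, maxUlps). It treats values of opposite sign as equal only when both are zeros, which the real method may handle differently. equalsIncludingNaN mirrors the structure of the proposal: the NaN cases are decided first, and the ulps comparison runs only when neither argument is NaN.

```java
/** Hypothetical sketch of a NaN-aware equality check with an early NaN exit. */
public final class NaNAwareEquals {
    private NaNAwareEquals() {}

    /**
     * Simplified stand-in for an ulps-based comparison: true if x and y are at most
     * maxUlps representable doubles apart. Opposite signs compare equal only for +0.0/-0.0.
     */
    static boolean equalsWithinUlps(double x, double y, int maxUlps) {
        if (Double.isNaN(x) || Double.isNaN(y)) {
            return false;
        }
        long xBits = Double.doubleToLongBits(x);
        long yBits = Double.doubleToLongBits(y);
        if ((xBits ^ yBits) < 0) {
            return x == y; // different sign bits: only the two zeros are equal here
        }
        // Same sign: adjacent doubles have adjacent bit patterns, so the difference counts ulps.
        return Math.abs(xBits - yBits) <= maxUlps;
    }

    /** NaN equals NaN; otherwise fall through to the ulps comparison. */
    static boolean equalsIncludingNaN(double x, double y, int maxUlps) {
        boolean xNaN = Double.isNaN(x);
        boolean yNaN = Double.isNaN(y);
        if (xNaN || yNaN) {
            return xNaN && yNaN; // decided without the detailed comparison
        }
        return equalsWithinUlps(x, y, maxUlps);
    }
}
```

Whether the early exit pays off in the common case (no NaN at all) is exactly the open benchmarking question in this thread; the sketch only shows the control flow.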
[jira] [Commented] (OGNL-185) Performance issue on high load (thread blocking)
[ https://issues.apache.org/jira/browse/OGNL-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043996#comment-14043996 ] Brandon Ramirez commented on OGNL-185: -- Can anybody tell me whether this is fixed or not? It was re-opened in October 2012, followed by a comment saying it can be closed. Is this the same as WW-3580? We had this issue in production today and need to know whether upgrading will address it. Performance issue on high load (thread blocking) Key: OGNL-185 URL: https://issues.apache.org/jira/browse/OGNL-185 Project: Commons OGNL Issue Type: Bug Reporter: Christian Grobmeier Assignee: Maurizio Cucchiara Priority: Critical Attachments: thread-dump-lock.txt, thread-dump-lock2.txt The Struts project is suffering from an issue occurring in OGNL under heavy load. The issue in question (with details) is: https://issues.apache.org/jira/browse/WW-3580 A similar issue has been reported in the OpenSymphony bugtracker: https://issues.apache.org/jira/browse/WW-3580
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044019#comment-14044019 ] Schalk W. Cronjé commented on MATH-1131: [~p...@steitz.com] said on the ML: bq. Sorry for responding to the list but I have only mobile atm. IIRC the roundedK method should not be creating matrices of BigFractions, but rather using doubles. I did a quick hack on the test code I used for createH earlier to use double instead, and the speed improvement is, as expected, immense: down from 36 min to 9 min. I cannot comment on whether the change in precision is significant, but that was not the point of the test.
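A double-precision createH, along the lines of the quick hack described above, might look like the sketch below. It follows our reading of the construction in the Marsaglia, Tsang and Wang paper referenced for this test (k = floor(n*d) + 1, m = 2k - 1, h = k - n*d, with each entry (i, j) finally divided by (i - j + 1)!). It is not the Commons Math code, the class name is hypothetical, and the effect of the lower precision has not been assessed.

```java
/** Hypothetical double-precision sketch of the H matrix construction (not the Commons Math code). */
public final class DoubleCreateH {
    private DoubleCreateH() {}

    /** Builds the m x m matrix H with k = floor(n*d) + 1, m = 2k - 1 and h = k - n*d. */
    static double[][] createH(double d, int n) {
        int k = (int) (n * d) + 1;
        int m = 2 * k - 1;
        double h = k - n * d;
        double[][] H = new double[m][m];
        // 1 on and below the first superdiagonal, 0 elsewhere.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                H[i][j] = (i - j + 1 < 0) ? 0.0 : 1.0;
            }
        }
        // Corrections to the first column and the last row.
        for (int i = 0; i < m; i++) {
            H[i][0] -= Math.pow(h, i + 1);
            H[m - 1][i] -= Math.pow(h, m - i);
        }
        H[m - 1][0] += (2 * h - 1 > 0) ? Math.pow(2 * h - 1, m) : 0.0;
        // Same division as the BigFraction loop quoted earlier, but on doubles:
        // entry (i, j) ends up divided by (i - j + 1)!.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j <= i; j++) {
                for (int g = 2; g <= i - j + 1; g++) {
                    H[i][j] /= g;
                }
            }
        }
        return H;
    }
}
```

With d = 0.25 and n = 10, this gives k = 3, m = 5 and h = 0.5; for instance, the bottom-left entry is (1 - 0.5^5 - 0.5^5) / 5! = 0.0078125.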
[jira] [Comment Edited] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044019#comment-14044019 ] Schalk W. Cronjé edited comment on MATH-1131 at 6/25/14 8:44 PM: - [~p...@steitz.com] said on the ML: bq. Sorry for responding to the list but I have only mobile atm. IIRC the roundedK method should not be creating matrices of BigFractions, but rather using doubles. I did a quick hack on the test code I used for createH earlier to use double instead, and the speed improvement is, as expected, immense: down from 36 min to 9 min. I cannot comment on whether the change in precision is significant, but that was not the point of the test.
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044029#comment-14044029 ] Thomas Neidhart commented on MATH-1131: --- Yes, I did the same test, and the unit tests still pass. The reason it still takes quite long is the input data: in your example you have 10,000 samples. To evaluate the result, we need to raise the computed H matrix (~500x500) to the power n, like this:
{noformat}
final RealMatrix Hpower = H.power(n);
{noformat}
Now, n is 10,000, which makes this a *very* expensive operation. I do not know if there is a reasonable approximation for this.
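For reference, the usual way to compute H^n is repeated squaring, which needs on the order of 2 log2(n) matrix products (each O(m^3) for an m x m matrix) rather than n - 1. The sketch below uses hypothetical names and also tracks a separate power-of-ten exponent, rescaling by 1e-140 whenever the central element grows large. This is our reading of how the paper's mPower routine keeps double precision from overflowing; treat it as an illustration, not a port.

```java
/** Hypothetical sketch: matrix power by repeated squaring with decimal rescaling. */
public final class ScaledMatrixPower {
    private ScaledMatrixPower() {}

    /** Result holder: the true power equals matrix * 10^exponent. */
    static final class Scaled {
        final double[][] matrix;
        final int exponent;

        Scaled(double[][] matrix, int exponent) {
            this.matrix = matrix;
            this.exponent = exponent;
        }
    }

    /** Plain O(m^3) product of two square matrices. */
    private static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        double[][] c = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < m; k++) {
                double aik = a[i][k];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    /** Computes a^n for n >= 1 using about 2*log2(n) matrix products. */
    static Scaled power(double[][] a, int n) {
        if (n == 1) {
            double[][] copy = new double[a.length][];
            for (int i = 0; i < a.length; i++) {
                copy[i] = a[i].clone();
            }
            return new Scaled(copy, 0);
        }
        Scaled half = power(a, n / 2);
        double[][] v = multiply(half.matrix, half.matrix);   // a^(2*(n/2))
        int e = 2 * half.exponent;
        if (n % 2 != 0) {
            v = multiply(a, v);                              // one extra factor for odd n
        }
        int m = v.length;
        if (v[m / 2][m / 2] > 1e140) {                       // rescale before overflow
            for (double[] row : v) {
                for (int j = 0; j < m; j++) {
                    row[j] *= 1e-140;
                }
            }
            e += 140;
        }
        return new Scaled(v, e);
    }
}
```

For n = 10,000 this amounts to roughly 14 squarings plus a few extra products, each about m^3 multiply-adds in double precision.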
[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
[ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044036#comment-14044036 ] Thomas Neidhart commented on MATH-1131: --- Ah, OK: the paper at http://www.jstatsoft.org/v08/i18/ describes a quick approximation for the case where n is very large, so we need to take a closer look at how the attached source code handles this part (the mPower function).
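One standard large-n shortcut, not necessarily the one used in the paper, is Kolmogorov's limiting distribution: P(sqrt(n) D_n <= x) tends to K(x) = 1 - 2 sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2), so a p-value can be approximated as 1 - K(sqrt(n) d) without any matrix work. The sketch below uses hypothetical names. For larger x it evaluates K(x) with that alternating series. For small x, where the alternating series converges slowly, it uses the equivalent form K(x) = (sqrt(2 pi) / x) sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 x^2)). The switch point 1.18 is chosen for illustration, and how accurate the limit is for moderate n is not examined here.

```java
/** Hypothetical sketch: Kolmogorov's limiting distribution as a large-n approximation. */
public final class KolmogorovAsymptotic {
    private KolmogorovAsymptotic() {}

    /** K(x) = lim_{n -> infinity} P(sqrt(n) * D_n <= x). */
    static double cdf(double x) {
        if (x <= 1e-3) {
            return 0.0; // K(x) underflows to 0 in double precision here anyway
        }
        if (x < 1.18) {
            // Small x: K(x) = sqrt(2*pi)/x * sum_{k>=1} exp(-(2k-1)^2 * pi^2 / (8x^2)).
            double c = Math.PI * Math.PI / (8.0 * x * x);
            double sum = 0.0;
            for (int k = 1; k <= 20; k++) {
                double odd = 2.0 * k - 1.0;
                sum += Math.exp(-c * odd * odd);
            }
            return Math.sqrt(2.0 * Math.PI) / x * sum;
        }
        // Larger x: K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2).
        double sum = 0.0;
        for (int k = 1; k <= 100; k++) {
            double term = Math.exp(-2.0 * k * k * x * x);
            sum += (k % 2 == 1) ? term : -term;
            if (term < 1e-17) {
                break;
            }
        }
        return 1.0 - 2.0 * sum;
    }

    /** Approximate p-value for an observed statistic d with sample size n. */
    static double approximatePValue(double d, int n) {
        return 1.0 - cdf(Math.sqrt(n) * d);
    }
}
```

For instance, 1 - cdf(1.3581) is approximately 0.05, the familiar 5% critical value of the limiting distribution.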
[jira] [Commented] (MATH-1130) A new set of functions for copyof, remove and replace a given value on a slice of array
[ https://issues.apache.org/jira/browse/MATH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044169#comment-14044169 ] Gilles commented on MATH-1130: -- bq. What is the test code used or the micro benchmark? It's in {{src/test/java/org/apache/commons/math3/PerfTestUtils.java}}. The code I used is:
{code}
package org.apache.commons.math3.util;

import org.apache.commons.math3.PerfTestUtils;
import org.apache.commons.math3.distribution.RealDistribution;
import org.apache.commons.math3.distribution.UniformRealDistribution;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.Assert;

/**
 * Performance tests for Precision.equals.
 * Not enabled by default, as the class does not end in Test.
 *
 * Invoke by running<br/>
 * {@code mvn test -Dtest=EqualsTestPerf}<br/>
 * or by running<br/>
 * {@code mvn test -Dtest=EqualsTestPerf -DargLine="-DtestRuns=1234 -server"}<br/>
 */
public class EqualsTestPerf {

    private static final int RUNS = Integer.parseInt(System.getProperty("testRuns", "1000"));

    @Test
    public void testSimpleBenchmark() {
        final String D = "CM";
        final String DM = "V";
        final int numStat = 100;
        final int numCall = RUNS / numStat;

        final RealDistribution d = new UniformRealDistribution(0, Double.MAX_VALUE);
        final double v = d.sample();
        final double w = d.sample();

        PerfTestUtils.timeAndReport("equalsIncludingNaN",
                                    numCall,
                                    numStat,
                                    false,
                                    new PerfTestUtils.RunTest(D) {
                                        @Override
                                        public Double call() throws Exception {
                                            return Precision.equalsIncludingNaN(v, w) ? 1d : 0d;
                                        }
                                    },
                                    new PerfTestUtils.RunTest(DM) {
                                        @Override
                                        public Double call() throws Exception {
                                            return EqualsTestPerf.equalsIncludingNaN(v, w) ? 1d : 0d;
                                        }
                                    });
    }

    public static boolean equalsIncludingNaN(double x, double y) {
        return (x != x || y != y) ? !(x != x ^ y != y) : Precision.equals(x, y, 1);
    }
}
{code}
bq. What does N correspond to? The value of the testRuns property. bq.
Basically, if only one of them is NaN, it does not make sense to get to a detailed compare, which is what I have eliminated. OK, I got it. In that case (only one argument is NaN), your code should indeed be faster. In the general case (neither argument is NaN), it's not that obvious, and the benchmarking code does not seem to help in figuring it out. You are most welcome to try it (and report flaws...).