[jira] [Updated] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Schalk W. Cronjé updated MATH-1131:
---

Attachment: 1.txt

Dataset that was used.

 Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
 ---

 Key: MATH-1131
 URL: https://issues.apache.org/jira/browse/MATH-1131
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.3
 Environment: Java 8
Reporter: Schalk W. Cronjé
 Attachments: 1.txt


 I have code simplified to the following:
 KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
 NormalDistribution nd = new NormalDistribution(mean,stddev);
 kst.kolmogorovSmirnovTest(nd,dataset)
 I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest 
 takes 'forever'. It has not returned after nearly 15 minutes, and in one of my 
 tests it has gone over 150 MB in memory usage. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA
Schalk W. Cronjé created MATH-1131:
--

 Summary: Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item 
dataset
 Key: MATH-1131
 URL: https://issues.apache.org/jira/browse/MATH-1131
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.3
 Environment: Java 8
Reporter: Schalk W. Cronjé
 Attachments: 1.txt

I have code simplified to the following:

KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
NormalDistribution nd = new NormalDistribution(mean,stddev);
kst.kolmogorovSmirnovTest(nd,dataset)

I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest 
takes 'forever'. It has not returned after nearly 15 minutes, and in one of my 
tests it has gone over 150 MB in memory usage. 





[jira] [Updated] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Schalk W. Cronjé updated MATH-1131:
---

Attachment: ReproduceKsIssue.java

Example Java code to reproduce issue



[jira] [Updated] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Schalk W. Cronjé updated MATH-1131:
---

Attachment: ReproduceKsIssue.groovy

Example Groovy code that also reproduces the issue



[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043247#comment-14043247
 ] 

Schalk W. Cronjé commented on MATH-1131:


See the code examples for the specific _mean_ and _stddev_ that were used.



[jira] [Commented] (CODEC-187) Beider Morse Phonetic Matching producing incorrect tokens

2014-06-25 Thread michael tobias (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043304#comment-14043304
 ] 

michael tobias commented on CODEC-187:
--

As I think I said previously I suspect the most used settings for BMPM are 
GENERIC, APPROX and auto-language.  I am therefore concentrating on testing 
that.

Another small bug found.

The name Bendzin should produce tokens of 
binzn bindzn vindzn bintsn vintsn 

but is missing the final one (vintsn):
bindzn bintsn binzn vindzn

Can this be investigated?


 Beider Morse Phonetic Matching producing incorrect tokens
 -

 Key: CODEC-187
 URL: https://issues.apache.org/jira/browse/CODEC-187
 Project: Commons Codec
  Issue Type: Bug
Affects Versions: 1.9
Reporter: michael tobias
Priority: Minor
 Fix For: 1.10

 Attachments: CODEC-187.patch, CODEC-187_ashkenazi_approx_any.patch, 
 CODEC-187_ashkenazi_approx_any_v2.patch


 I believe the Beider Morse Phonetic Matching algorithm was added in Commons 
 Codec 1.6
 The BMPM algorithm is an EVOLVING algorithm that is currently on version 3.02 
 though it had been static since version 3.01 dated 19 Dec 2011 (it was first 
 available as opensource as version 1.00 on 6 May 2009).
 I can see nothing in the Commons Codec Docs to say which version of BMPM was 
 implemented so I am not sure if the problem with the algorithm as coded in 
 the Codec is simply an old version or whether there are more basic problems 
 with the implementation.
 How do I determine the version of the algorithm that was implemented in the 
 Commons Codec?
 How do we ensure that the algorithm is updated if/when the BMPM algorithm 
 changes?
 How do we ensure that the algorithm as coded in the Commons Codec is accurate 
 and working as expected?





[jira] [Updated] (MATH-1130) A new set of functions for copyof, remove and replace a given value on a slice of array

2014-06-25 Thread Gilles (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilles updated MATH-1130:
-

Attachment: equalsIncludingNaN.dat

I've attached results of a micro-benchmark of equalsIncludingNaN.

Your proposed change has better performance up to about 1.64e7 calls.
Beyond that, the current code becomes more efficient.

So, we have opposite arguments depending on usage.

In the light usage range, where your code is indeed faster, the absolute time 
difference varies from 0.8 to 11 milliseconds (on my machine).


 A new set of functions for copyof, remove and replace a given value on a 
 slice of array
 ---

 Key: MATH-1130
 URL: https://issues.apache.org/jira/browse/MATH-1130
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.4
Reporter: Venkatesha Murthy TS
 Attachments: equalsIncludingNaN.dat, math-1130-checknotnan.patch, 
 math-1130-precision-equals.patch, math-1130-remove.patch, 
 math-1130-replace.patch, math-1130.patch


 These are utility functions, mostly required as part of MathArrays.
 MathArrays:
 ===========
 The requirement is as follows:
 a) double[] copyOf(double[] values, int begin, int length);
 Like most other functions that operate on the array slice 
 [begin, begin+length), this copies a slice. No such function is currently 
 available (the closest is copyOf(array, int len), which lacks the begin 
 index).
 b) double[] removeAll(double[] values, int begin, int length, double 
 removable);
 A function to remove a value from the array slice defined by 
 [begin, begin+length) and return the filtered version.
 c) double[] replaceAll(double[] values, int begin, int length, double 
 oldValue, double newValue);
 A function to replace oldValue with newValue in place within the array slice 
 [begin, begin+length) and return the original complete array, with values 
 replaced only in that segment.
 MathUtils:
 ==========
 boolean canEqual(double d1, double d2);
 Provide a canEqual function that is slightly better than the existing 
 MathUtils.equals. Alternatively, the existing equals method could be improved.
 The change is that the new canEqual does a quick check for NaNs and only then 
 moves on to a detailed Double.compare(..) call. This avoids the 
 Double.compare call when either argument is NaN.
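
The three slice helpers requested above could look roughly like the following. This is only a sketch on plain JDK arrays: the class name SliceUtils, the bounds check, and the choice of Double.compare for matching values (so NaN matches NaN, and 0.0 does not match -0.0) are assumptions made for illustration, not the contents of the attached patches.

```java
import java.util.Arrays;

// Hypothetical holder class; the issue proposes adding these to MathArrays.
final class SliceUtils {

    // Assumed validation: reject slices that fall outside the array.
    private static void checkSlice(double[] values, int begin, int length) {
        if (begin < 0 || length < 0 || begin + length > values.length) {
            throw new IllegalArgumentException("slice out of range");
        }
    }

    /** Copies the slice [begin, begin + length) into a new array. */
    static double[] copyOf(double[] values, int begin, int length) {
        checkSlice(values, begin, length);
        return Arrays.copyOfRange(values, begin, begin + length);
    }

    /** Returns the slice [begin, begin + length) without occurrences of removable. */
    static double[] removeAll(double[] values, int begin, int length, double removable) {
        checkSlice(values, begin, length);
        double[] kept = new double[length];
        int n = 0;
        for (int i = begin; i < begin + length; i++) {
            if (Double.compare(values[i], removable) != 0) {
                kept[n++] = values[i];
            }
        }
        return Arrays.copyOf(kept, n); // trim to the number of values kept
    }

    /** Replaces oldValue by newValue in place inside the slice; returns the same array. */
    static double[] replaceAll(double[] values, int begin, int length,
                               double oldValue, double newValue) {
        checkSlice(values, begin, length);
        for (int i = begin; i < begin + length; i++) {
            if (Double.compare(values[i], oldValue) == 0) {
                values[i] = newValue;
            }
        }
        return values;
    }
}
```

Note that replaceAll mutates its argument, as the description asks, while copyOf and removeAll return fresh arrays.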





[jira] [Commented] (LOGGING-37) [logging] LogFactory#getLogFactory should not look for method every time

2014-06-25 Thread Archie Cobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/LOGGING-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043663#comment-14043663
 ] 

Archie Cobbs commented on LOGGING-37:
-

I noticed this method consuming inordinate CPU on my system as well.

Can the next release of commons-logging drop support for Java 1.1? If so then 
this issue can go away... simply invoke {{Thread.getContextClassLoader()}} 
directly.


 [logging] LogFactory#getLogFactory should not look for method every time
 

 Key: LOGGING-37
 URL: https://issues.apache.org/jira/browse/LOGGING-37
 Project: Commons Logging
  Issue Type: Bug
Affects Versions: 1.0.4
 Environment: Operating System: other
 Platform: Other
Reporter: Matthias Ernst

 LogFactory checks for the existence of Thread#getContextClassLoader every time
 #getLogFactory is invoked and does a reflective invocation. This is
 unnecessarily expensive if many Log objects are created. An easy patch is to
 remember the Method object; the lookup happens only once and it will massively
 profit from reflection optimizations after a number of calls (a Java code stub
 is generated by the reflection package).
 Patch:
 419a420,426
 >   private static Method GET_CONTEXT_CLASS_LOADER = null;
 >   static {
 >     try {
 >       GET_CONTEXT_CLASS_LOADER = Thread.class.getMethod("getContextClassLoader", null);
 >     } catch (NoSuchMethodException e) {
 >     }
 >   }
 436,439c443
 <           try {
 <               // Are we running on a JDK 1.2 or later system?
 <               Method method = Thread.class.getMethod("getContextClassLoader", null);
 < 
 ---
 >           if(GET_CONTEXT_CLASS_LOADER != null) {
 442c446
 <                   classLoader = (ClassLoader)method.invoke(Thread.currentThread(), null);
 ---
 >                   classLoader = (ClassLoader)GET_CONTEXT_CLASS_LOADER.invoke(Thread.currentThread(), null);
 472c476
 <           } catch (NoSuchMethodException e) {
 ---
 >           } else {
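
For comparison, the two approaches discussed in this issue (caching the Method object as in the patch, and calling Thread.getContextClassLoader() directly once Java 1.1 support is dropped) might be sketched as below. The class name ContextClassLoaderLookup and the fallback to the class's own loader are assumptions for illustration; this is not the LogFactory code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical class, standing in for the relevant part of LogFactory.
final class ContextClassLoaderLookup {

    /** Looked up once when the class is initialised; null if the method is missing. */
    private static final Method GET_CONTEXT_CLASS_LOADER = lookup();

    private static Method lookup() {
        try {
            return Thread.class.getMethod("getContextClassLoader");
        } catch (NoSuchMethodException e) {
            return null; // would only happen on a pre-1.2 runtime
        }
    }

    /** Reflective call through the cached Method, in the spirit of the patch. */
    static ClassLoader viaCachedMethod() {
        if (GET_CONTEXT_CLASS_LOADER != null) {
            try {
                return (ClassLoader) GET_CONTEXT_CLASS_LOADER.invoke(Thread.currentThread());
            } catch (IllegalAccessException | InvocationTargetException e) {
                // fall through to the assumed fallback below
            }
        }
        return ContextClassLoaderLookup.class.getClassLoader(); // assumed fallback
    }

    /** Direct call, which needs no reflection at all. */
    static ClassLoader direct() {
        return Thread.currentThread().getContextClassLoader();
    }
}
```

Both methods should return the same loader on any current JVM; the direct form simply removes the per-call lookup that the report complains about.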





[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread Phil Steitz (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043712#comment-14043712
 ] 

Phil Steitz commented on MATH-1131:
---

Thanks for reporting this and providing the code and data.  I suspect the 
problem is in the matrix exponentiation done in the roundedK method.  Anyone 
interested in patching this should start by looking at the reference in the 
class javadoc (and other sources) to identify optimizations that can be done 
for large samples.



[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043857#comment-14043857
 ] 

Thomas Neidhart commented on MATH-1131:
---

I briefly debugged the example, and the calculation indeed hangs when calling 
roundedK, or more precisely in createH.

There, powers of BigFraction objects are created with very large numerators and 
denominators. Some of the later calculations then take forever because of 
this, e.g. when internally calculating the gcd.

Looking at the implementation from the referenced paper, the H values there are 
computed with double precision. Was there a specific reason to use BigFraction 
in our implementation? Is there a specific need for that level of accuracy for 
the Kolmogorov-Smirnov test? The other inference tests do not seem to be so 
stringent.

It looks like there is no easy way to limit the maxDenominator when calling 
multiply(), as is possible when creating a BigFraction object.
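
To give a feel for why exact fractions get expensive, here is a small sketch using java.math.BigInteger directly rather than BigFraction. The class and method names are hypothetical; the point is only that the denominator of an exact power grows linearly in bit length with the exponent, whereas a double always stays at 64 bits.

```java
import java.math.BigInteger;

// Hypothetical illustration; it does not reproduce the createH code.
final class ExactFractionGrowth {

    /** Bit length of the denominator of (num/den)^exponent in lowest terms. */
    static int denominatorBitsOfPower(long num, long den, int exponent) {
        BigInteger n = BigInteger.valueOf(num);
        BigInteger d = BigInteger.valueOf(den);
        BigInteger g = n.gcd(d);   // reduce once; powers of a reduced fraction stay reduced
        d = d.divide(g);
        return d.pow(exponent).bitLength();
    }

    /** The same power in double precision: fixed size, but rounded. */
    static double doublePower(long num, long den, int exponent) {
        return Math.pow((double) num / den, exponent);
    }
}
```

For example, the denominator of (3/4)^500 is 2^1000 and needs 1001 bits, and every later gcd, multiply or divide has to process operands of that size.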




[jira] [Commented] (LOGGING-37) [logging] LogFactory#getLogFactory should not look for method every time

2014-06-25 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/LOGGING-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043881#comment-14043881
 ] 

Thomas Neidhart commented on LOGGING-37:


I think it is reasonable to make a 1.2 release that drops Java 1.1 support.



[jira] [Reopened] (LOGGING-37) [logging] LogFactory#getLogFactory should not look for method every time

2014-06-25 Thread Thomas Neidhart (JIRA)

 [ 
https://issues.apache.org/jira/browse/LOGGING-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Neidhart reopened LOGGING-37:





[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043902#comment-14043902
 ] 

Schalk W. Cronjé commented on MATH-1131:


This section of code in createH might be part of the problem. A quick test on 
my MacBook shows that most of the 36 minutes are spent inside it for 
d=0.029357223978016822, n=9999. (I specifically tried 9,999 as it was one less 
than 10,000.)

{code:java}
for (int i = 0; i < m; ++i) {
    for (int j = 0; j < i + 1; ++j) {
        if (i - j + 1 > 0) {
            for (int g = 2; g <= i - j + 1; ++g) {
                Hdata[i][j] = Hdata[i][j].divide(g);
            }
        }
    }
}

{code}
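
The inner loop above divides each entry by 2, 3, ..., (i - j + 1), i.e. by (i - j + 1)!. As a rough sketch of one way to cut the number of divisions, the reciprocals 1/k! can be tabulated once and applied with a single multiplication per entry. This version works on plain doubles, and the class and method names are hypothetical; whether the division count or the size of the BigFraction operands dominates the 36 minutes is not established here.

```java
// Hypothetical illustration on double[][]; the code in the issue uses BigFraction.
final class InverseFactorialTable {

    /** Same shape as the quoted loop: divide h[i][j] by 2, 3, ..., (i - j + 1). */
    static void divideStepwise(double[][] h) {
        int m = h.length;
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < i + 1; ++j) {
                for (int g = 2; g <= i - j + 1; ++g) {
                    h[i][j] /= g;
                }
            }
        }
    }

    /** Builds invFact[k] = 1 / k! once, then scales each entry with one multiply. */
    static void divideWithTable(double[][] h) {
        int m = h.length;
        double[] invFact = new double[m + 1];
        invFact[0] = 1.0;
        for (int k = 1; k <= m; ++k) {
            invFact[k] = invFact[k - 1] / k;
        }
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < i + 1; ++j) {
                h[i][j] *= invFact[i - j + 1];
            }
        }
    }
}
```

The two methods agree up to floating-point rounding; the table version performs O(m^2) multiplications instead of O(m^3) divisions.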




[jira] [Commented] (MATH-1130) A new set of functions for copyof, remove and replace a given value on a slice of array

2014-06-25 Thread Venkatesha Murthy TS (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043913#comment-14043913
 ] 

Venkatesha Murthy TS commented on MATH-1130:


Regarding equalsIncludingNaN: 
What test code or micro-benchmark was used? Can you please share it? What 
does N correspond to?

In my measurements on my machine with the test 
testMath1130ForDoubleEqual in the attached PrecisionTest.java:

i    j       d       time in seconds    time in seconds
                     (old code)         (with venkats code)
10   10      10      00.027             00.003
10   10      100     00.020             00.011
10   10      1000    00.038             00.023
10   100     1000    00.066             00.073
10   1000    1000    00.287             00.127
10   10000   1000    02.280             00.947
10   10000   10000   22.217             09.275
10   10000   100000  224.454            91.918
10   100000  10000   224.673            91.319

I am trying to understand this a bit more. I see the above timings most of the 
time, so I am wondering where the gap comes from: at every level (i, j, d) there 
is a clear difference, from tens of milliseconds to hundreds of seconds, growing 
as the iterations go up.

Basically, if only one of them is NaN, it does not make sense to proceed to a 
detailed compare, which is what I have eliminated.
I am just following the convention that already exists in, say, the 
MathArrays.equals(final float[] x, final float[] y) method, where null checks 
are handled early in the same fashion.
Also, if x != x seems cryptic, I could replace it with Double.isNaN() to make it 
more obvious. (Is this the concern?)

Please help me understand what is unclear.
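
The two spellings of the NaN check mentioned above behave identically, since NaN is the only double for which x != x holds. A minimal sketch follows, with hypothetical names and exact == equality assumed in place of whatever tolerance the real Precision method applies:

```java
// Hypothetical illustration; not the Commons Math Precision class.
final class NaNAwareEquals {

    /** Readable form: true if both are NaN or both compare equal with ==. */
    static boolean equalsIncludingNaN(double x, double y) {
        return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
    }

    /** Early-exit form using the x != x idiom, which is true only for NaN. */
    static boolean equalsIncludingNaNQuick(double x, double y) {
        if (x != x || y != y) {
            // At least one NaN: equal only if both are NaN, no further comparison needed.
            return x != x && y != y;
        }
        return x == y;
    }
}
```

Any timing difference between the two forms would have to be measured; the sketch only shows that they return the same results.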




[jira] [Commented] (OGNL-185) Performance issue on high load (thread blocking)

2014-06-25 Thread Brandon Ramirez (JIRA)

[ 
https://issues.apache.org/jira/browse/OGNL-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043996#comment-14043996
 ] 

Brandon Ramirez commented on OGNL-185:
--

Can anybody tell me if this is fixed or not?  It was re-opened in October 2012, 
followed by a comment saying it can be closed.  Is this the same as WW-3580?

We had this issue in production today and need to know if upgrading will 
address this issue or not.

 Performance issue on high load (thread blocking)
 

 Key: OGNL-185
 URL: https://issues.apache.org/jira/browse/OGNL-185
 Project: Commons OGNL
  Issue Type: Bug
Reporter: Christian Grobmeier
Assignee: Maurizio Cucchiara
Priority: Critical
 Attachments: thread-dump-lock.txt, thread-dump-lock2.txt


 The Struts project is suffering from an issue in OGNL that occurs under heavy 
 load.
 The issue in question (with details) is: 
 https://issues.apache.org/jira/browse/WW-3580
 A similar issue has been reported in the OpenSymphony bugtracker:
 https://issues.apache.org/jira/browse/WW-3580





[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044019#comment-14044019
 ] 

Schalk W. Cronjé commented on MATH-1131:


[~p...@steitz.com] said on the ML:

bq. Sorry for responding to the list but I have only mobile atm.  IIRC the 
roundedK method should not be creating matrices of BigFractions, but rather 
using doubles.

I did a quick hack on the test code I used for createH earlier to use double 
instead, and the speed improvement is, as expected, immense: down from 36 min to 
9 min. I cannot comment on whether the change in precision is significant, but 
that was not the point of the test.





[jira] [Comment Edited] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044019#comment-14044019
 ] 

Schalk W. Cronjé edited comment on MATH-1131 at 6/25/14 8:44 PM:
-

[~p...@steitz.com] said on the ML:

bq. Sorry for responding to the list but I have only mobile atm.  IIRC the 
roundedK method should not be creating matrices of BigFractions, but rather 
using doubles.

I did a quick hack on the test code I used for createH earlier to use double 
instead, and the speed improvement is, as expected, immense: down from 36 min to 
9 min. I cannot comment on whether the change in precision is significant, but 
that was not the point of the test.






[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044029#comment-14044029
 ] 

Thomas Neidhart commented on MATH-1131:
---

Yes, I did the same test, and the unit tests still pass successfully.

The reason it still takes quite long is related to the input data: in your 
example you have 10,000 samples.
To evaluate the result we need to calculate the power of the calculated H matrix 
(~ 500x500) like this:

{noformat}
final RealMatrix Hpower = H.power(n);
{noformat}

Now, n is 10,000, which makes this a *very* expensive operation. I do not know 
if there is a reasonable approximation for this.
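
For reference, a matrix power does not need n - 1 products: exponentiation by squaring gets by with a number of products proportional to log2(n). The sketch below works on plain double[][] arrays with hypothetical names; it makes no claim about how RealMatrix.power is implemented, and it does no rescaling against overflow or underflow.

```java
// Hypothetical illustration on double[][]; not the Commons Math implementation.
final class MatrixPowerSketch {

    /** Plain O(m^3) product of two square matrices of the same size. */
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        double[][] c = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < m; k++) {
                double aik = a[i][k];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    /** Computes h^n by repeated squaring, following the binary digits of n. */
    static double[][] power(double[][] h, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        int m = h.length;
        double[][] result = new double[m][m];
        for (int i = 0; i < m; i++) {
            result[i][i] = 1.0; // start from the identity matrix
        }
        double[][] base = h;    // never modified: multiply always allocates
        while (n > 0) {
            if ((n & 1) == 1) {
                result = multiply(result, base);
            }
            n >>= 1;
            if (n > 0) {
                base = multiply(base, base);
            }
        }
        return result;
    }
}
```

Even so, each product of two 500x500 matrices costs on the order of 10^8 multiply-adds, so the power remains the dominant cost for a 10,000-item sample.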



[jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset

2014-06-25 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044036#comment-14044036
 ] 

Thomas Neidhart commented on MATH-1131:
---

Ah ok, the paper at http://www.jstatsoft.org/v08/i18/ talks about a quick 
approximation in case n is very large, so we need to take a closer look at the 
attached source code how this part is handled (the mPower function).



[jira] [Commented] (MATH-1130) A new set of functions for copyof, remove and replace a given value on a slice of array

2014-06-25 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044169#comment-14044169
 ] 

Gilles commented on MATH-1130:
--

bq. What is the test code used or the micro benchmark?

It's in {{src/test/java/org/apache/commons/math3/PerfTestUtils.java}}.

The code I used is
{code}
package org.apache.commons.math3.util;

import org.apache.commons.math3.PerfTestUtils;
import org.apache.commons.math3.distribution.RealDistribution;
import org.apache.commons.math3.distribution.UniformRealDistribution;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.Assert;

/**
 * Performance tests for Precision.equals.
 * Not enabled by default, as the class does not end in Test.
 *
 * Invoke by running<br/>
 * {@code mvn test -Dtest=EqualsTestPerf}<br/>
 * or by running<br/>
 * {@code mvn test -Dtest=EqualsTestPerf -DargLine="-DtestRuns=1234 -server"}<br/>
 */
public class EqualsTestPerf {
    // Total number of calls per method; override with -DtestRuns=N.
    private static final int RUNS =
        Integer.parseInt(System.getProperty("testRuns", "1000"));

    @Test
    public void testSimpleBenchmark() {
        // Labels for the two timed implementations.
        final String D = "CM";
        final String DM = "V";

        final int numStat = 100;
        final int numCall = RUNS / numStat;

        // Two random (non-NaN) operands, shared by both implementations.
        final RealDistribution d =
            new UniformRealDistribution(0, Double.MAX_VALUE);
        final double v = d.sample();
        final double w = d.sample();

        PerfTestUtils.timeAndReport("equalsIncludingNaN",
                                    numCall,
                                    numStat,
                                    false,
                                    new PerfTestUtils.RunTest(D) {
                                        @Override
                                        public Double call() throws Exception {
                                            return Precision.equalsIncludingNaN(v, w) ? 1d : 0d;
                                        }
                                    },
                                    new PerfTestUtils.RunTest(DM) {
                                        @Override
                                        public Double call() throws Exception {
                                            return EqualsTestPerf.equalsIncludingNaN(v, w) ? 1d : 0d;
                                        }
                                    });
    }

    // Proposed variant: handle NaN operands first, then compare within 1 ulp.
    public static boolean equalsIncludingNaN(double x, double y) {
        return (x != x || y != y) ?
            !(x != x ^ y != y) :
            Precision.equals(x, y, 1);
    }
}
{code}

bq. What does N correspond to?

The value of the testRuns property.

bq. Basically, if only one of them is NaN, it does not make sense to get to a 
detailed compare, which is what I have eliminated.

OK. I got it. In that case (only one argument is NaN), your code should be 
faster indeed. In the general case (none of the arguments is NaN), it's not 
that obvious, and the benchmarking code does not seem to help figuring it out. 
You are most welcome to try it (and report flaws...).
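
To make the three cases concrete, here is a small self-contained sketch of the NaN short-circuit discussed above. It is not the Commons Math code: the class name {{NanAwareEquals}} is hypothetical, and the {{Math.nextDown}}/{{Math.nextUp}} comparison is only an assumed stand-in for {{Precision.equals(x, y, 1)}} so that the example runs without the library.

```java
/** Sketch of the NaN short-circuit; names and the ulp stand-in are assumptions. */
public final class NanAwareEquals {
    private NanAwareEquals() {}

    public static boolean equalsIncludingNaN(double x, double y) {
        // x != x is true only when x is NaN.
        final boolean xNaN = x != x;
        final boolean yNaN = y != y;
        if (xNaN || yNaN) {
            // Same result as !(xNaN ^ yNaN) on this branch:
            // both NaN -> true, exactly one NaN -> false.
            return xNaN && yNaN;
        }
        // General case (no NaN): stand-in for a "within 1 ulp" comparison.
        return y >= Math.nextDown(x) && y <= Math.nextUp(x);
    }
}
```

The single-NaN case returns without reaching the comparison at all; when neither argument is NaN, the two NaN checks are extra work on top of the comparison, which is why the gain in the general case is not obvious.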


 A new set of functions for copyof, remove and replace a given value on a 
 slice of array
 ---

 Key: MATH-1130
 URL: https://issues.apache.org/jira/browse/MATH-1130
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.4
Reporter: Venkatesha Murthy TS
 Attachments: equalsIncludingNaN.dat, math-1130-checknotnan.patch, 
 math-1130-precision-equals.patch, math-1130-remove.patch, 
 math-1130-replace.patch, math-1130.patch


 These are utility functions mostly required as part of MathArrays.
 MathArrays:
 =
 The requirement is as follows:
 a) double[] copyOf(double[] values, int begin, int length);
 Similar to most other functions that support a slice defined by the array 
 part [begin, begin+length); it's a requirement to copy a slice, which is 
 not available (the closest is copyOf(array, int len), which misses out the 
 begin index).
 b) double[] removeAll(double[] values, int begin, int length, double 
 removable);
 Need a function to remove a value from the array slice defined by 
 [begin, begin+length) and return the filtered version.
 c) double[] replaceAll(double[] values, int begin, int length, double 
 oldValue, double newValue);
 Need a function to replace, in place, oldValue with newValue in the array 
 slice defined by [begin, begin+length) and return the original complete 
 array, with values replaced only in the segment [begin, begin+length).
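
 A minimal sketch of how the three requested functions might look, using only java.util.Arrays. This is not the attached patch: the class name SliceUtils and the checkSlice helper are hypothetical, removeAll is assumed to return only the filtered slice, and values are matched with Double.compare (so NaN matches NaN, and 0.0 does not match -0.0).

```java
import java.util.Arrays;

/** Illustrative sketch only; names and matching semantics are assumptions. */
public final class SliceUtils {
    private SliceUtils() {}

    /** Rejects slices that do not fit inside the array. */
    private static void checkSlice(double[] values, int begin, int length) {
        if (begin < 0 || length < 0 || begin > values.length - length) {
            throw new IllegalArgumentException("invalid slice [" + begin + ", " + begin + "+" + length + ")");
        }
    }

    /** Copies the slice [begin, begin+length) into a new array. */
    public static double[] copyOf(double[] values, int begin, int length) {
        checkSlice(values, begin, length);
        return Arrays.copyOfRange(values, begin, begin + length);
    }

    /** Returns the slice [begin, begin+length) without entries equal to removable. */
    public static double[] removeAll(double[] values, int begin, int length, double removable) {
        checkSlice(values, begin, length);
        final double[] kept = new double[length];
        int n = 0;
        for (int i = begin; i < begin + length; i++) {
            if (Double.compare(values[i], removable) != 0) {
                kept[n++] = values[i];
            }
        }
        return Arrays.copyOf(kept, n);
    }

    /** Replaces oldValue with newValue inside the slice, in place; returns the same array. */
    public static double[] replaceAll(double[] values, int begin, int length,
                                      double oldValue, double newValue) {
        checkSlice(values, begin, length);
        for (int i = begin; i < begin + length; i++) {
            if (Double.compare(values[i], oldValue) == 0) {
                values[i] = newValue;
            }
        }
        return values;
    }
}
```

 Matching with Double.compare rather than == is a design choice that lets NaN be removed or replaced; a tolerance-based match would need an extra parameter.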
 MathUtils
 =
 boolean canEqual(double d1, double d2);
 Provide a canEqual function that is slightly better than the existing 
 MathUtils.equals. We could also improve the existing equals method, however.
 So the change here is that the new, enhanced canEqual can do a quick check on 
 NaNs and then move to a detailed Double.compare(..)