[jira] [Commented] (LANG-1083) Add (T) casts to get unit tests to pass in old JDK.

2015-01-20 Thread Jonathan Baker (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283783#comment-14283783
 ] 

Jonathan Baker commented on LANG-1083:
--

Created PR https://github.com/apache/commons-lang/pull/42

 Add (T) casts to get unit tests to pass in old JDK.
 ---

 Key: LANG-1083
 URL: https://issues.apache.org/jira/browse/LANG-1083
 Project: Commons Lang
  Issue Type: Bug
 Environment: Maven 3.2.5, Java 1.6.0_18, Fedora 11, AMD 64 
 (2.6.30.10-105.2.23.fc11.x86_64)
Reporter: Jonathan Baker
Priority: Trivial

 This is probably just a quirk of the old JDK that was used.
 The casts are not necessary on other computers, but they don't seem to hurt 
 either.  (Please verify that of course!)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (LANG-1083) Add (T) casts to get unit tests to pass in old JDK.

2015-01-20 Thread Jonathan Baker (JIRA)
Jonathan Baker created LANG-1083:


 Summary: Add (T) casts to get unit tests to pass in old JDK.
 Key: LANG-1083
 URL: https://issues.apache.org/jira/browse/LANG-1083
 Project: Commons Lang
  Issue Type: Bug
 Environment: Maven 3.2.5, Java 1.6.0_18, Fedora 11, AMD 64 
(2.6.30.10-105.2.23.fc11.x86_64)
Reporter: Jonathan Baker
Priority: Trivial


This is probably just a quirk of the old JDK that was used.
The casts are not necessary on other computers, but they don't seem to hurt 
either.  (Please verify that of course!)
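
For readers unfamiliar with this kind of change, here is a minimal sketch of what an explicit (T) cast on a generic return value looks like. It is a hypothetical illustration, not the code from the pull request: the class CastDemo and its methods are invented here, and the assumption that the old compiler stumbled over type inference for a generic call in a return statement is only a guess based on the report. On current compilers the cast is redundant but harmless.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration only; none of these names come from the pull request.
final class CastDemo {

    // Serializes a value to a byte array, wrapping checked exceptions.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(value);
            out.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Generic deserializer: the caller decides what T is.
    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] data) {
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            try {
                return (T) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Without the cast the compiler has to infer T for deserialize() from the
    // return type; the explicit (T) cast removes that dependency on inference.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        return (T) deserialize(serialize(value));
    }

    public static void main(String[] args) {
        String copy = roundTrip("hello");
        System.out.println(copy);
    }
}
```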





[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283756#comment-14283756
 ] 

Thomas Neidhart commented on MATH-1197:
---

One observation: the samples contain a lot of equal values.

The KS test statistic is implemented using Arrays.binarySearch, but that method 
does not specify which index is returned when the sorted array contains several 
elements equal to the search key.
E.g. if you have samples [0, 0, 0, 0, 0, 1] and you search for 0, you might get 
any index in the range [0, 4]. As far as I understand the KS statistic, it is 
based on the empirical distribution function, which gives the cumulative 
probability from the number of values less than or equal to the given 
observation; that count is not what Arrays.binarySearch returns.
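
To make the point concrete, here is a small sketch that contrasts Arrays.binarySearch with an explicit "count of values less than or equal to" lookup on the sample from the comment. The class EcdfCount and its method are invented for this illustration and are not taken from Commons Math or from any patch.

```java
import java.util.Arrays;

// Illustrative helper, not part of Commons Math.
final class EcdfCount {

    // Returns how many elements of the sorted array are <= value.
    // This is an "upper bound" binary search: it always lands after the
    // last element equal to value, however many ties there are.
    static int countLessOrEqual(double[] sorted, double value) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] sample = {0, 0, 0, 0, 0, 1};
        // With ties, the index returned here may be any position holding a 0.
        System.out.println("binarySearch index: " + Arrays.binarySearch(sample, 0.0));
        // The empirical distribution function needs this count instead.
        System.out.println("values <= 0: " + countLessOrEqual(sample, 0.0));
    }
}
```

Dividing the count by the sample length gives the value of the empirical distribution function at that observation.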

 Incorrect Kolmogorov–Smirnov Statistic for two samples 
 ---

 Key: MATH-1197
 URL: https://issues.apache.org/jira/browse/MATH-1197
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.4.1
 Environment: Ubuntu 14.04
Reporter: Danaja Thiyunuwan Maldeniya

 kolmogorovSmirnovTest(double[],double[]) against the samples given below 
 gives 5.699107852308316E-12 instead of 0.9793 (approx.) Traced the issue to 
 kolmogorovSmirnovStatistic(double[],double[]) which gives 0.49507389162561577 
 instead of 0.064 (verified with ks.test in R and JDistlib)
   double[] x = 
 {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,2.202653,2.202653,2.202653
 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653
 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653
 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,3.181199,3.181199,3.181199,3.181199,3.181199,3.181199,3.723539
 ,3.723539,3.723539,3.723539,4.383482,4.383482,4.383482,4.383482,5.320671,5.320671,5.320671,5.717284,6.964001,7.352165
 ,8.710510,8.710510,8.710510,8.710510,8.710510,8.710510,9.539004,9.539004,
 10.720619, 17.726077, 17.726077, 17.726077, 17.726077
 ,22.053875 ,23.799144 ,27.355308 ,30.584960 ,30.584960
 ,30.584960, 30.584960, 30.751808};
   double[] y = 
 {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
 ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,2.202653
 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,3.061758,3.723539,5.628420,5.628420,5.628420,5.628420
 ,5.628420,6.916982,6.916982,6.916982, 10.178538, 10.178538,
 10.178538, 10.178538, 10.178538 };





[jira] [Resolved] (COMPRESS-298) Cleaner way to catch/detect Seven7 files which are password protected

2015-01-20 Thread Stefan Bodewig (JIRA)

 [ 
https://issues.apache.org/jira/browse/COMPRESS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-298.
-
   Resolution: Fixed
Fix Version/s: 1.10

The way SevenZFile is coded it is difficult to provide a canReadEntryData 
method like we've got for ZipFile, so right now throwing a special exception 
seems to be the best solution.

svn revision 1653252
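
As a rough sketch of how a dedicated exception subtype lets callers separate "password needed" from other I/O problems: the names below (EncryptedArchiveDemo, PasswordNeededException, readFirstEntry) are invented for illustration and are not the classes committed in the revision above.

```java
import java.io.IOException;

// Hypothetical stand-ins; not the Commons Compress API.
final class EncryptedArchiveDemo {

    // A dedicated IOException subtype signalling a missing password.
    static final class PasswordNeededException extends IOException {
        private static final long serialVersionUID = 1L;

        PasswordNeededException(String archiveName) {
            super("Cannot read encrypted content from " + archiveName + " without a password");
        }
    }

    // Simulates reading the first entry of an archive.
    static String readFirstEntry(String archiveName, boolean encrypted, byte[] password)
            throws IOException {
        if (encrypted && password == null) {
            throw new PasswordNeededException(archiveName);
        }
        return "entry-of-" + archiveName;
    }

    // Callers can catch the subtype first and treat everything else as a
    // generic I/O failure, with no need to inspect the exception message.
    static String describe(String archiveName, boolean encrypted, byte[] password) {
        try {
            return "read " + readFirstEntry(archiveName, encrypted, password);
        } catch (PasswordNeededException e) {
            return "password required";
        } catch (IOException e) {
            return "I/O failure: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("secret.7z", true, null));
        System.out.println(describe("plain.7z", false, null));
    }
}
```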

 Cleaner way to catch/detect Seven7 files which are password protected
 -

 Key: COMPRESS-298
 URL: https://issues.apache.org/jira/browse/COMPRESS-298
 Project: Commons Compress
  Issue Type: Improvement
  Components: Archivers
Affects Versions: 1.8.1
Reporter: Nick Burch
 Fix For: 1.10


 Currently, if we open a password protected 7z file and call 
 {{getNextEntry()}} on it, it will blow up with an IOException with a specific 
 string:
 {code}
 java.io.IOException: Cannot read encrypted files without a password
   at 
 org.apache.commons.compress.archivers.sevenz.AES256SHA256Decoder$1.init(AES256SHA256Decoder.java:56)
   at 
 org.apache.commons.compress.archivers.sevenz.AES256SHA256Decoder$1.read(AES256SHA256Decoder.java:112)
   at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
   at org.tukaani.xz.rangecoder.RangeDecoderFromStream.init(Unknown 
 Source)
   at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source)
   at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source)
   at org.tukaani.xz.LZMAInputStream.init(Unknown Source)
   at 
 org.apache.commons.compress.archivers.sevenz.Coders$LZMADecoder.decode(Coders.java:113)
   at 
 org.apache.commons.compress.archivers.sevenz.Coders.addDecoder(Coders.java:77)
   at 
 org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:853)
   at 
 org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:820)
   at 
 org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:151)
 {code}
 It would be good if either a specific subtype of IOException could be thrown 
 (which could then be caught to differentiate this from other kinds of IO 
 problems), or if a method could be added to SevenZFile which could be called 
 to see if a password is needed / given password is correct
 (If implemented, this would help make the code in Tika dealing with 7z files 
 cleaner)





[jira] [Updated] (COMPRESS-288) Missing support for 7z ppmd compression format.

2015-01-20 Thread Stefan Bodewig (JIRA)

 [ 
https://issues.apache.org/jira/browse/COMPRESS-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig updated COMPRESS-288:

Fix Version/s: (was: 1.8.1)

 Missing support for 7z ppmd compression format.
 ---

 Key: COMPRESS-288
 URL: https://issues.apache.org/jira/browse/COMPRESS-288
 Project: Commons Compress
  Issue Type: New Feature
  Components: Archivers
Affects Versions: 1.8.1
 Environment: Tika from trunk build.
Reporter: sunxingzhe
  Labels: 7z

 When Commons Compress 1.8.1 parses a 7z archive that uses the ppmd compression 
 format, the following error occurs.
 Caused by: java.io.IOException: Unsupported compression method [3, 4, 1]
   at 
 org.apache.commons.compress.archivers.sevenz.Coders.addDecoder(Coders.java:74)
   at 
 org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:865)
   at 
 org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:832)
   at 
 org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:151)
   at 
 org.apache.tika.parser.pkg.PackageParser$SevenZWrapper.getNextEntry(PackageParser.java:224)
   at 
 org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:155)
   at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   ... 5 more
 Please tell me whether or not Tika will support a ppmd decoder.





[jira] [Commented] (COMPRESS-294) .Z decompress “Invalid 9 bit code 0x183”

2015-01-20 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283854#comment-14283854
 ] 

Stefan Bodewig commented on COMPRESS-294:
-

any news?

 .Z decompress “Invalid 9 bit code 0x183”
 

 Key: COMPRESS-294
 URL: https://issues.apache.org/jira/browse/COMPRESS-294
 Project: Commons Compress
  Issue Type: Bug
  Components: Build
Affects Versions: 1.9
Reporter: Q
 Attachments: commons-compress-1.10-SNAPSHOT.jar


 Trying to decompress a .Z file I get “Invalid 9 bit code 0x183”.
 It seems that the .Z file was created under Unix using the default bits value 
 (16 bits). The current implementation seems to support only 9 bits.
 (I can't provide a sample file since it contains client data, but I will try to 
 get a dummy file from the client.)
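
One way to check the reporter's guess about the bits value is to look at the three-byte .Z header. The sketch below assumes the usual compress(1) layout: magic bytes 0x1f 0x9d, followed by a flag byte whose low five bits hold the maximum code width and whose top bit marks block mode. The class ZHeader is invented for this illustration and is not part of Commons Compress.

```java
// Illustrative header inspection; not part of Commons Compress.
final class ZHeader {

    private static final int MAGIC_1 = 0x1f;
    private static final int MAGIC_2 = 0x9d;
    private static final int MAX_BITS_MASK = 0x1f;
    private static final int BLOCK_MODE_MASK = 0x80;

    // Returns the maximum code width (in bits) declared in the header.
    static int maxCodeBits(byte[] header) {
        if (header.length < 3
                || (header[0] & 0xff) != MAGIC_1
                || (header[1] & 0xff) != MAGIC_2) {
            throw new IllegalArgumentException("Not a .Z stream");
        }
        return header[2] & MAX_BITS_MASK;
    }

    // Returns true when the stream was written in block mode.
    static boolean isBlockMode(byte[] header) {
        maxCodeBits(header); // validates the magic bytes
        return (header[2] & BLOCK_MODE_MASK) != 0;
    }

    public static void main(String[] args) {
        // 0x90 = block mode flag plus a 16 bit maximum code width.
        byte[] header = {0x1f, (byte) 0x9d, (byte) 0x90};
        System.out.println("max bits: " + maxCodeBits(header));
        System.out.println("block mode: " + isBlockMode(header));
    }
}
```

Codes in such a stream start at 9 bits and grow up to the declared maximum, so the header value only tells a decoder how far the width may grow.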





[jira] [Commented] (VFS-558) java.lang.UnsupportedOperationException in FtpFileObject

2015-01-20 Thread L (JIRA)

[ 
https://issues.apache.org/jira/browse/VFS-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283917#comment-14283917
 ] 

L commented on VFS-558:
---

Re: Thanks for testing! I really hope to release soon (so you better not find 
new bugs )

Sorry: VFS-559

 java.lang.UnsupportedOperationException in FtpFileObject
 

 Key: VFS-558
 URL: https://issues.apache.org/jira/browse/VFS-558
 Project: Commons VFS
  Issue Type: Bug
Affects Versions: 2.0
Reporter: L
Assignee: Bernd Eckenfels
 Fix For: 2.1


 I am getting the following exception in my code:
 java.lang.UnsupportedOperationException
   at java.util.Collections$UnmodifiableMap.remove(Collections.java:1345)
   at 
 org.apache.commons.vfs2.provider.ftp.FtpFileObject.onChildrenChanged(FtpFileObject.java:271)
   at 
 org.apache.commons.vfs2.provider.AbstractFileObject.childrenChanged(AbstractFileObject.java:240)
   at 
 org.apache.commons.vfs2.provider.AbstractFileObject.notifyParent(AbstractFileObject.java:1931)
   at 
 org.apache.commons.vfs2.provider.AbstractFileObject.handleCreate(AbstractFileObject.java:1577)
   at 
 org.apache.commons.vfs2.provider.AbstractFileObject.moveTo(AbstractFileObject.java:1866)
   at 
 org.apache.commons.vfs2.impl.DecoratedFileObject.moveTo(DecoratedFileObject.java:241)
   at 
 org.apache.commons.vfs2.cache.OnCallRefreshFileObject.moveTo(OnCallRefreshFileObject.java:184)
 ...
 I guess it is caused by the fact that children field is set to 
 EMPTY_FTP_FILE_MAP at the moment onChildrenChanged() is invoked.
 I also do not like line 1866 in AbstractFileObject.java. To me it looks like 
 it might be the real cause of the problem:
 FileObjectUtils.getAbstractFileObject(destFile).handleCreate(getType());
 Must it not be destFile.getType()?
 But even if I am right about AbstractFileObject.java:1866, 
 FtpFileObject.onChildrenChanged() must be corrected as well.
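
The failure mode in the stack trace can be reproduced in isolation: remove() on a map returned by Collections.unmodifiableMap always throws, even when the map is empty. The sketch below uses an invented class ChildrenCacheDemo with a shared empty placeholder that only mimics the situation described; the guard shown is one possible way to avoid the exception, not the actual VFS fix.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a children cache; not the VFS implementation.
final class ChildrenCacheDemo {

    // Shared immutable placeholder meaning "no children cached".
    static final Map<String, String> EMPTY_CHILDREN =
            Collections.unmodifiableMap(new HashMap<String, String>());

    private Map<String, String> children = EMPTY_CHILDREN;

    // Mirrors the failing pattern: mutating whatever map is currently set.
    boolean removeUnguarded(String name) {
        return children.remove(name) != null;
    }

    // Skips the mutation while the immutable placeholder is in use.
    boolean removeGuarded(String name) {
        if (children == EMPTY_CHILDREN) {
            return false;
        }
        return children.remove(name) != null;
    }

    // Reports whether the unguarded variant throws on a fresh instance.
    static boolean unguardedThrows() {
        try {
            new ChildrenCacheDemo().removeUnguarded("a.txt");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded throws: " + unguardedThrows());
        System.out.println("guarded result: " + new ChildrenCacheDemo().removeGuarded("a.txt"));
    }
}
```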





[jira] [Updated] (LANG-1083) Add (T) casts to get unit tests to pass in old JDK

2015-01-20 Thread Benedikt Ritter (JIRA)

 [ 
https://issues.apache.org/jira/browse/LANG-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedikt Ritter updated LANG-1083:
--
Summary: Add (T) casts to get unit tests to pass in old JDK  (was: Add (T) 
casts to get unit tests to pass in old JDK.)

 Add (T) casts to get unit tests to pass in old JDK
 --

 Key: LANG-1083
 URL: https://issues.apache.org/jira/browse/LANG-1083
 Project: Commons Lang
  Issue Type: Bug
 Environment: Maven 3.2.5, Java 1.6.0_18, Fedora 11, AMD 64 
 (2.6.30.10-105.2.23.fc11.x86_64)
Reporter: Jonathan Baker
Priority: Trivial

 This is probably just a quirk of the old JDK that was used.
 The casts are not necessary on other computers, but they don't seem to hurt 
 either.  (Please verify that of course!)





[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Phil Steitz (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284003#comment-14284003
 ] 

Phil Steitz commented on MATH-1197:
---

Yes, this is a bug.  Arrays.binarySearch should not have been used here.

 Incorrect Kolmogorov–Smirnov Statistic for two samples 
 ---

 Key: MATH-1197
 URL: https://issues.apache.org/jira/browse/MATH-1197
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.4.1
 Environment: Ubuntu 14.04
Reporter: Danaja Thiyunuwan Maldeniya



[jira] [Resolved] (LANG-1083) Add (T) casts to get unit tests to pass in old JDK

2015-01-20 Thread Benedikt Ritter (JIRA)

 [ 
https://issues.apache.org/jira/browse/LANG-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedikt Ritter resolved LANG-1083.
---
Resolution: Fixed

{code}
$ svn ci -m "LANG-1083: Add (T) casts to get unit tests to pass in old JDK. This fixes #42 from github. Thanks to Jonathan Baker."
Sending        src/changes/changes.xml
Sending        src/main/java/org/apache/commons/lang3/SerializationUtils.java
Sending        src/test/java/org/apache/commons/lang3/exception/AbstractExceptionContextTest.java
Transmitting file data ...
Committed revision 1653307.
{code}

Thanks!

 Add (T) casts to get unit tests to pass in old JDK
 --

 Key: LANG-1083
 URL: https://issues.apache.org/jira/browse/LANG-1083
 Project: Commons Lang
  Issue Type: Bug
 Environment: Maven 3.2.5, Java 1.6.0_18, Fedora 11, AMD 64 
 (2.6.30.10-105.2.23.fc11.x86_64)
Reporter: Jonathan Baker
Assignee: Benedikt Ritter
Priority: Trivial

 This is probably just a quirk of the old JDK that was used.
 The casts are not necessary on other computers, but they don't seem to hurt 
 either.  (Please verify that of course!)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (VFS-559) FTPClientWrapper is not robust against some failures

2015-01-20 Thread L (JIRA)
L created VFS-559:
-

 Summary: FTPClientWrapper is not robust against some failures
 Key: VFS-559
 URL: https://issues.apache.org/jira/browse/VFS-559
 Project: Commons VFS
  Issue Type: Bug
Affects Versions: 2.0
Reporter: L


The goal of the class is stated in javadoc:
A wrapper to the FTPClient to allow automatic reconnect on connection loss.

A lot of its methods look like this:

try
{
    // do something...
}
catch (final IOException e)
{
    disconnect();
    // try to repeat the operation...
}

Unfortunately, disconnect() can fail for the same reason as the original "do 
something". In my case it was a connection reset. So instead of the original 
exception I was getting more or less the same exception from 
getFtpClient().quit();

So the wrapper did not help at all.

I guess all the disconnect() invocations must also be inside try/catch so that 
even if disconnect() throws, the method goes on to the next step:  try to 
repeat the operation...
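
The suggestion can be sketched as follows. This is a minimal model under stated assumptions, not the FTPClientWrapper code: the interfaces IoAction and Connection and the class RetryDemo are invented here, and reconnecting is assumed to happen inside the repeated action.

```java
import java.io.IOException;

// Hypothetical model of the retry pattern; not the VFS implementation.
final class RetryDemo {

    interface IoAction<T> {
        T run() throws IOException;
    }

    interface Connection {
        void disconnect() throws IOException;
    }

    // Runs the action, and on failure disconnects and tries once more.
    // A failing disconnect() is swallowed so it cannot prevent the retry.
    static <T> T callWithRetry(Connection connection, IoAction<T> action) throws IOException {
        try {
            return action.run();
        } catch (IOException first) {
            try {
                connection.disconnect();
            } catch (IOException ignored) {
                // The connection is already broken; carry on to the retry.
            }
            return action.run();
        }
    }

    // Simulates a connection reset that breaks both the first attempt
    // and the subsequent disconnect().
    static String demo() {
        final int[] attempts = {0};
        Connection broken = new Connection() {
            public void disconnect() throws IOException {
                throw new IOException("Connection reset");
            }
        };
        IoAction<String> flaky = new IoAction<String>() {
            public String run() throws IOException {
                if (attempts[0]++ == 0) {
                    throw new IOException("Connection reset");
                }
                return "ok after " + attempts[0] + " attempts";
            }
        };
        try {
            return callWithRetry(broken, flaky);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```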





[jira] [Comment Edited] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284500#comment-14284500
 ] 

Thomas Neidhart edited comment on MATH-1197 at 1/20/15 10:10 PM:
-

The exactP method also seems to have a problem when comparing it with the 
results from R.
Take this example:

{code}
double[] x = new double[] { 0, 0, 0, 0, 1 };
double[] y = new double[] { 0, 0, 1, 1, 2, 3 };

final KolmogorovSmirnovTest test = new KolmogorovSmirnovTest();
System.out.println("p=" + test.kolmogorovSmirnovTest(x, y, true));
System.out.println("D=" + test.kolmogorovSmirnovStatistic(x, y));

System.out.println("approximateP=" + 
test.approximateP(test.kolmogorovSmirnovStatistic(x, y), x.length, y.length));
System.out.println("exactP=" + 
test.exactP(test.kolmogorovSmirnovStatistic(x, y), x.length, y.length, false));
{code}

returns:

{noformat}
p=0.35714285714285715
D=0.46673
approximateP=0.5925028311389975
exactP=0.4155844155844156
{noformat}

R computes the following:

{noformat}
data:  x and y
D = 0.4667, p-value = 0.5925
alternative hypothesis: two-sided
{noformat}

Edit: the reason seems to be that R cannot compute exact p-values in the 
presence of ties.
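
For reference, the D value for this small example can be checked by hand with a direct walk over both sorted samples. The sketch below is an independent illustration (the class TwoSampleD is invented here) and is neither the Commons Math code nor the attached patch; for the arrays above it yields roughly 0.4667, in line with the D reported by R.

```java
import java.util.Arrays;

// Independent illustration of the two-sample D statistic.
final class TwoSampleD {

    // Largest absolute difference between the two empirical distribution
    // functions, evaluated at every distinct observed value.
    static double statistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < sx.length && j < sy.length) {
            double value = Math.min(sx[i], sy[j]);
            // Advance past all ties so both counts are "number of values <= value".
            while (i < sx.length && sx[i] <= value) {
                i++;
            }
            while (j < sy.length && sy[j] <= value) {
                j++;
            }
            double fx = (double) i / sx.length;
            double fy = (double) j / sy.length;
            d = Math.max(d, Math.abs(fx - fy));
        }
        // Once one sample is exhausted its distribution function is 1 and the
        // gap to the other can only shrink, so no further points are needed.
        return d;
    }

    public static void main(String[] args) {
        double[] x = {0, 0, 0, 0, 1};
        double[] y = {0, 0, 1, 1, 2, 3};
        System.out.println("D=" + statistic(x, y));
    }
}
```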



 Incorrect Kolmogorov–Smirnov Statistic for two samples 
 ---

 Key: MATH-1197
 URL: https://issues.apache.org/jira/browse/MATH-1197
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.4.1
 Environment: Ubuntu 14.04
Reporter: Danaja Thiyunuwan Maldeniya
 Attachments: MATH-1197.patch


 

[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284500#comment-14284500
 ] 

Thomas Neidhart commented on MATH-1197:
---

The exactP method also seems to have a problem when comparing it with the 
results from R.
Take this example:

{code}
double[] x = new double[] { 0, 0, 0, 0, 1 };
double[] y = new double[] { 0, 0, 1, 1, 2, 3 };

final KolmogorovSmirnovTest test = new KolmogorovSmirnovTest();
System.out.println("p=" + test.kolmogorovSmirnovTest(x, y, true));
System.out.println("D=" + test.kolmogorovSmirnovStatistic(x, y));

System.out.println("approximateP=" + 
test.approximateP(test.kolmogorovSmirnovStatistic(x, y), x.length, y.length));
System.out.println("exactP=" + 
test.exactP(test.kolmogorovSmirnovStatistic(x, y), x.length, y.length, false));
{code}

returns:

{noformat}
p=0.35714285714285715
D=0.46673
approximateP=0.5925028311389975
exactP=0.4155844155844156
{noformat}

R computes the following:

{noformat}
data:  x and y
D = 0.4667, p-value = 0.5925
alternative hypothesis: two-sided
{noformat}

 Incorrect Kolmogorov–Smirnov Statistic for two samples 
 ---

 Key: MATH-1197
 URL: https://issues.apache.org/jira/browse/MATH-1197
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.4.1
 Environment: Ubuntu 14.04
Reporter: Danaja Thiyunuwan Maldeniya
 Attachments: MATH-1197.patch




[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Phil Steitz (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284589#comment-14284589
 ] 

Phil Steitz commented on MATH-1197:
---

+1 on the patch

 Incorrect Kolmogorov–Smirnov Statistic for two samples 
 ---

 Key: MATH-1197
 URL: https://issues.apache.org/jira/browse/MATH-1197
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.4.1
 Environment: Ubuntu 14.04
Reporter: Danaja Thiyunuwan Maldeniya
 Attachments: MATH-1197.patch




[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Phil Steitz (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284576#comment-14284576
 ] 

Phil Steitz commented on MATH-1197:
---

Assuming whatever bugs in the D computation have been fixed, our exactP should 
actually be exact.  I could not make sense of, or find documentation for, 
what R does for small samples.  Our code computes the exact distribution of the 
associated D statistic.  I suspect that R does some kind of approximation.  As 
you said, I think R also disallows ties.

 Incorrect Kolmogorov–Smirnov Statistic for two samples 
 ---

 Key: MATH-1197
 URL: https://issues.apache.org/jira/browse/MATH-1197
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.4.1
 Environment: Ubuntu 14.04
Reporter: Danaja Thiyunuwan Maldeniya
 Attachments: MATH-1197.patch




[jira] [Updated] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Thomas Neidhart (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Neidhart updated MATH-1197:
--
Attachment: MATH-1197.patch

The attached patch fixes the two-sample KS test statistic calculation.
I did not fully understand how it was calculated before, but I have now 
followed the formula from the wiki page, and it gives the correct 
result.

After fixing this issue, another problem showed up in the test 
testTwoSampleProductSizeOverflow, which relates to a TODO in the ksSum method.
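
For readers following along, the textbook two-sample statistic is the largest 
absolute difference between the two empirical distribution functions, 
D = max |F_x(t) - F_y(t)| over all pooled sample points t. The sketch below is 
one way to compute that when the samples contain many equal values. It is not 
the attached patch and not the Commons Math implementation; the names 
TwoSampleKs and ksStatistic are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration, not the Commons Math code or the attached patch.
class TwoSampleKs {
    /**
     * Computes D = max over pooled points t of |F_x(t) - F_y(t)|,
     * where F_x and F_y are the empirical CDFs of the two samples.
     * All copies of a tied value are consumed in both samples before the
     * difference is evaluated, so ties do not inflate the statistic.
     */
    static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int n = sx.length;
        int m = sy.length;
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < n && j < m) {
            // Next distinct value in the pooled, sorted data.
            double t = Math.min(sx[i], sy[j]);
            while (i < n && sx[i] == t) {
                i++;
            }
            while (j < m && sy[j] == t) {
                j++;
            }
            // i/n and j/m are now F_x(t) and F_y(t).
            d = Math.max(d, Math.abs((double) i / n - (double) j / m));
        }
        // Once one sample is exhausted its CDF is 1 and the gap can only shrink.
        return d;
    }

    public static void main(String[] args) {
        double[] x = {0, 0, 1, 2};
        double[] y = {0, 0, 0, 3};
        System.out.println(ksStatistic(x, y)); // prints 0.25
    }
}
```

Evaluating the difference at each sorted element one at a time, without 
grouping equal values, can report a large gap in the middle of a run of ties 
even though the two distribution functions agree closely once the run is 
complete.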
