[GitHub] [commons-math] chentao106 closed pull request #117: Implement the MiniBatchKMeansClusterer

2020-01-17 Thread GitBox
chentao106 closed pull request #117: Implement the MiniBatchKMeansClusterer
URL: https://github.com/apache/commons-math/pull/117
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument

2020-01-17 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018468#comment-17018468
 ] 

Alex Herbert commented on CODEC-264:


Thanks for raising this.

The effect is that, although a new method was created for the fixed version so 
that the old method could keep the behaviour of the old broken version, the 
code actually fixes the old version and so breaks behavioural compatibility.

I have added a test to maintain behavioural compatibility and fixed the code as 
suggested. Please review the current master to check that the fix is correct.

> murmur3.hash128() does not account for unsigned seed argument
> -
>
> Key: CODEC-264
> URL: https://issues.apache.org/jira/browse/CODEC-264
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Claude Warren
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.14
>
> Attachments: YonikMurmur3Tests.java
>
>
> The original murmur3_x64_128 code used unsigned int for seed arguments.  
> Using the equivalent bit patterns in the commons codec version does not yield 
> the same results.
> I believe this is because the commons version does not account for sign 
> extension etc.
> Yonik Seeley [~yonik] explains the issue in his implementation 
> https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java
> He provides a test case to show that his code returns the same answers as the 
> original C/C++ code.  I modified that test to call the codec version to show 
> the error.
> I have attached that test case.
> Given that the original code is in the wild I am uncertain how to fix this 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument

2020-01-17 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018374#comment-17018374
 ] 

Andy Seaborne commented on CODEC-264:
-

The v1.14 version of {{hash128(byte[], , int seed)}} does now apply the seed 
mask, contrary to the comments.

Line 805
{noformat}
@Deprecated
public static long[] hash128(final byte[] data, final int offset, final int length, final int seed) {
    // 
    // Note: This fails to apply masking using 0xL to the seed.
    // 
    return hash128x64(data, offset, length, seed);
}
{noformat}

It calls {{hash128x64(byte[],, int seed)}} (exact signature match), not 
{{hash128x64(byte[],, long seed)}} (type conversion).

{{hash128x64(byte[],, int seed)}} applies the mask (verified by stepping 
through with the debugger in the Eclipse IDE).

{{hash128(byte[],, int seed)}} should call {{hash128x64(byte[],, long)}} 
directly.

I think that casting at the call site will do that:
{noformat}
return hash128x64(data, offset, length, (long)seed);
{noformat}
or for clarity explicitly:
{noformat}
long seedLong = seed; /* unmasked 32->64 bit extension */
return hash128x64(data, offset, length, seedLong);
{noformat}

If the private static work function had a different name, then automatic, 
unmasked conversion would have applied.
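
The overload selection described above can be reproduced outside of Commons 
Codec. The following is only a minimal sketch: the class and method names are 
hypothetical stand-ins, not the library's code. An int argument picks the int 
overload (which masks), while a cast to long picks the long overload and keeps 
the sign-extended value.

```java
final class SeedOverloadDemo {
    /** Stand-in for the public int-seed method: masks the seed to an unsigned 32-bit value. */
    static long hash(int seed) {
        // int & long yields a long, so this call resolves to hash(long)
        return hash(seed & 0xffffffffL);
    }

    /** Stand-in for the private long-seed work function: uses the seed exactly as given. */
    static long hash(long seed) {
        return seed;
    }

    /** Mimics the deprecated method as quoted: an int argument selects hash(int). */
    static long deprecatedAsWritten(int seed) {
        return hash(seed);
    }

    /** Mimics the proposed fix: the cast selects hash(long) with sign extension. */
    static long deprecatedWithCast(int seed) {
        return hash((long) seed);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(deprecatedAsWritten(-1))); // ffffffff
        System.out.println(Long.toHexString(deprecatedWithCast(-1)));  // ffffffffffffffff
    }
}
```

For non-negative seeds both paths give the same value, so the difference only 
shows up when the sign bit of the int seed is set.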





[jira] [Commented] (LANG-1469) Error caused by java.lang.ArrayStoreException org.apache.commons.lang3.text.translate.NumericEntityUnescaper cannot be stored in an array of type o.a.a.a.c.a.b[]

2020-01-17 Thread Ankit Patil (Jira)


[ 
https://issues.apache.org/jira/browse/LANG-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018372#comment-17018372
 ] 

Ankit Patil commented on LANG-1469:
---

[~doniw] This class is deprecated. 

Please use commons-text
[https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html]

> Error caused by java.lang.ArrayStoreException 
> org.apache.commons.lang3.text.translate.NumericEntityUnescaper cannot be 
> stored in an array of type o.a.a.a.c.a.b[]
> -
>
> Key: LANG-1469
> URL: https://issues.apache.org/jira/browse/LANG-1469
> Project: Commons Lang
>  Issue Type: Bug
>  Components: lang.text.*
>Affects Versions: 3.5
> Environment: Android /Java
>Reporter: doni
>Priority: Major
>
> Hi, we got this error:
> Caused by java.lang.ArrayStoreException
> org.apache.commons.lang3.text.translate.NumericEntityUnescaper cannot be 
> stored in an array of type o.a.a.a.c.a.b[]
> This is probably related to ProGuard in our Android project. Do you have any 
> clue what might be causing this error? It started happening after we removed 
> the ProGuard keep rules for apache.commons. 
> It also only happens on Xiaomi phones with OS 5 and 6.
> We get this error after calling StringEscapeUtils.escapeHtml4(). Any help 
> would be appreciated.
> Regards.
> Doni
>  





[jira] [Commented] (JEXL-320) "mvn test" fails with COMPILATION ERROR in SynchronizedArithmetic.java on Java 11

2020-01-17 Thread Henri Biestro (Jira)


[ 
https://issues.apache.org/jira/browse/JEXL-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018140#comment-17018140
 ] 

Henri Biestro commented on JEXL-320:


Changeset: de8eb7d2897ebcbfa7d4ff61ed5fce5fa42e20f7
Author:henrib 
Date:  2020-01-17 16:45
Message:   JEXL-320: remove dependency on Unsafe in test
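
The changeset itself is not shown in this message. As a hedged illustration 
only, one portable way to get explicit enter/exit locking without 
sun.misc.Unsafe (whose monitorEnter/monitorExit methods are missing on Java 11, 
as the compilation errors below show) is java.util.concurrent.locks.ReentrantLock. 
The class below is a hypothetical sketch, not the code from the commit.

```java
import java.util.concurrent.locks.ReentrantLock;

final class MonitorSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int calls;

    /** Replaces an explicit monitorEnter-style call. */
    void enter() {
        lock.lock();
    }

    /** Replaces an explicit monitorExit-style call. */
    void exit() {
        lock.unlock();
    }

    /** Increments the counter while holding the lock and returns the new value. */
    int guardedIncrement() {
        enter();
        try {
            return ++calls;
        } finally {
            exit(); // always released, even if the guarded code throws
        }
    }

    /** True while the calling thread holds the lock. */
    boolean heldByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }
}
```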

> "mvn test" fails with COMPILATION ERROR in SynchronizedArithmetic.java on 
> Java 11
> -
>
> Key: JEXL-320
> URL: https://issues.apache.org/jira/browse/JEXL-320
> Project: Commons JEXL
>  Issue Type: Bug
> Environment: JDK: OpenJDK 11 (hotspot)
> OS: Ubuntu 18.04.3 LTS
> Apache Maven 3.3.9
>Reporter: David Costanzo
>Priority: Minor
>
> Running "mvn test" when using OpenJDK's Java 11 fails with the following 
> errors:
> {noformat}
> [WARNING] COMPILATION WARNING : 
> [INFO] -
> [WARNING] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[24,16]
>  sun.misc.Unsafe is internal proprietary API and may be removed in a future 
> release
> [WARNING] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[97,20]
>  sun.misc.Unsafe is internal proprietary API and may be removed in a future 
> release
> [WARNING] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[100,23]
>  sun.misc.Unsafe is internal proprietary API and may be removed in a future 
> release
> [WARNING] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[102,23]
>  sun.misc.Unsafe is internal proprietary API and may be removed in a future 
> release
> [INFO] 4 warnings 
> [INFO] -
> [INFO] -
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[63,19]
>  cannot find symbol
>   symbol:   method monitorEnter(java.lang.Object)
>   location: variable UNSAFE of type sun.misc.Unsafe
> [ERROR] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[72,19]
>  cannot find symbol
>   symbol:   method monitorExit(java.lang.Object)
>   location: variable UNSAFE of type sun.misc.Unsafe
> [ERROR] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[113,19]
>  cannot find symbol
>   symbol:   method monitorEnter(java.lang.Object)
>   location: variable UNSAFE of type sun.misc.Unsafe
> [ERROR] 
> /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[118,19]
>  cannot find symbol
>   symbol:   method monitorExit(java.lang.Object)
>   location: variable UNSAFE of type sun.misc.Unsafe
> {noformat}
> BUILDING.txt states that JEXL "requires Java 6 (or later)".  I assume that 
> you expect "mvn test" to work with Java 11.  If not, then this is really a 
> doc bug in BUILDING.txt–it should say that it "requires Java 6 (exactly)" or 
> include the range of supported Java versions.
>  
> *Impact*
> This is a small barrier to entry for a new contributor.  There is an obvious 
> and straightforward way to get past the problem, which is to download a 
> compatible version of Java and set JAVA_HOME accordingly.  I used Java 1.8, 
> which I already had installed on my dev machine.  This prints the same 
> warnings, but no errors.
>  





[jira] [Resolved] (JEXL-321) Empty do-while loop is broken

2020-01-17 Thread Henri Biestro (Jira)


 [ 
https://issues.apache.org/jira/browse/JEXL-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henri Biestro resolved JEXL-321.

Resolution: Fixed

Changeset: a70b3d8a75d5805a6daeedcf80ff44bf8b8cb276
Author:henrib 
Date:  2020-01-17 16:43
Message:   JEXL-321: do/while with empty statement contributed fix

> Empty do-while loop is broken
> -
>
> Key: JEXL-321
> URL: https://issues.apache.org/jira/browse/JEXL-321
> Project: Commons JEXL
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: Dmitri Blinov
>Priority: Major
>
> The following test case fails with an ArrayIndexOutOfBoundsException (AIOOB).
> {code:java}
> @Test
> public void testEmptyBody() throws Exception {
>     JexlScript e = JEXL.createScript("var i = 0; do ; while((i+=1) < 10); i");
>     JexlContext jc = new MapContext();
>     Object o = e.execute(jc);
>     Assert.assertEquals(10, o);
> }
> {code}
> The suggestion is to change the interpreter as follows
> {code}
> @Override
> protected Object visit(ASTDoWhileStatement node, Object data) {
>     Object result = null;
>     /* last objectNode is the expression */
>     Node expressionNode = node.jjtGetChild(node.jjtGetNumChildren()-1);
>     do {
>         cancelCheck(node);
>         if (node.jjtGetNumChildren() > 1) {
>             try {
>                 // execute statement
>                 result = node.jjtGetChild(0).jjtAccept(this, data);
>             } catch (JexlException.Break stmtBreak) {
>                 break;
>             } catch (JexlException.Continue stmtContinue) {
>                 //continue;
>             }
>         }
>     } while (arithmetic.toBoolean(expressionNode.jjtAccept(this, data)));
>     return result;
> }
> {code}
> and the Debugger as follows
> {code}
> @Override
> protected Object visit(ASTDoWhileStatement node, Object data) {
>     int num = node.jjtGetNumChildren();
>     builder.append("do ");
>     if (num > 1) {
>         acceptStatement(node.jjtGetChild(0), data);
>     } else {
>         builder.append(" ; ");
>     }
>     builder.append(" while (");
>     accept(node.jjtGetChild(num - 1), data);
>     builder.append(")");
>     return data;
> }
> {code}





[jira] [Commented] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats

2020-01-17 Thread Jakob Sultan Ericsson (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018043#comment-17018043
 ] 

Jakob Sultan Ericsson commented on COMPRESS-501:


Some of my commented-out code was left in intentionally, to show what is 
actually taking time in the code and to start a discussion, as we have now 
done. :-)

Another thing I noticed while doing this is that some parts, such as parsing 
the actual date and time, can be somewhat time consuming. Maybe just save the 
raw value (DOS timestamp) and only later, when/if getTime() is actually called, 
parse it into a proper milliseconds timestamp.

If I uncomment the lines below, my naive test goes from 2s to about 3.9s.
{code:java}
long ts = ZipLong.getValue(cfhBuf, off);
final long time = ZipUtil.dosToJavaTime(ts);
ze.setTime(time);
{code}
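
The deferred parsing suggested above could be sketched as follows. This is a 
hypothetical stand-alone class, not a patch to ZipArchiveEntry: it keeps the 
raw 32-bit DOS value and decodes it only on the first getTime() call. It 
returns a java.time.LocalDateTime instead of epoch milliseconds to stay 
independent of the time zone, which is an assumption made for this example.

```java
import java.time.LocalDateTime;

final class LazyDosTimeEntry {
    private final long rawDosTime;  // packed DOS date/time as read from the header
    private LocalDateTime parsed;   // null until getTime() is first called
    private int conversions;        // how often the raw value was decoded

    LazyDosTimeEntry(long rawDosTime) {
        this.rawDosTime = rawDosTime;
    }

    /** Decodes the raw value on first use and caches the result. */
    LocalDateTime getTime() {
        if (parsed == null) {
            parsed = decode(rawDosTime);
            conversions++;
        }
        return parsed;
    }

    int conversionCount() {
        return conversions;
    }

    /**
     * Unpacks the DOS layout: year-1980 (7 bits), month (4), day (5),
     * hour (5), minute (6), seconds/2 (5). Throws DateTimeException for
     * values that do not form a valid date, e.g. month 0.
     */
    static LocalDateTime decode(long dos) {
        int year = (int) ((dos >> 25) & 0x7f) + 1980;
        int month = (int) ((dos >> 21) & 0x0f);
        int day = (int) ((dos >> 16) & 0x1f);
        int hour = (int) ((dos >> 11) & 0x1f);
        int minute = (int) ((dos >> 5) & 0x3f);
        int second = (int) (dos & 0x1f) * 2;
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }
}
```

Entries whose time is never requested then cost only one field assignment 
while the central directory is being read.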

I have also commented out reading the zip64 extra information because we don't 
need it in our use case. I believe this might be a compatibility issue for 
general usage of commons-compress. But if I'm not mistaken, disabling it 
speeds up reading.

> Possibility to introduce a fast Zip open with some caveats
> --
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
>  Issue Type: Improvement
>  Components: Archivers
>Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
>Reporter: Jakob Sultan Ericsson
>Priority: Major
> Attachments: zipfile-speed-improvements.diff
>
>
> About a year ago I created an improvement 
> (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things 
> in commons-compress for Zip files. This helped us quite a lot, but we wanted 
> it to be even faster, so I optimised away some things that I thought were not 
> that important for us.
> I was able to improve opening of a 34GB zip file from ~12s to ~2s.
> Now to my question: do you think it would be possible to introduce some of my 
> fixes (diff included) into master?
> Yes, I know that I shortcut some features for some specific zip files and 
> don't expose everything anymore.
> I haven't really made a good switchable solution for it because we just use 
> our own build locally with this patch.
> But with some hints from you I might be able to do it somehow. I'm happy to 
> help and would love to get this fast open into master (it is always 
> cumbersome to maintain custom changes to public libraries). 
> {code:java}
> diff --git 
> a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
>  
> b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> index 767f615d..d441b12d 100644
> --- 
> a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> +++ 
> b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> @@ -146,6 +146,7 @@
>  private boolean isStreamContiguous = false;
>  private NameSource nameSource = NameSource.NAME;
>  private CommentSource commentSource = CommentSource.COMMENT;
> +private byte[] cdExtraData = null;
>  
>  
>  /**
> @@ -397,6 +398,14 @@ public void setAlignment(int alignment) {
>  this.alignment = alignment;
>  }
>  
> +public void setRawCentralDirectoryExtra(byte[] cdExtraData) {
> +this.cdExtraData = cdExtraData;
> +}
> +
> +public byte[] getRawCentralDirectoryExtra() {
> +return this.cdExtraData;
> +}
> +
>  /**
>   * Replaces all currently attached extra fields with the new array.
>   * @param fields an array of extra fields
> diff --git 
> a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java 
> b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> index 152272b5..bb33b50f 100644
> --- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> +++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> @@ -691,10 +691,10 @@ protected void finalize() throws Throwable {
>  final HashMap noUTF8Flag =
>  new HashMap<>();
>  
> -positionAtCentralDirectory();
> +ByteBuffer ceDir = positionAtCentralDirectory();
>  
>  wordBbuf.rewind();
> -IOUtils.readFully(archive, wordBbuf);
> +ceDir.get(wordBuf);
>  long sig = ZipLong.getValue(wordBuf);
>  
>  if (sig != CFH_SIG && startsWithLocalFileHeader()) {
> @@ -703,9 +703,12 @@ protected void finalize() throws Throwable {
>  }
>  
>  while (sig == CFH_SIG) {
> -readCentralDirectoryEntry(noUTF8Flag);
> +readCentralDirectoryEntry(ceDir, noUTF8Flag);
>  wordBbuf.rewind();
> -IOUtils.readFully(archive, wordBbuf);
> +if (ceDir.remaining() == 0) {
> +  

[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer

2020-01-17 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017961#comment-17017961
 ] 

Gilles Sadowski commented on MATH-1509:
---

Thanks for your interest in contributing.

A few comments about the PR:
 * {{ClusterUtils}} defines utilities that are seemingly redundant with those 
in ["Commons 
RNG"|http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.3/org/apache/commons/rng/sampling/ListSampler.html].
 * Why are there _protected_ methods?
 * All fields and methods (including _private_ ones) must have a Javadoc 
comment.
 * Comments should be in English. ;)

> Implement the MiniBatchKMeansClusterer
> --
>
> Key: MATH-1509
> URL: https://issues.apache.org/jira/browse/MATH-1509
> Project: Commons Math
>  Issue Type: New Feature
>Reporter: Chen Tao
>Priority: Major
> Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm 
> that uses a subset of the points to initialize the cluster centers and mini 
> batches in the training iterations.
>  It can cluster millions of data points in a few seconds, and its results 
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute 
> the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and follows Python's 
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data: performance 
> improves significantly and the results differ little from KMeans++. On sparse 
> data, however, the results differ considerably.
>  
> Below is a comparison of my implementation and KMeansPlusPlusClusterer
>   !compare.png!
>  
> I have created a pull request on 
> [https://github.com/apache/commons-math/pull/117], for reference only.
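
For readers unfamiliar with the algorithm described in the issue, here is a 
minimal sketch of the mini-batch update. It is not the code from the pull 
request: the class name, the naive random initialisation and the per-center 
1/count step size are assumptions made for this illustration.

```java
import java.util.Random;

/** Minimal mini-batch k-means sketch; all names here are hypothetical. */
final class MiniBatchKMeansSketch {

    /** Returns k centers fitted with the given number of mini-batch iterations. */
    static double[][] fit(double[][] points, int k, int batchSize, int iterations, long seed) {
        Random rng = new Random(seed);
        int dim = points[0].length;

        // Naive initialisation: copy k randomly chosen points (not k-means++).
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[rng.nextInt(points.length)].clone();
        }

        int[] counts = new int[k]; // points absorbed by each center so far
        for (int it = 0; it < iterations; it++) {
            // Draw a mini batch and assign it against the current centers.
            double[][] batch = new double[batchSize][];
            int[] assigned = new int[batchSize];
            for (int b = 0; b < batchSize; b++) {
                batch[b] = points[rng.nextInt(points.length)];
                assigned[b] = nearest(centers, batch[b]);
            }
            // Move each assigned center towards its point with step 1/count.
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                counts[c]++;
                double eta = 1.0 / counts[c];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (batch[b][d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    /** Index of the center with the smallest squared Euclidean distance to p. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = centers[c][d] - p[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

Each iteration touches only batchSize points rather than the whole data set, 
which is where the speed-up over a full KMeans pass comes from.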





[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer

2020-01-17 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017951#comment-17017951
 ] 

Gilles Sadowski commented on MATH-1509:
---

{quote}workflow
{quote}
For new features, the starting point would be to describe the proposal on the 
"dev" ML.
 Once the idea is accepted, a JIRA report is created (this is done already ;)) 
in order to discuss practical details of the implementations (like improvements 
to a PR).



[jira] [Created] (JEXL-321) Empty do-while loop is broken

2020-01-17 Thread Dmitri Blinov (Jira)
Dmitri Blinov created JEXL-321:
--

 Summary: Empty do-while loop is broken
 Key: JEXL-321
 URL: https://issues.apache.org/jira/browse/JEXL-321
 Project: Commons JEXL
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Dmitri Blinov


The following test case fails with an ArrayIndexOutOfBoundsException (AIOOB).
{code:java}
@Test
public void testEmptyBody() throws Exception {
    JexlScript e = JEXL.createScript("var i = 0; do ; while((i+=1) < 10); i");
    JexlContext jc = new MapContext();
    Object o = e.execute(jc);
    Assert.assertEquals(10, o);
}
{code}
The suggestion is to change the interpreter as follows
{code}
@Override
protected Object visit(ASTDoWhileStatement node, Object data) {
    Object result = null;
    /* last objectNode is the expression */
    Node expressionNode = node.jjtGetChild(node.jjtGetNumChildren()-1);
    do {
        cancelCheck(node);
        if (node.jjtGetNumChildren() > 1) {
            try {
                // execute statement
                result = node.jjtGetChild(0).jjtAccept(this, data);
            } catch (JexlException.Break stmtBreak) {
                break;
            } catch (JexlException.Continue stmtContinue) {
                //continue;
            }
        }
    } while (arithmetic.toBoolean(expressionNode.jjtAccept(this, data)));
    return result;
}
{code}
and the Debugger as follows
{code}
@Override
protected Object visit(ASTDoWhileStatement node, Object data) {
    int num = node.jjtGetNumChildren();
    builder.append("do ");
    if (num > 1) {
        acceptStatement(node.jjtGetChild(0), data);
    } else {
        builder.append(" ; ");
    }
    builder.append(" while (");
    accept(node.jjtGetChild(num - 1), data);
    builder.append(")");
    return data;
}
{code}
{code}
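
The guard in the suggested interpreter change can be tried in isolation. The 
sketch below is a simplified model, not JEXL code: the Node interface and the 
int[] state are hypothetical stand-ins for the parser's nodes and context. As 
in the suggestion, the last child is the condition and a body is evaluated 
only when a second child exists.

```java
final class EmptyDoWhileSketch {
    /** Hypothetical stand-in for a parsed node that can be evaluated. */
    interface Node {
        Object eval(int[] state);
    }

    /** Evaluates a do/while whose children are [body?, condition]. */
    static Object doWhile(Node[] children, int[] state) {
        Object result = null;
        Node condition = children[children.length - 1]; // last child is the condition
        do {
            if (children.length > 1) { // a body exists only when there are two children
                result = children[0].eval(state);
            }
        } while ((Boolean) condition.eval(state));
        return result;
    }

    /** Models "var i = 0; do ; while((i+=1) < 10); i" and returns the final i. */
    static int emptyBodyExample() {
        int[] state = {0};
        Node condition = s -> (s[0] += 1) < 10;
        doWhile(new Node[] {condition}, state);
        return state[0];
    }
}
```

With a single child, taking child 0 as the body would evaluate the condition 
as if it were the statement, which is the case the `> 1` check avoids.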





[jira] [Resolved] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test

2020-01-17 Thread Gilles Sadowski (Jira)


 [ 
https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilles Sadowski resolved MATH-1487.
---
Resolution: Incomplete

Closing (no feedback from the OP in more than 6 months).

> MathInternalError - Kolmogorov Smirnov Test
> ---
>
> Key: MATH-1487
> URL: https://issues.apache.org/jira/browse/MATH-1487
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Paweł Lipiński
>Priority: Critical
> Attachments: alpha.arr, beta.arr
>
>
> Hi,
> I spotted a pesky bug in KolmogorovSmirnovTest class, in the method 
> kolmogorovSmirnovTest.
> In order to reproduce the error use arrays from attachments.
> Stacktrace:
> {noformat}
> org.apache.commons.math3.exception.MathInternalError: illegal state: internal 
> error, please fill a bug report at https://issues.apache.org/jira/browse/MATH
> at 
> org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest.fixTies(KolmogorovSmirnovTest.java:1171)
>  at 
> org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest.kolmogorovSmirnovTest(KolmogorovSmirnovTest.java:263)
>  at 
> org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest.kolmogorovSmirnovTest(KolmogorovSmirnovTest.java:290)
> {noformat}
>   





[jira] [Created] (TEXT-176) Release Patch 1.8.1

2020-01-17 Thread Furkan KILIC (Jira)
Furkan KILIC created TEXT-176:
-

 Summary: Release Patch 1.8.1
 Key: TEXT-176
 URL: https://issues.apache.org/jira/browse/TEXT-176
 Project: Commons Text
  Issue Type: Wish
Reporter: Furkan KILIC


Hello

Is it possible to release patch 1.8.1? The last release is from September 
2019, and some features and bug fixes have been merged since.

Thanks a lot.

Best regards

Furkilic





[jira] [Comment Edited] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test

2020-01-17 Thread Chen Tao (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017882#comment-17017882
 ] 

Chen Tao edited comment on MATH-1487 at 1/17/20 10:23 AM:
--

I cannot reproduce this bug in either 3.6.1 or the development version with 
this code:
{code:java}
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
            .stream()
            .mapToDouble(Double::parseDouble)
            .toArray();
}
{code}
More information should be provided.


was (Author: chentao106):
I cannot reproduce this bug in either 3.6.1 or the development version with 
this code:

```java
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
            .stream()
            .mapToDouble(Double::parseDouble)
            .toArray();
}
```

More information should be provided.



[jira] [Comment Edited] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test

2020-01-17 Thread Chen Tao (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017882#comment-17017882
 ] 

Chen Tao edited comment on MATH-1487 at 1/17/20 10:22 AM:
--

I cannot reproduce this bug in either 3.6.1 or the development version with 
this code:

```java
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
            .stream()
            .mapToDouble(Double::parseDouble)
            .toArray();
}
```

More information should be provided.


was (Author: chentao106):
I cannot reproduce this bug in either 3.6.1 or the development version with 
this code:
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
            .stream()
            .mapToDouble(Double::parseDouble)
            .toArray();
}

More information should be provided.



[jira] [Commented] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test

2020-01-17 Thread Chen Tao (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017882#comment-17017882
 ] 

Chen Tao commented on MATH-1487:


I cannot reproduce this bug in either 3.6.1 or the development version with 
this code:
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
            .stream()
            .mapToDouble(Double::parseDouble)
            .toArray();
}

More information should be provided.



[jira] [Commented] (IMAGING-247) crash on reading tiff image

2020-01-17 Thread Robin Morier (Jira)


[ 
https://issues.apache.org/jira/browse/IMAGING-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017866#comment-17017866
 ] 

Robin Morier commented on IMAGING-247:
--

Thanks for investigating.
As a temporary workaround I'm converting the TIFFs to the white_is_zero scheme 
(so, without the palette).
I've spent some time trying to understand the library code that leads to that 
faulty 255 sample value, but couldn't quite get my mind around the meaning of 
the bit shifts etc.
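
For readers puzzling over the same bit shifts, here is a minimal sketch of how packed samples are typically unpacked from a byte, most significant bits first. It is not the Commons Imaging code, and it does not claim to be the actual cause of this issue; the class and method names are hypothetical. It only illustrates one way a value of 255 can appear: a byte of 0xFF gives eight indices of 1 when read as 1-bit samples, but a single index of 255 when read as one 8-bit sample, which is out of range for a two-entry palette.

```java
// Hypothetical illustration only; not taken from Commons Imaging.
final class PackedSamples {

    /**
     * Unpacks samples of bitsPerSample bits (1, 2, 4 or 8) from one byte,
     * most significant bits first.
     */
    static int[] unpack(byte packed, int bitsPerSample) {
        int count = 8 / bitsPerSample;          // samples stored in one byte
        int mask = (1 << bitsPerSample) - 1;    // e.g. 0x01 for 1 bit, 0xFF for 8 bits
        int value = packed & 0xFF;              // treat the byte as unsigned
        int[] samples = new int[count];
        for (int i = 0; i < count; i++) {
            // Shift the i-th sample down to the lowest bits, then mask it out.
            int shift = 8 - bitsPerSample * (i + 1);
            samples[i] = (value >>> shift) & mask;
        }
        return samples;
    }
}
```

With a two-entry palette, every index from `unpack(b, 1)` is 0 or 1 and therefore valid, whereas `unpack(b, 8)` can return any value up to 255.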

> crash on reading tiff image
> ---
>
> Key: IMAGING-247
> URL: https://issues.apache.org/jira/browse/IMAGING-247
> Project: Commons Imaging
>  Issue Type: Bug
>  Components: Format: TIFF
>Affects Versions: 1.0-alpha1
>Reporter: Robin Morier
>Priority: Major
> Attachments: neutre.TIFF
>
>
> I get an index out of bounds exception trying to load the attached image.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: Index 255 out of bounds for length 2
>   at org.apache.commons.imaging.formats.tiff.photometricinterpreters.PhotometricInterpreterPalette.interpretPixel(PhotometricInterpreterPalette.java:53)
>   at org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.interpretStrip(DataReaderStrips.java:179)
>   at org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.readImageData(DataReaderStrips.java:212)
>   at org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:659)
>   at org.apache.commons.imaging.formats.tiff.TiffDirectory.getTiffImage(TiffDirectory.java:163)
>   at org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:469)
>   at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1442)
>   at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1404)
> {noformat}
>  
> I'm calling getBufferedImage without any parameters.





[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer

2020-01-17 Thread Chen Tao (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017849#comment-17017849
 ] 

Chen Tao commented on MATH-1509:


"For reference only" means I will create a new pull request after the 
discussion, once I am familiar with the workflow of this project.

> Implement the MiniBatchKMeansClusterer
> --
>
> Key: MATH-1509
> URL: https://issues.apache.org/jira/browse/MATH-1509
> Project: Commons Math
>  Issue Type: New Feature
>Reporter: Chen Tao
>Priority: Major
> Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm that uses a subset of the 
> points to initialize the cluster centers and mini-batches in the training 
> iterations.
> It can cluster millions of data points in a few seconds, and its results 
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute 
> the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and follows Python's 
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data: performance 
> improves significantly and the results differ little from KMeans++. On 
> sparse data, however, the results differ considerably.
>  
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
>   !compare.png!
>  
> I have created a pull request at 
> [https://github.com/apache/commons-math/pull/117], for reference only.
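
As a rough illustration of the mini-batch idea described in the issue, the sketch below performs one mini-batch iteration: it samples a few points, assigns each to its nearest center, and moves that center toward the point with a per-center learning rate of 1/count, which is a commonly used update for mini-batch k-means. It is not the code from pull request #117 and makes no claim about how that implementation works; the class name, method names and parameters are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of one mini-batch k-means iteration; not the PR's code.
final class MiniBatchKMeansSketch {

    /** Returns the index of the center closest to the point (squared Euclidean distance). */
    static int nearest(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < point.length; d++) {
                double diff = centers[c][d] - point[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /**
     * One iteration: sample batchSize points (with replacement), assign them
     * to the current centers, then update the centers in place.
     * counts[c] holds how many points center c has absorbed so far.
     */
    static void step(double[][] data, double[][] centers, int[] counts,
                     int batchSize, Random rng) {
        int[] batch = new int[batchSize];
        int[] assigned = new int[batchSize];
        // Assignment phase: uses the centers as they were before this batch.
        for (int b = 0; b < batchSize; b++) {
            batch[b] = rng.nextInt(data.length);
            assigned[b] = nearest(centers, data[batch[b]]);
        }
        // Update phase: each point pulls its center with learning rate 1/count.
        for (int b = 0; b < batchSize; b++) {
            int c = assigned[b];
            counts[c]++;
            double eta = 1.0 / counts[c];
            double[] x = data[batch[b]];
            for (int d = 0; d < x.length; d++) {
                centers[c][d] = (1 - eta) * centers[c][d] + eta * x[d];
            }
        }
    }
}
```

Because each iteration touches only `batchSize` points rather than the whole data set, the cost per iteration does not grow with the number of points, which is where the speed advantage over a full KMeans pass comes from.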





[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer

2020-01-17 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017828#comment-17017828
 ] 

Gilles Sadowski commented on MATH-1509:
---

bq. I'd like to contribute the code  to Apache Commons Math

Thanks, and welcome.

bq. I have created a pull request [...] for reference only.

What do you mean by "for reference only"?






[jira] [Created] (CLI-302) More user-friendly error handling for missing required arguments

2020-01-17 Thread rkrisztian (Jira)
rkrisztian created CLI-302:
--

 Summary: More user-friendly error handling for missing required 
arguments
 Key: CLI-302
 URL: https://issues.apache.org/jira/browse/CLI-302
 Project: Commons CLI
  Issue Type: Bug
  Components: CLI-1.x
Affects Versions: 1.4
Reporter: rkrisztian


Currently when I specify a flag that requires an argument, but I actually don't 
specify that argument, I get the usage plus an exception. It would be nicer for 
the user if the exception did not happen:

{noformat}
$ myCliApp -a
error: Missing argument for option: a
usage: [options]
Options:
 -a,--argument   specify this argument
Exception in thread "main" java.lang.NullPointerException: Cannot invoke method hasOption() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
    at groovy.cli.commons.CliBuilder.processSetAnnotation(CliBuilder.groovy:561)
{noformat}

And I cannot control this because I just call:

{code:none}
cli.parseFromInstance options, args
{code}

Thanks in advance.





[jira] [Comment Edited] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats

2020-01-17 Thread Peter Alfred Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1701#comment-1701
 ] 

Peter Alfred Lee edited comment on COMPRESS-501 at 1/17/20 8:26 AM:


With this patch, the whole Central Directory is read in a single file read, 
which no doubt saves time. Currently, reading one Central Directory Header 
takes 4 file reads:

(1) reading the fixed-size part of the Central Directory Header;

(2) reading the file name (variable size);

(3) reading the central directory extra data (variable size);

(4) reading the comment (variable size).

(I think we can at least combine 2, 3 and 4 into a single file read.)

This means that we need N * 4 file reads when opening a zip archive with N 
Central Directory Headers. With your patch, all of this is done in a single 
read. I think this is why you got from ~12s down to ~2s (N * 4 file reads -> 
1 file read). It is a trade-off between memory and time.

But we should care about memory use. Reading the whole Central Directory into 
a buffer may take a lot of memory. A single Central Directory Header can take 
up to 46 (fixed-size part) + 65,535 (file name) + 65,535 (extra data) + 
65,535 (comment) = 196,651 bytes = ~192 KB. According to the zip 
specification, the size of the central directory can be at most 4,294,967,295 
bytes = ~4 GB (0x  in zip64).

If an attacker is planning a DoS attack against this, it would not be hard: 
just craft a zip with many large Central Directory Headers. So I'm wondering 
whether we need a threshold for this. With a buffer of a suitable size, we 
could read as many Central Directory Headers as possible per file read 
without using too much memory.
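
To make the arithmetic in this comment concrete, and to sketch what a thresholded buffer could look like, here is a small self-contained Java example. It is not taken from the patch or from Commons Compress; the class, the method names and the 1 MiB cap are hypothetical. It assumes that each of the three length fields in a Central Directory Header is a 16-bit unsigned value, so each variable-size part holds at most 65,535 bytes.

```java
// Hypothetical sketch; names and the 1 MiB cap are illustrative assumptions.
final class CentralDirectoryBudget {

    /** Size of the fixed part of one Central Directory Header, in bytes. */
    static final int FIXED_HEADER_SIZE = 46;

    /** Largest value of a 16-bit unsigned length field. */
    static final int MAX_VARIABLE_FIELD = 0xFFFF;

    /** Assumed upper bound on the read buffer (1 MiB). */
    static final int MAX_BUFFER = 1 << 20;

    /** Worst-case size of one header: fixed part + file name + extra data + comment. */
    static int maxHeaderSize() {
        return FIXED_HEADER_SIZE + 3 * MAX_VARIABLE_FIELD;
    }

    /** Buffer to allocate: the whole directory if it is small, otherwise the cap. */
    static int bufferSize(long centralDirectorySize) {
        return (int) Math.min(centralDirectorySize, (long) MAX_BUFFER);
    }

    /** Number of bulk reads needed to cover the directory with the capped buffer. */
    static long bulkReads(long centralDirectorySize) {
        if (centralDirectorySize == 0) {
            return 0;
        }
        int buffer = bufferSize(centralDirectorySize);
        return (centralDirectorySize + buffer - 1) / buffer; // ceiling division
    }

    /** Number of reads when each header needs 4 separate reads. */
    static long perFieldReads(long headerCount) {
        return headerCount * 4;
    }
}
```

Under these assumptions the worst-case header (196,651 bytes) still fits inside the cap, so memory stays bounded while the number of file reads drops well below N * 4 for large archives. A real implementation would also have to handle a header that straddles two buffer fills, which this sketch ignores.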


was (Author: peter alfred lee):
With this patch, the whole Central Directory is read in a single file read, 
which no doubt saves time. Currently, reading one Central Directory Header 
takes 4 file reads:

(1) reading the fixed-size part of the Central Directory Header;

(2) reading the file name (variable size);

(3) reading the central directory extra data (variable size);

(4) reading the comment (variable size).

(I think we can at least combine 2, 3 and 4 into a single file read.)

This means that we need N * 4 file reads when opening a zip archive with N 
Central Directory Headers. With your patch, all of this is done in a single 
read. I think this is why you got from ~12s down to ~2s (N * 4 file reads -> 
1 file read). It is a trade-off between memory and time.

But we should care about memory use. Reading the whole Central Directory into 
a buffer may take a lot of memory. A single Central Directory Header can take 
up to 46 (fixed-size part) + 65,535 (file name) + 65,535 (extra data) + 
65,535 (comment) = 196,651 bytes = ~192 KB. According to the zip 
specification, the size of the central directory can be at most 4,294,967,295 
bytes = ~4 GB (0x  in zip64).

If an attacker is planning a DoS attack against Apache Commons-Compress, it 
would not be hard: just craft a zip with many large Central Directory 
Headers. So I'm wondering whether we need a threshold for this. With a buffer 
of a suitable size, we could read as many Central Directory Headers as 
possible per file read without using too much memory.

> Possibility to introduce a fast Zip open with some caveats
> --
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
>  Issue Type: Improvement
>  Components: Archivers
>Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
>Reporter: Jakob Sultan Ericsson
>Priority: Major
> Attachments: zipfile-speed-improvements.diff
>
>
> About a year ago I created an improvement 
> (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things 
> in commons-compress for Zip-files. This helped us quite a lot but we wanted 
> it to be even faster so I optimised away some stuff that I thought was not 
> that important for us.
> I was able to improve opening of a 34GB zip file from ~12s to ~2s.
> Now to my question, do you think it would be possible to introduce some of my 
> fixes (diff included) into master?
> Yes, I know that I shortcut some features for some specific zip files and 
> don't expose everything anymore.
> I haven't really made a good switchable solution for it because we just use 
> our own build locally with this patch.
> But with some hints from you I might be able to do it somehow. I'm happy to 
> help and would love to get this fast open into master (custom changes to 
> public libraries are always cumbersome). 
> {code:java}
> diff --git 
>