[GitHub] [commons-math] chentao106 closed pull request #117: Implement the MiniBatchKMeansClusterer
chentao106 closed pull request #117: Implement the MiniBatchKMeansClusterer URL: https://github.com/apache/commons-math/pull/117 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument
[ https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018468#comment-17018468 ]

Alex Herbert commented on CODEC-264:

Thanks for raising this. The effect is that, despite a new method having been created for the fixed version so that the old broken version would stay behaviourally compatible, the code actually fixes the old version and so breaks behavioural compatibility. I have added a test to maintain behavioural compatibility and fixed the code as suggested. Please review the current master to check that the fix is correct.

> murmur3.hash128() does not account for unsigned seed argument
>
> Key: CODEC-264
> URL: https://issues.apache.org/jira/browse/CODEC-264
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Claude Warren
> Assignee: Alex Herbert
> Priority: Major
> Fix For: 1.14
> Attachments: YonikMurmur3Tests.java
>
> The original murmur3_x64_128 code used unsigned int for seed arguments. Using the equivalent bit patterns in the commons codec version does not yield the same results.
> I believe this is because the commons version does not account for sign extension etc.
> Yonik Seeley [~yonik] explains the issue in his implementation: https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java
> He provides a test case to show that his code returns the same answers as the original C/C++ code. I modified that test to call the codec version to show the error.
> I have attached that test case.
> Given that the original code is in the wild I am uncertain how to fix this issue.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument
[ https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018374#comment-17018374 ]

Andy Seaborne commented on CODEC-264:

The v1.14 version of {{hash128(byte[], , int seed)}} does now apply the seed mask, contrary to the comments.

Line 805
{noformat}
@Deprecated
public static long[] hash128(final byte[] data, final int offset, final int length, final int seed) {
    //
    // Note: This fails to apply masking using 0xL to the seed.
    //
    return hash128x64(data, offset, length, seed);
}
{noformat}
It calls {{hash128x64(byte[],, int seed)}} (exact signature match), not {{hash128x64(byte[],, long seed)}} (type conversion). {{hash128x64(byte[],, int seed)}} applies the mask (checked by a debugger walk-through in the Eclipse IDE).

{{hash128(byte[],, int seed)}} should call {{hash128x64(byte[],, long)}} directly. I think that casting at the call site will do that:
{noformat}
return hash128x64(data, offset, length, (long)seed);
{noformat}
or, for clarity, explicitly:
{noformat}
long seedLong = seed; /* unmasked 32->64 bit extension */
return hash128x64(data, offset, length, seedLong);
{noformat}
If the private static work function had a different name, then the automatic, unmasked conversion would have applied.
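The overload behaviour discussed in this comment can be reproduced in isolation. The sketch below is not the Commons Codec code: the class name `SeedWideningSketch` and the two `work` overloads are hypothetical stand-ins for a masking `int` overload and a `long` worker, and the `0xffffffffL` mask is the usual way to treat an `int` as unsigned in Java.

```java
// Minimal sketch (hypothetical names) of how Java picks between an int and a
// long overload, and how that decides whether a negative seed is masked.
final class SeedWideningSketch {

    /** Stand-in for the public int overload: treats the seed as unsigned. */
    static long work(int seed) {
        // int & long promotes to long, so this calls the long overload below.
        return work(seed & 0xffffffffL);
    }

    /** Stand-in for the worker overload: uses the 64-bit value as given. */
    static long work(long seed) {
        return seed;
    }

    public static void main(String[] args) {
        int seed = -1; // bit pattern 0xffffffff
        // Exact match on int: the masking overload is chosen.
        System.out.println(work(seed));
        // Explicit cast: the long overload is chosen, value is sign-extended.
        System.out.println(work((long) seed));
    }
}
```

An `int` argument always binds to the `int` overload when one exists, so an unmasked, sign-extended seed only reaches the `long` overload through an explicit cast or a `long` variable, as the comment suggests.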
[jira] [Commented] (LANG-1469) Error caused by java.lang.ArrayStoreException org.apache.commons.lang3.text.translate.NumericEntityUnescaper cannot be stored in an array of type o.a.a.a.c.a.b[]
[ https://issues.apache.org/jira/browse/LANG-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018372#comment-17018372 ]

Ankit Patil commented on LANG-1469:

[~doniw] This class is deprecated. Please use commons-text instead: [https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html]

> Error caused by java.lang.ArrayStoreException org.apache.commons.lang3.text.translate.NumericEntityUnescaper cannot be stored in an array of type o.a.a.a.c.a.b[]
>
> Key: LANG-1469
> URL: https://issues.apache.org/jira/browse/LANG-1469
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.text.*
> Affects Versions: 3.5
> Environment: Android / Java
> Reporter: doni
> Priority: Major
>
> Hi, we got this error:
> Caused by java.lang.ArrayStoreException org.apache.commons.lang3.text.translate.NumericEntityUnescaper cannot be stored in an array of type o.a.a.a.c.a.b[]
> This is probably related to ProGuard in our Android project. Do you have any clue what might be causing this error? It started happening after we removed the ProGuard keep rules for apache.commons.
> It also only happens on Xiaomi phones with OS 5 and 6.
> We get this error after calling StringEscapeUtils.escapeHtml4(). Any help would be appreciated.
> Regards.
> Doni
[jira] [Commented] (JEXL-320) "mvn test" fails with COMPILATION ERROR in SynchronizedArithmetic.java on Java 11
[ https://issues.apache.org/jira/browse/JEXL-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018140#comment-17018140 ]

Henri Biestro commented on JEXL-320:

Changeset: de8eb7d2897ebcbfa7d4ff61ed5fce5fa42e20f7
Author: henrib
Date: 2020-01-17 16:45
Message: JEXL-320: remove dependency on Unsafe in test

> "mvn test" fails with COMPILATION ERROR in SynchronizedArithmetic.java on Java 11
>
> Key: JEXL-320
> URL: https://issues.apache.org/jira/browse/JEXL-320
> Project: Commons JEXL
> Issue Type: Bug
> Environment: JDK: OpenJDK 11 (hotspot)
> OS: Ubuntu 18.04.3 LTS
> Apache Maven 3.3.9
> Reporter: David Costanzo
> Priority: Minor
>
> Running "mvn test" when using OpenJDK's Java 11 fails with the following errors:
> {noformat}
> [WARNING] COMPILATION WARNING :
> [INFO] -
> [WARNING] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[24,16] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
> [WARNING] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[97,20] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
> [WARNING] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[100,23] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
> [WARNING] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[102,23] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
> [INFO] 4 warnings
> [INFO] -
> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[63,19] cannot find symbol
> symbol: method monitorEnter(java.lang.Object)
> location: variable UNSAFE of type sun.misc.Unsafe
> [ERROR] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[72,19] cannot find symbol
> symbol: method monitorExit(java.lang.Object)
> location: variable UNSAFE of type sun.misc.Unsafe
> [ERROR] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[113,19] cannot find symbol
> symbol: method monitorEnter(java.lang.Object)
> location: variable UNSAFE of type sun.misc.Unsafe
> [ERROR] /local_static/github/commons-jexl/src/test/java/org/apache/commons/jexl3/SynchronizedArithmetic.java:[118,19] cannot find symbol
> symbol: method monitorExit(java.lang.Object)
> location: variable UNSAFE of type sun.misc.Unsafe
> {noformat}
> BUILDING.txt states that JEXL "requires Java 6 (or later)". I assume that you expect "mvn test" to work with Java 11. If not, then this is really a doc bug in BUILDING.txt: it should say that it "requires Java 6 (exactly)" or include the range of supported Java versions.
>
> *Impact*
> This is a small barrier to entry for a new contributor. There is an obvious and straightforward way to get past the problem, which is to download a compatible version of Java and set JAVA_HOME accordingly. I used Java 1.8, which I already had installed on my dev machine. This prints the same warnings, but no errors.
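For readers wondering what removing the Unsafe dependency can look like: the sketch below shows one common way to express separate "enter" and "exit" calls without `sun.misc.Unsafe.monitorEnter`/`monitorExit`, using `java.util.concurrent.locks.ReentrantLock`. It is an assumption-laden illustration, not the content of the changeset above; the class `LockInsteadOfUnsafeSketch` and its methods are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: explicit enter()/exit() calls backed by a ReentrantLock
// instead of Unsafe.monitorEnter/monitorExit (not the actual JEXL test code).
final class LockInsteadOfUnsafeSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private long counter;

    /** Plays the role of monitorEnter: blocks until the lock is held. */
    void enter() {
        lock.lock();
    }

    /** Plays the role of monitorExit: releases the lock. */
    void exit() {
        lock.unlock();
    }

    /** Increments the counter while holding the lock. */
    long increment() {
        enter();
        try {
            return ++counter;
        } finally {
            exit();
        }
    }

    /** Runs several threads against one shared instance and returns the final count. */
    static long runConcurrently(int threads, int perThread) {
        LockInsteadOfUnsafeSketch shared = new LockInsteadOfUnsafeSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.counter;
    }
}
```

Unlike an object monitor, a `ReentrantLock` does not interoperate with `synchronized` blocks on the same object, so this substitution only fits code that owns all of the locking itself.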
[jira] [Resolved] (JEXL-321) Empty do-while loop is broken
[ https://issues.apache.org/jira/browse/JEXL-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Biestro resolved JEXL-321.

Resolution: Fixed

Changeset: a70b3d8a75d5805a6daeedcf80ff44bf8b8cb276
Author: henrib
Date: 2020-01-17 16:43
Message: JEXL-321: do/while with empty statement contributed fix

> Empty do-while loop is broken
>
> Key: JEXL-321
> URL: https://issues.apache.org/jira/browse/JEXL-321
> Project: Commons JEXL
> Issue Type: Bug
> Affects Versions: 3.1
> Reporter: Dmitri Blinov
> Priority: Major
>
> The following test case fails with an ArrayIndexOutOfBoundsException (AIOOB).
> {code:java}
> @Test
> public void testEmptyBody() throws Exception {
>     JexlScript e = JEXL.createScript("var i = 0; do ; while((i+=1) < 10); i");
>     JexlContext jc = new MapContext();
>     Object o = e.execute(jc);
>     Assert.assertEquals(10, o);
> }
> {code}
> The suggestion is to change the interpreter as follows
> {code}
> @Override
> protected Object visit(ASTDoWhileStatement node, Object data) {
>     Object result = null;
>     /* last objectNode is the expression */
>     Node expressionNode = node.jjtGetChild(node.jjtGetNumChildren()-1);
>     do {
>         cancelCheck(node);
>         if (node.jjtGetNumChildren() > 1) {
>             try {
>                 // execute statement
>                 result = node.jjtGetChild(0).jjtAccept(this, data);
>             } catch (JexlException.Break stmtBreak) {
>                 break;
>             } catch (JexlException.Continue stmtContinue) {
>                 //continue;
>             }
>         }
>     } while (arithmetic.toBoolean(expressionNode.jjtAccept(this, data)));
>     return result;
> }
> {code}
> and the Debugger as follows
> {code}
> @Override
> protected Object visit(ASTDoWhileStatement node, Object data) {
>     int num = node.jjtGetNumChildren();
>     builder.append("do ");
>     if (num > 1) {
>         acceptStatement(node.jjtGetChild(0), data);
>     } else {
>         builder.append(" ; ");
>     }
>     builder.append(" while (");
>     accept(node.jjtGetChild(num - 1), data);
>     builder.append(")");
>     return data;
> }
> {code}
[jira] [Commented] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats
[ https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018043#comment-17018043 ]

Jakob Sultan Ericsson commented on COMPRESS-501:

Some of my commented-out code was left in intentionally, to show what is actually taking time in the code and to start a discussion, as we have done. :-)

Another thing I noticed while doing this is that some parts, such as parsing the actual date and time, can be somewhat time consuming. One option is to save just the raw value (the DOS timestamp) and convert it to a proper millisecond timestamp later, when/if getTime() is actually called.

If I uncomment the rows below, my naive test goes from 2s to about 3.9s.
{code:java}
long ts = ZipLong.getValue(cfhBuf, off);
final long time = ZipUtil.dosToJavaTime(ts);
ze.setTime(time);
{code}
I have also commented out reading the zip64 extra information because we don't need it in our use case. I believe this might be a compatibility issue for general usage of commons-compress, but if I'm not mistaken, disabling it speeds up reading.

> Possibility to introduce a fast Zip open with some caveats
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Attachments: zipfile-speed-improvements.diff
>
> About a year ago I created an improvement (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things in commons-compress for Zip-files. This helped us quite a lot but we wanted it to be even faster so I optimised away some stuff that I thought was not that important for us.
> I was able to improve opening of a 34GB zip file from ~12s to ~2s.
> Now to my question, do you think it would be possible to introduce some of my fixes (diff included) into master?
> Yes, I know that I shortcut some features for some specific zip files and don't expose everything anymore.
> I haven't really made a good switchable solution for it because we just use our own build locally with this patch.
> But with some hints from you I might be able to do it somehow. I'm happy to help and would love to get this faster open into master (it is always cumbersome with custom changes to public libraries).
> {code:java}
> diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> index 767f615d..d441b12d 100644
> --- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> +++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> @@ -146,6 +146,7 @@
>      private boolean isStreamContiguous = false;
>      private NameSource nameSource = NameSource.NAME;
>      private CommentSource commentSource = CommentSource.COMMENT;
> +    private byte[] cdExtraData = null;
>
>
>      /**
> @@ -397,6 +398,14 @@ public void setAlignment(int alignment) {
>          this.alignment = alignment;
>      }
>
> +    public void setRawCentralDirectoryExtra(byte[] cdExtraData) {
> +        this.cdExtraData = cdExtraData;
> +    }
> +
> +    public byte[] getRawCentralDirectoryExtra() {
> +        return this.cdExtraData;
> +    }
> +
>      /**
>       * Replaces all currently attached extra fields with the new array.
>       * @param fields an array of extra fields
> diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> index 152272b5..bb33b50f 100644
> --- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> +++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> @@ -691,10 +691,10 @@ protected void finalize() throws Throwable {
>          final HashMap noUTF8Flag =
>              new HashMap<>();
>
> -        positionAtCentralDirectory();
> +        ByteBuffer ceDir = positionAtCentralDirectory();
>
>          wordBbuf.rewind();
> -        IOUtils.readFully(archive, wordBbuf);
> +        ceDir.get(wordBuf);
>          long sig = ZipLong.getValue(wordBuf);
>
>          if (sig != CFH_SIG && startsWithLocalFileHeader()) {
> @@ -703,9 +703,12 @@ protected void finalize() throws Throwable {
>          }
>
>          while (sig == CFH_SIG) {
> -            readCentralDirectoryEntry(noUTF8Flag);
> +            readCentralDirectoryEntry(ceDir, noUTF8Flag);
>              wordBbuf.rewind();
> -            IOUtils.readFully(archive, wordBbuf);
> +            if (ceDir.remaining() == 0) {
> +
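The idea raised in the comment on this issue, keeping the raw DOS timestamp and converting it only when the time is requested, could look roughly like the following. This is a sketch under stated assumptions, not Commons Compress code: `LazyDosTime` is a hypothetical class, the bit layout is the standard MS-DOS date/time packing used by ZIP, and UTC is assumed for the conversion (the library's own helper may use a different time zone).

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch of deferring DOS-timestamp conversion until it is needed.
// Not thread-safe: the cached value is written without synchronization.
final class LazyDosTime {
    private final long rawDosTime; // packed 32-bit DOS date/time as read from the entry
    private long millis = -1;      // cached conversion; -1 means "not converted yet"

    LazyDosTime(long rawDosTime) {
        this.rawDosTime = rawDosTime;
    }

    /** Unpacks the DOS fields; throws DateTimeException for malformed values. */
    static LocalDateTime decode(long dos) {
        int year = (int) ((dos >> 25) & 0x7f) + 1980; // 7 bits, years since 1980
        int month = (int) ((dos >> 21) & 0x0f);       // 4 bits, 1..12
        int day = (int) ((dos >> 16) & 0x1f);         // 5 bits, 1..31
        int hour = (int) ((dos >> 11) & 0x1f);        // 5 bits, 0..23
        int minute = (int) ((dos >> 5) & 0x3f);       // 6 bits, 0..59
        int second = (int) (dos & 0x1f) * 2;          // 5 bits, two-second units
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }

    /** Converts on first call only; later calls return the cached value. */
    long getTime() {
        if (millis < 0) {
            millis = decode(rawDosTime).toInstant(ZoneOffset.UTC).toEpochMilli();
        }
        return millis;
    }
}
```

Opening an archive then only stores one `long` per entry, and the cost of date arithmetic is paid solely for entries whose time is actually read.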
[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer
[ https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017961#comment-17017961 ]

Gilles Sadowski commented on MATH-1509:

Thanks for your interest in contributing.

A few comments about the PR:
* {{ClusterUtils}} defines utilities that are seemingly redundant with those in ["Commons RNG"|http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.3/org/apache/commons/rng/sampling/ListSampler.html].
* Why are there _protected_ methods?
* All fields and methods (including _private_ ones) must have a Javadoc comment.
* Comments should be in English. ;)

> Implement the MiniBatchKMeansClusterer
>
> Key: MATH-1509
> URL: https://issues.apache.org/jira/browse/MATH-1509
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Chen Tao
> Priority: Major
> Attachments: compare.png
>
> MiniBatchKMeans is a fast clustering algorithm that uses a subset of the points to initialize the cluster centers and mini-batches in the training iterations.
> It can finish clustering millions of data points in a few seconds, and its results differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and modelled on Python's sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data: performance improves significantly and the results differ little from KMeans++, but they differ considerably on sparse data.
>
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
> !compare.png!
>
> I have created a pull request at [https://github.com/apache/commons-math/pull/117], for reference only.
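To make the mini-batch idea in the issue description concrete, here is a minimal sketch of one training iteration. It is not the code from the pull request: `MiniBatchKMeansSketch`, `step`, and `nearest` are hypothetical names, and the update rule (cache the nearest center for each sampled point, then move that center with a per-center learning rate of 1/count) is the commonly described mini-batch k-means update, assumed here for illustration.

```java
import java.util.Random;

// Hypothetical sketch of a single mini-batch k-means iteration.
final class MiniBatchKMeansSketch {

    /** Index of the center closest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[k][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    /**
     * One iteration: sample batchSize points with replacement, cache their
     * nearest centers, then move each assigned center towards its point.
     * counts[k] holds how many points center k has absorbed so far.
     */
    static void step(double[][] points, double[][] centers, int[] counts,
                     int batchSize, Random rng) {
        int[] sample = new int[batchSize];
        int[] assigned = new int[batchSize];
        for (int b = 0; b < batchSize; b++) {
            sample[b] = rng.nextInt(points.length);
            assigned[b] = nearest(points[sample[b]], centers);
        }
        for (int b = 0; b < batchSize; b++) {
            double[] p = points[sample[b]];
            double[] c = centers[assigned[b]];
            double eta = 1.0 / ++counts[assigned[b]]; // per-center learning rate
            for (int d = 0; d < p.length; d++) {
                c[d] += eta * (p[d] - c[d]);
            }
        }
    }
}
```

Each iteration touches only `batchSize` points rather than the whole data set, which is where the speed-up over a full KMeans pass comes from; the initial centers are supplied by the caller and could come from any seeding scheme.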
[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer
[ https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017951#comment-17017951 ]

Gilles Sadowski commented on MATH-1509:

{quote}workflow{quote}
For new features, the starting point would be to describe the proposal on the "dev" ML. Once the idea is accepted, a JIRA report is created (this is done already ;)) in order to discuss practical details of the implementations (like improvements to a PR).
[jira] [Created] (JEXL-321) Empty do-while loop is broken
Dmitri Blinov created JEXL-321:

Summary: Empty do-while loop is broken
Key: JEXL-321
URL: https://issues.apache.org/jira/browse/JEXL-321
Project: Commons JEXL
Issue Type: Bug
Affects Versions: 3.1
Reporter: Dmitri Blinov

The following test case fails with an ArrayIndexOutOfBoundsException (AIOOB).
{code:java}
@Test
public void testEmptyBody() throws Exception {
    JexlScript e = JEXL.createScript("var i = 0; do ; while((i+=1) < 10); i");
    JexlContext jc = new MapContext();
    Object o = e.execute(jc);
    Assert.assertEquals(10, o);
}
{code}
The suggestion is to change the interpreter as follows
{code}
@Override
protected Object visit(ASTDoWhileStatement node, Object data) {
    Object result = null;
    /* last objectNode is the expression */
    Node expressionNode = node.jjtGetChild(node.jjtGetNumChildren()-1);
    do {
        cancelCheck(node);
        if (node.jjtGetNumChildren() > 1) {
            try {
                // execute statement
                result = node.jjtGetChild(0).jjtAccept(this, data);
            } catch (JexlException.Break stmtBreak) {
                break;
            } catch (JexlException.Continue stmtContinue) {
                //continue;
            }
        }
    } while (arithmetic.toBoolean(expressionNode.jjtAccept(this, data)));
    return result;
}
{code}
and the Debugger as follows
{code}
@Override
protected Object visit(ASTDoWhileStatement node, Object data) {
    int num = node.jjtGetNumChildren();
    builder.append("do ");
    if (num > 1) {
        acceptStatement(node.jjtGetChild(0), data);
    } else {
        builder.append(" ; ");
    }
    builder.append(" while (");
    accept(node.jjtGetChild(num - 1), data);
    builder.append(")");
    return data;
}
{code}
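The essence of the suggested fix, treating the last child as the condition and the body as optional, can be shown without any JEXL classes. The sketch below is a stand-alone illustration with hypothetical names (`DoWhileSketch`, `execute`); it models the node's children as a plain array in which the condition is always last and a body, if present, is first.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-alone model of a do/while node whose body may be absent.
final class DoWhileSketch {

    /**
     * children holds an optional Runnable body followed by a BooleanSupplier
     * condition. Returns how many times the loop iterated.
     */
    static int execute(Object[] children) {
        // The condition is the last child whether or not a body exists.
        BooleanSupplier condition = (BooleanSupplier) children[children.length - 1];
        int iterations = 0;
        do {
            // Only touch children[0] as a body when there is more than one child;
            // indexing a body unconditionally is what breaks on "do ; while (...)".
            if (children.length > 1) {
                ((Runnable) children[0]).run();
            }
            iterations++;
        } while (condition.getAsBoolean());
        return iterations;
    }
}
```

With a single child, as in `do ; while((i+=1) < 10)`, the loop simply re-evaluates the condition until it becomes false.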
[jira] [Resolved] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test
[ https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilles Sadowski resolved MATH-1487. --- Resolution: Incomplete Closing (no feedback from the OP in more than 6 months). > MathInternalError - Kolmogorov Smirnov Test > --- > > Key: MATH-1487 > URL: https://issues.apache.org/jira/browse/MATH-1487 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.6.1 >Reporter: Paweł Lipiński >Priority: Critical > Attachments: alpha.arr, beta.arr > > > Hi, > I spotted a pesky bug in KolmogorovSmirnovTest class, in the method > kolmogorovSmirnovTest. > In order to reproduce the error use arrays from attachments. > Stacktrace: > {noformat} > org.apache.commons.math3.exception.MathInternalError: illegal state: internal > error, please fill a bug report at https://issues.apache.org/jira/browse/MATH > at > org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest.fixTies(KolmogorovSmirnovTest.java:1171) > at > org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest.kolmogorovSmirnovTest(KolmogorovSmirnovTest.java:263) > at > org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest.kolmogorovSmirnovTest(KolmogorovSmirnovTest.java:290) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (TEXT-176) Release Patch 1.8.1
Furkan KILIC created TEXT-176:

Summary: Release Patch 1.8.1
Key: TEXT-176
URL: https://issues.apache.org/jira/browse/TEXT-176
Project: Commons Text
Issue Type: Wish
Reporter: Furkan KILIC

Hello,

Would it be possible to release patch version 1.8.1? The last release dates from September 2019, and some features and bug fixes have been merged since.

Thanks a lot.
Best regards,
Furkilic
[jira] [Comment Edited] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test
[ https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017882#comment-17017882 ]

Chen Tao edited comment on MATH-1487 at 1/17/20 10:23 AM:

I cannot reproduce this bug in either 3.6.1 or the development version, using this code:
{code:java}
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
        .stream()
        .mapToDouble(Double::parseDouble)
        .toArray();
}
{code}
More information should be provided.

was (Author: chentao106):
I cannot reproduce this bug in either 3.6.1 or the development version, using this code:
```java
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
        .stream()
        .mapToDouble(Double::parseDouble)
        .toArray();
}
```
More information should be provided.
[jira] [Comment Edited] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test
[ https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017882#comment-17017882 ]

Chen Tao edited comment on MATH-1487 at 1/17/20 10:22 AM:

I cannot reproduce this bug in either 3.6.1 or the development version, using this code:
```java
@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
        .stream()
        .mapToDouble(Double::parseDouble)
        .toArray();
}
```
More information should be provided.

was (Author: chentao106):
I cannot reproduce this bug in either 3.6.1 or the development version, using this code:

@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
        .stream()
        .mapToDouble(Double::parseDouble)
        .toArray();
}

More information should be provided.
[jira] [Commented] (MATH-1487) MathInternalError - Kolmogorov Smirnov Test
[ https://issues.apache.org/jira/browse/MATH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017882#comment-17017882 ]

Chen Tao commented on MATH-1487:

I cannot reproduce this bug in either 3.6.1 or the development version, using this code:

@Test
public void testCase() throws IOException {
    double[] alpha = readToDoubleArray("alpha.arr");
    double[] beta = readToDoubleArray("beta.arr");
    KolmogorovSmirnovTest kolmogorovSmirnovTest = new KolmogorovSmirnovTest();
    kolmogorovSmirnovTest.kolmogorovSmirnovTest(alpha, beta);
}

private double[] readToDoubleArray(final String filename) throws IOException {
    return Files.readAllLines(Paths.get("path", "to", "arrays", filename))
        .stream()
        .mapToDouble(Double::parseDouble)
        .toArray();
}

More information should be provided.
[jira] [Commented] (IMAGING-247) crash on reading tiff image
[ https://issues.apache.org/jira/browse/IMAGING-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017866#comment-17017866 ] Robin Morier commented on IMAGING-247: -- Thanks for investigating. As a temporary workaround I'm converting the TIFFs to white_is_zero scheme (so, without the palette). I've spent some time trying to understand the library code that leads to that faulty 255 sample value but couldn't quite get my mind around the meaning of the bit shifts etc... > crash on reading tiff image > --- > > Key: IMAGING-247 > URL: https://issues.apache.org/jira/browse/IMAGING-247 > Project: Commons Imaging > Issue Type: Bug > Components: Format: TIFF >Affects Versions: 1.0-alpha1 >Reporter: Robin Morier >Priority: Major > Attachments: neutre.TIFF > > > I get an index out of bounds exception trying to load the attached image. > {noformat} > java.lang.ArrayIndexOutOfBoundsException: Index 255 out of bounds for length 2 > at > org.apache.commons.imaging.formats.tiff.photometricinterpreters.PhotometricInterpreterPalette.interpretPixel(PhotometricInterpreterPalette.java:53) > at > org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.interpretStrip(DataReaderStrips.java:179) > at > org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.readImageData(DataReaderStrips.java:212) > at > org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:659) > at > org.apache.commons.imaging.formats.tiff.TiffDirectory.getTiffImage(TiffDirectory.java:163) > at > org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:469) > at > org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1442) > at > org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1404){noformat} > > I'm calling getBufferedImage without any parameters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
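As background for the bit shifts mentioned in the comment: TIFF stores samples narrower than a byte packed together, most significant bits first, so a reader has to shift and mask each sample out of its byte. The sketch below is a generic illustration of that unpacking, not the Commons Imaging implementation and not a diagnosis of this bug; `PackedSamplesSketch` and `unpack` are hypothetical names, and only sample widths of 1, 2, 4, or 8 bits are handled.

```java
// Hypothetical sketch: extracting packed samples (MSB first) from a byte array.
final class PackedSamplesSketch {

    /**
     * Returns count samples of bitsPerSample bits each (1, 2, 4, or 8),
     * reading the most significant bits of each byte first.
     */
    static int[] unpack(byte[] data, int bitsPerSample, int count) {
        int[] samples = new int[count];
        int mask = (1 << bitsPerSample) - 1;   // e.g. 0x01 for 1-bit samples
        for (int i = 0; i < count; i++) {
            int bitPos = i * bitsPerSample;    // offset of the sample in bits
            int current = data[bitPos >> 3] & 0xFF;            // byte holding it, as unsigned
            int shift = 8 - bitsPerSample - (bitPos & 7);      // distance from the low end
            samples[i] = (current >> shift) & mask;
        }
        return samples;
    }
}
```

With 1-bit samples every unpacked value is 0 or 1 and therefore a valid index into a two-entry palette, whereas interpreting a whole byte such as 0xFF as a single sample yields 255, the kind of index reported in the stack trace.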
[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer
[ https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017849#comment-17017849 ]

Chen Tao commented on MATH-1509:

"For reference only" means that I will create a new pull request after the discussion, once I am familiar with this project's workflow.

> Implement the MiniBatchKMeansClusterer
> --
>
> Key: MATH-1509
> URL: https://issues.apache.org/jira/browse/MATH-1509
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Chen Tao
> Priority: Major
> Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm,
> which uses a subset of the points to initialize the cluster centers, and
> mini-batches in the training iterations.
> It can cluster millions of data points in a few seconds, and its results
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute
> the code to Apache Commons Math, of course in Java.
> My implementation is based on Apache Commons Math3, with reference to Python's
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data, with a significant
> performance improvement and results that differ little from KMeans++, but the
> results differ considerably on sparse data.
>
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
> !compare.png!
>
> I have created a pull request at
> [https://github.com/apache/commons-math/pull/117], for reference only.
[jira] [Commented] (MATH-1509) Implement the MiniBatchKMeansClusterer
[ https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017828#comment-17017828 ]

Gilles Sadowski commented on MATH-1509:
---

bq. I'd like to contribute the code to Apache Commons Math

Thanks, and welcome.

bq. I have created a pull request [...] for reference only.

What do you mean by "for reference only"?

> Implement the MiniBatchKMeansClusterer
> --
>
> Key: MATH-1509
> URL: https://issues.apache.org/jira/browse/MATH-1509
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Chen Tao
> Priority: Major
> Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm,
> which uses a subset of the points to initialize the cluster centers, and
> mini-batches in the training iterations.
> It can cluster millions of data points in a few seconds, and its results
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute
> the code to Apache Commons Math, of course in Java.
> My implementation is based on Apache Commons Math3, with reference to Python's
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data, with a significant
> performance improvement and results that differ little from KMeans++, but the
> results differ considerably on sparse data.
>
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
> !compare.png!
>
> I have created a pull request at
> [https://github.com/apache/commons-math/pull/117], for reference only.
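To make the quoted description more concrete, here is a minimal sketch of a mini-batch k-means loop in plain Java. It is neither the code from the pull request nor the scikit-learn implementation: the class name, method signature and parameters are hypothetical, the initialization is simplified to picking k distinct random points (an assumption), and the update uses the commonly described per-center step size of 1/count.

```java
import java.util.Random;

public class MiniBatchKMeansSketch {

    /** Estimates k cluster centers from random mini-batches of the input points. */
    public static double[][] fit(double[][] points, int k, int batchSize,
                                 int iterations, long seed) {
        Random rng = new Random(seed);
        int n = points.length;
        int dim = points[0].length;

        // Simplified initialization (assumption): k distinct points chosen at random
        // through a partial Fisher-Yates shuffle of the indices. Requires k <= n.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rng.nextInt(n - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centers[c] = points[order[c]].clone();
        }

        long[] counts = new long[k];         // points absorbed by each center so far
        int[] batch = new int[batchSize];    // indices of the sampled points
        int[] assigned = new int[batchSize]; // nearest center of each sampled point
        for (int it = 0; it < iterations; it++) {
            // Step 1: sample a mini-batch and assign each point to its nearest center.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(n);
                assigned[b] = nearest(centers, points[batch[b]]);
            }
            // Step 2: move each center toward its points with a decaying step 1/count.
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                double eta = 1.0 / ++counts[c];
                double[] p = points[batch[b]];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (p[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    /** Index of the center with the smallest squared Euclidean distance to p. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

Each iteration touches only batchSize points rather than the whole data set, which is where the speed-up over a full KMeans pass comes from. The price is that the centers are noisy estimates, so results can differ from those of KMeans++.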
[jira] [Created] (CLI-302) More user-friendly error handling for missing required arguments
rkrisztian created CLI-302:
--

Summary: More user-friendly error handling for missing required arguments
Key: CLI-302
URL: https://issues.apache.org/jira/browse/CLI-302
Project: Commons CLI
Issue Type: Bug
Components: CLI-1.x
Affects Versions: 1.4
Reporter: rkrisztian

Currently when I specify a flag that requires an argument, but I actually don't specify that argument, I get the usage plus an exception. It would be nicer for the user if the exception did not happen:

{noformat}
$ myCliApp -a
error: Missing argument for option: a
usage: [options]
Options:
 -a,--argument   specify this argument
Exception in thread "main" java.lang.NullPointerException: Cannot invoke method hasOption() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
    at groovy.cli.commons.CliBuilder.processSetAnnotation(CliBuilder.groovy:561)
{noformat}

And I cannot control this because I just call:

{code:none}
cli.parseFromInstance options, args
{code}

Thanks in advance.
[jira] [Comment Edited] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats
[ https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1701#comment-1701 ]

Peter Alfred Lee edited comment on COMPRESS-501 at 1/17/20 8:26 AM:

With this patch, the whole Central Directory is read in a single file read, which no doubt saves some time.

Currently, reading one Central Directory Header needs 4 file reads:
(1) reading the fixed-size part of the Central Directory Header;
(2) reading the variable-size file name;
(3) reading the variable-size central directory extra data;
(4) reading the variable-size comment.
(I think we can at least combine reads 2, 3 and 4 into a single file read.)

This means that we need N * 4 file reads when opening a zip archive with N Central Directory Headers. With your patch, we can do all of this in a single read. I think this is why you got from ~12s to ~2s (N * 4 file reads -> 1 file read).

I think this is a trade-off between memory and time, and we should be careful about memory use. Reading the whole Central Directory into a buffer may take a lot of memory. Each variable-size field has a 16-bit length, so a single Central Directory Header can take 46 (fixed-size part) + 65,535 (file name) + 65,535 (extra data) + 65,535 (comment) = 196,651 bytes = ~192 KB. According to the zip specification, the size of the central directory can be 4,294,967,295 bytes = 4 GB (0x in zip64) at most. If a potential attacker is planning a DoS attack on this, it might not be hard: just make a zip with many large Central Directory Headers.

So I'm wondering if we need to set a threshold value for this? Using a buffer of a suitable size, we could read as many Central Directory Headers as possible at once without using too much memory.
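The size arithmetic and the bounded-buffer idea from the comment above can be sketched as follows. This is an illustration only, not the Commons Compress code and not the attached patch: the class and method names are hypothetical, the input is assumed to be a stream positioned at the first Central Directory Header, and file names are assumed to be UTF-8. The field offsets (name length at byte 28, extra length at 30, comment length at 32 of the 46-byte fixed part) follow the zip format, where each of these lengths is a 16-bit value and so at most 65,535.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CentralDirectorySketch {
    static final int FIXED_PART = 46;                // bytes before the variable-size fields
    static final int MAX_FIELD = 0xFFFF;             // largest value of a 16-bit length field
    static final long HEADER_SIGNATURE = 0x02014b50L;

    /** Largest possible size of one Central Directory Header, in bytes. */
    public static long worstCaseHeaderSize() {
        return FIXED_PART + 3L * MAX_FIELD;
    }

    /**
     * Collects the file names of consecutive Central Directory Headers. The source
     * stream is read through a buffer of bufferSize bytes, so small headers share
     * one underlying read while the buffer itself stays bounded.
     */
    public static List<String> readNames(InputStream centralDirectory, int bufferSize) {
        List<String> names = new ArrayList<>();
        byte[] fixed = new byte[FIXED_PART];
        try {
            DataInputStream in =
                new DataInputStream(new BufferedInputStream(centralDirectory, bufferSize));
            while (true) {
                try {
                    in.readFully(fixed);
                } catch (EOFException endOfData) {
                    break; // fewer than 46 bytes left: no further header
                }
                if (le32(fixed, 0) != HEADER_SIGNATURE) {
                    break; // something other than a Central Directory Header follows
                }
                byte[] name = new byte[le16(fixed, 28)];
                in.readFully(name);
                names.add(new String(name, StandardCharsets.UTF_8));
                // Consume the extra data and the comment without interpreting them.
                in.readFully(new byte[le16(fixed, 30) + le16(fixed, 32)]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Reads an unsigned 16-bit little-endian value. */
    static int le16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /** Reads an unsigned 32-bit little-endian value. */
    static long le32(byte[] b, int off) {
        return le16(b, off) | ((long) le16(b, off + 2) << 16);
    }
}
```

The buffer size plays the role of the threshold suggested in the comment: it limits how much memory one read can claim, regardless of how large the archive says its central directory is.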
> Possibility to introduce a fast Zip open with some caveats
> --
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Attachments: zipfile-speed-improvements.diff
>
>
> About a year ago I created an improvement
> (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things
> in commons-compress for Zip-files.
> This helped us quite a lot but we wanted
> it to be even faster so I optimised away some stuff that I thought was not
> that important for us.
> I was able to improve opening of a 34GB zip file from ~12s to ~2s.
> Now to my question, do you think it would be possible to introduce some of my
> fixes (diff included) into master?
> Yes, I know that I shortcut some features for some specific zip files and
> don't expose everything anymore.
> I haven't really made a good switchable solution for it because we just use
> our own build locally with this path.
> But with some hints from you I might be able to do it somehow. I'm happy to
> help and would love to get this speed open into master (it is always
> cumbersome with custom changes to public libraries).
> {code:java}
> diff --git
>