[jira] [Updated] (IO-650) Improve IOUtils performance by increasing DEFAULT_BUFFER_SIZE
[ https://issues.apache.org/jira/browse/IO-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brett Lounsbury updated IO-650:
---
Description:
IOUtils has a 4096-byte default buffer size that is used by the copy() methods (and will be used by the contentEquals() methods once IO-649 is merged). This number should be increased to 8192 for two reasons:
# It gives a large performance improvement in my micro-benchmark. I tested both copy() and contentEquals() with 4K and 8K buffers on a Late 2019 MacBook Pro (2.8 GHz i7), with a 128 MB file loaded into the OS buffer cache. See below for the test harness used. Beyond 8K, performance still improves, but with diminishing returns, and larger buffers could lead to excessive memory allocation.
# It matches the default buffer size of the java.io.Buffered* classes, so buffer sizing is consistent whether buffering is done internally by the method or externally via a Buffered* wrapper. These classes are also used internally in IOUtils, so the buffer size is not unreasonable.
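The buffer size matters because a copy is essentially a read/write loop in which each read() call moves at most one buffer's worth of bytes. The following is a minimal JDK-only sketch under that assumption, not the Commons IO implementation. The class name `BufferedCopySketch` and its `copy` method are hypothetical, and rejecting a non-positive size is an assumed choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopySketch {
    // Hypothetical copy helper: moves up to bufferSize bytes per read() call
    // until end of stream and returns the total number of bytes copied.
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream, otherwise the number of bytes read
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        for (int size : new int[] {4096, 8192}) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long copied = copy(new ByteArrayInputStream(data), out, size);
            System.out.println(size + " -> " + copied + " bytes");
        }
    }
}
```

With this 100,000-byte ByteArrayInputStream, the loop makes 25 data-returning read() calls with a 4096-byte buffer and 13 with an 8192-byte buffer. Fewer calls per byte is a plausible source of much of the gain in the tables. Real streams, however, may return fewer bytes per call than requested.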
For copy():
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Improvement (fraction of 4K time saved)*|
|*AVG*|44.2853417|64.2600679|0.31084197|
|*P50*|42.692406|62.371984|0.31551951|
|*P90*|49.5538826|68.4303876|0.27584975|
|*P99*|62.8831473|89.759114|0.29942326|
|*P100*|102.563615|177.143364|0.42101351|

For contentEquals() with IO-649:
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Improvement (fraction of 4K time saved)*|
|*AVG*|81.5009567|128.497828|0.36574059|
|*P50*|78.517749|124.191476|0.36776861|
|*P90*|89.9172708|136.779763|0.34261276|
|*P99*|125.814333|183.881989|0.31578762|
|*P100*|308.936585|559.611217|0.44794426|

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.output.NullOutputStream;
public class CopyBenchmark {
    public static void main(String[] args) throws Exception {
        NullOutputStream nos = NullOutputStream.NULL_OUTPUT_STREAM;
        for (int i = 0; i < 1000; i++) {
            long defaultCopyTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos); // default buffer size (4096 at the time)
                defaultCopyTime = System.nanoTime() - start;
            }
            long bufferSizeSpecifiedCopyTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos, 8192); // explicit 8K buffer
                bufferSizeSpecifiedCopyTime = System.nanoTime() - start;
            }
            System.out.println(bufferSizeSpecifiedCopyTime + "\t" + defaultCopyTime); // nanoseconds: 8K, then default
        }
    }
}
```

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
public class ContentEqualsBenchmark {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            long defaultContentEqualsTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"));
                 InputStream fis2 = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.contentEquals(fis, fis2); // default buffer size
                defaultContentEqualsTime = System.nanoTime() - start;
            }
            long bufferSizeSpecifiedContentEqualsTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"));
                 InputStream fis2 = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.contentEquals(fis, fis2, 8192); // three-argument overload proposed in IO-649
                bufferSizeSpecifiedContentEqualsTime = System.nanoTime() - start;
            }
            System.out.println(bufferSizeSpecifiedContentEqualsTime + "\t" + defaultContentEqualsTime);
        }
    }
}
```

was:
IOUtils has a 4096B default buffer size that is used by copy() methods (and will be used for contentEquals methods when IO-649 is pulled). This number should be updated to 8192 for a few reasons:
# It has a big improvement in performance in my micro-benchmark. I tested both copy() and contentEquals() with 4K and 8K buffers. This was done on a Late 2019 MacBook Pro (2.8GHz i7) with a 128MB file loaded into the OS buffer cache. See below for the test harness used. Past 8K performance does improve but it begins to experience some diminishing returns and could lead to excessive memory allocation.
# It mirrors the default buffer size of
[jira] [Created] (IO-650) Improve IOUtils performance by increasing DEFAULT_BUFFER_SIZE
Brett Lounsbury created IO-650:
--
Summary: Improve IOUtils performance by increasing DEFAULT_BUFFER_SIZE
Key: IO-650
URL: https://issues.apache.org/jira/browse/IO-650
Project: Commons IO
Issue Type: Improvement
Affects Versions: 1.0
Reporter: Brett Lounsbury
Fix For: 2.6

IOUtils has a 4096-byte default buffer size that is used by the copy() methods (and will be used by the contentEquals() methods once IO-649 is merged). This number should be increased to 8192 for two reasons:
# It gives a large performance improvement in my micro-benchmark. I tested both copy() and contentEquals() with 4K and 8K buffers on a Late 2019 MacBook Pro (2.8 GHz i7), with a 128 MB file loaded into the OS buffer cache. See below for the test harness used. Beyond 8K, performance still improves, but with diminishing returns, and larger buffers could lead to excessive memory allocation.
# It matches the default buffer size of the java.io.Buffered* classes. These classes are also used internally in IOUtils, so the buffer size is not unreasonable.
For copy():
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Improvement (fraction of 4K time saved)*|
|*AVG*|44.2853417|64.2600679|0.31084197|
|*P50*|42.692406|62.371984|0.31551951|
|*P90*|49.5538826|68.4303876|0.27584975|
|*P99*|62.8831473|89.759114|0.29942326|
|*P100*|102.563615|177.143364|0.42101351|

For contentEquals() with IO-649:
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Improvement (fraction of 4K time saved)*|
|*AVG*|81.5009567|128.497828|0.36574059|
|*P50*|78.517749|124.191476|0.36776861|
|*P90*|89.9172708|136.779763|0.34261276|
|*P99*|125.814333|183.881989|0.31578762|
|*P100*|308.936585|559.611217|0.44794426|

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.output.NullOutputStream;
public class CopyBenchmark {
    public static void main(String[] args) throws Exception {
        NullOutputStream nos = NullOutputStream.NULL_OUTPUT_STREAM;
        for (int i = 0; i < 1000; i++) {
            long defaultCopyTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos); // default buffer size (4096 at the time)
                defaultCopyTime = System.nanoTime() - start;
            }
            long bufferSizeSpecifiedCopyTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos, 8192); // explicit 8K buffer
                bufferSizeSpecifiedCopyTime = System.nanoTime() - start;
            }
            System.out.println(bufferSizeSpecifiedCopyTime + "\t" + defaultCopyTime); // nanoseconds: 8K, then default
        }
    }
}
```

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
public class ContentEqualsBenchmark {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            long defaultContentEqualsTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"));
                 InputStream fis2 = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.contentEquals(fis, fis2); // default buffer size
                defaultContentEqualsTime = System.nanoTime() - start;
            }
            long bufferSizeSpecifiedContentEqualsTime;
            try (InputStream fis = new FileInputStream(new File("/tmp/random_data"));
                 InputStream fis2 = new FileInputStream(new File("/tmp/random_data"))) {
                long start = System.nanoTime();
                IOUtils.contentEquals(fis, fis2, 8192); // three-argument overload proposed in IO-649
                bufferSizeSpecifiedContentEqualsTime = System.nanoTime() - start;
            }
            System.out.println(bufferSizeSpecifiedContentEqualsTime + "\t" + defaultContentEqualsTime);
        }
    }
}
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371082 ]

ASF GitHub Bot logged work on IO-649:
-
Author: ASF GitHub Bot
Created on: 13/Jan/20 20:31
Start Date: 13/Jan/20 20:31
Worklog Time Spent: 10m
Work Description: coveralls commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811

Coverage Status: https://coveralls.io/builds/28078852
Coverage increased (+0.08%) to 89.552% when pulling **836245f1a76094ee9d57d6549953c0606139f532 on brettlounsbury:master** into **11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 371082)
Time Spent: 2h 10m (was: 2h)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 1.0, 1.1
> Reporter: Brett Lounsbury
> Priority: Major
> Fix For: 2.6
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> contentEquals() internally wraps any given InputStream/Reader in a buffered
> version (if it is not already buffered), which avoids many I/O penalties,
> but it then reads each byte/character one at a time. This leads to
> significantly more method calls and a lot of byte -> int casting, since
> the read() method returns an int between 0 and 255 (or -1 at end of
> stream) instead of a byte.
>
> I have a change that modifies the contentEquals() methods to buffer
> content internally into a byte/char array and then compare those arrays
> in batches using Arrays.equals, instead of using a BufferedInputStream or
> BufferedReader and its single byte/char read() method. This reduces the
> number of method invocations by a factor equal to the buffer size and
> avoids casting every byte read to an int.
>
> The following results show the performance increase over 1000 iterations
> of comparing two 1 GB InputStreams of binary data (stored in memory to
> avoid I/O). This test was performed on an EC2 M4.4XL host using Java
> 1.8.0.232, with a forced System.gc() between iterations to rule out GC
> as a source of latency:
> Average: 7236 ms to 858 ms (8.43x speedup)
> P50: 7224 ms to 856 ms (8.44x speedup)
> P90: 7249 ms to 860 ms (8.43x speedup)
> P99: 7410 ms to 913 ms (8.12x speedup)
> P100: 8330 ms to 1278 ms (6.52x speedup)
>
> The following results show the performance increase over 1000 iterations
> of comparing two 1 GB Readers of character data (stored in memory to
> avoid I/O). This test was performed on an EC2 M4.4XL host using Java
> 1.8.0.232, with a forced System.gc() between iterations to rule out GC
> as a source of latency:
> Average: 11281 ms to 1737 ms (6.50x speedup)
> P50: 11262 ms to 1735 ms (6.49x speedup)
> P90: 11292 ms to 1741 ms (6.49x speedup)
> P99: 11707 ms to 1774 ms (6.60x speedup)
> P100: 12176 ms to 1884 ms (6.46x speedup)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
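The batch-comparison idea described above can be sketched with the JDK alone: fill one byte array per stream, then compare whole chunks with Arrays.equals instead of calling the single-byte read(). This is an illustration under stated assumptions, not the code from pull request #101. The class name and the `fill` helper are hypothetical, and rejecting a non-positive `bufferSize` with IllegalArgumentException is an assumed choice. The range overload of `Arrays.equals` requires Java 9 or later, whereas the benchmarks above used Java 8. A Reader version would follow the same shape with char[] buffers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ContentEqualsSketch {
    // Reads until buf is full or the stream ends and returns the byte count.
    // A single read() may legally return fewer bytes than requested, so a
    // one-shot read could misalign the two streams' chunks.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    public static boolean contentEquals(InputStream input1, InputStream input2, int bufferSize)
            throws IOException {
        if (input1 == input2) {
            return true; // same object, or both null
        }
        if (input1 == null ^ input2 == null) {
            return false; // exactly one is null
        }
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        byte[] buf1 = new byte[bufferSize];
        byte[] buf2 = new byte[bufferSize];
        while (true) {
            int n1 = fill(input1, buf1);
            int n2 = fill(input2, buf2);
            // Compare whole chunks at once (range overload of Arrays.equals, Java 9+)
            if (n1 != n2 || !Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
            if (n1 < bufferSize) {
                return true; // both streams ended at the same offset
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] a = "hello, world".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "hello, World".getBytes(StandardCharsets.US_ASCII);
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(a.clone()), 4));
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(b), 4));
    }
}
```

Running main prints `true` and then `false`. The second pair differs only in the case of one letter, which the comparison catches in the second 4-byte chunk.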
[GitHub] [commons-io] coveralls edited a comment on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
coveralls edited a comment on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811

Coverage Status: https://coveralls.io/builds/28078852
Coverage increased (+0.08%) to 89.552% when pulling **836245f1a76094ee9d57d6549953c0606139f532 on brettlounsbury:master** into **11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**.

With regards,
Apache Git Services
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371060 ]

ASF GitHub Bot logged work on IO-649:
-
Author: ASF GitHub Bot
Created on: 13/Jan/20 20:08
Start Date: 13/Jan/20 20:08
Worklog Time Spent: 10m
Work Description: brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573847055

I took a look at the coverage drop. I forgot to add tests that exercise the bounds checking of the bufferSize parameter of both contentEquals() methods. I've pushed a new version with those tests.

Issue Time Tracking
---
Worklog Id: (was: 371060)
Time Spent: 2h (was: 1h 50m)
[GitHub] [commons-io] brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573847055

I took a look at the coverage drop. I forgot to add tests that exercise the bounds checking of the bufferSize parameter of both contentEquals() methods. I've pushed a new version with those tests.
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371055 ]

ASF GitHub Bot logged work on IO-649:
-
Author: ASF GitHub Bot
Created on: 13/Jan/20 19:54
Start Date: 13/Jan/20 19:54
Worklog Time Spent: 10m
Work Description: brettlounsbury commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
I looked at the beginning of each method again. Both contentEquals() methods start with the following statements:
```
if (input1 == input2) {
    return true;
}
if (input1 == null ^ input2 == null) {
    return false;
}
```
This can never throw a NullPointerException. If both streams are null, they are the same reference, so the method always returns true. If one stream is null and the other is not, the XOR check always returns false. The actual comparison logic only executes when both streams are non-null and are not the same object.

Issue Time Tracking
---
Worklog Id: (was: 371055)
Time Spent: 1h 40m (was: 1.5h)
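Because `==` binds more tightly than `^` in Java, the second check reads as `(input1 == null) ^ (input2 == null)`, which is true exactly when one argument is null. The two guard statements can be exercised in isolation. In the sketch below, the `nullGuard` helper is hypothetical and uses `Optional.empty()` to mean "fall through to the real comparison".

```java
import java.util.Optional;

public class NullGuardSketch {
    // Hypothetical helper isolating the two guard statements from the review.
    // Returns a decided result, or Optional.empty() when both arguments are
    // non-null and distinct, i.e. when the real comparison would have to run.
    public static Optional<Boolean> nullGuard(Object input1, Object input2) {
        if (input1 == input2) {
            return Optional.of(true); // same object, or both null
        }
        if (input1 == null ^ input2 == null) {
            return Optional.of(false); // exactly one argument is null
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        System.out.println(nullGuard(null, null)); // Optional[true]
        System.out.println(nullGuard(a, null));    // Optional[false]
        System.out.println(nullGuard(null, b));    // Optional[false]
        System.out.println(nullGuard(a, a));       // Optional[true]
        System.out.println(nullGuard(a, b));       // Optional.empty
    }
}
```

Only the last case reaches the comparison code, and in that case neither argument is null. This is the basis for dropping `@throws NullPointerException` from the javadoc.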
[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
I looked at the beginning of each method again. Both contentEquals() methods start with the following statements:
```
if (input1 == input2) {
    return true;
}
if (input1 == null ^ input2 == null) {
    return false;
}
```
This can never throw a NullPointerException. If both streams are null, they are the same reference, so the method always returns true. If one stream is null and the other is not, the XOR check always returns false. The actual comparison logic only executes when both streams are non-null and are not the same object.

I pushed a new version without the `@throws NullPointerException` in the javadoc.
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371056 ]

ASF GitHub Bot logged work on IO-649:
-
Author: ASF GitHub Bot
Created on: 13/Jan/20 19:54
Start Date: 13/Jan/20 19:54
Worklog Time Spent: 10m
Work Description: brettlounsbury commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
I looked at the beginning of each method again. Both contentEquals() methods start with the following statements:
```
if (input1 == input2) {
    return true;
}
if (input1 == null ^ input2 == null) {
    return false;
}
```
This can never throw a NullPointerException. If both streams are null, they are the same reference, so the method always returns true. If one stream is null and the other is not, the XOR check always returns false. The actual comparison logic only executes when both streams are non-null and are not the same object.

I pushed a new version without the `@throws NullPointerException` in the javadoc.

Issue Time Tracking
---
Worklog Id: (was: 371056)
Time Spent: 1h 50m (was: 1h 40m)
[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
I looked at the beginning of each method again. Both contentEquals() methods start with the following statements:
```
if (input1 == input2) {
    return true;
}
if (input1 == null ^ input2 == null) {
    return false;
}
```
This can never throw a NullPointerException. If both streams are null, they are the same reference, so the method always returns true. If one stream is null and the other is not, the XOR check always returns false. The actual comparison logic only executes when both streams are non-null and are not the same object.
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371047 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:46 Start Date: 13/Jan/20 19:46 Worklog Time Spent: 10m Work Description: michael-o commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365994005 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: I would rather see an explicit NPE up front. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 371047) Time Spent: 1.5h (was: 1h 20m) > IOUtils contentEquals method performance improvements > - > > Key: IO-649 > URL: https://issues.apache.org/jira/browse/IO-649 > Project: Commons IO > Issue Type: Improvement > Components: Utilities >Affects Versions: 1.0, 1.1 >Reporter: Brett Lounsbury >Priority: Major > Fix For: 2.6 > > Time Spent: 1.5h > Remaining Estimate: 0h > > > contentEquals() internally wraps any given InputStream/Reader in a Buffered > version (if it is not already buffered), which avoids a lot of IO penalties, > but then it proceeds to read each byte/character one at a time. This leads > to significantly more method calls and also a lot of byte -> int casting > since the read() method returns an int between 0 and 255 instead of returning > a byte. > > I have a change that modifies the contentEquals() methods to internally > buffer content into a byte/char array and to then do batch comparisons of > those arrays using Arrays.equals instead of using a BufferedInputStream or > BufferedReader and making use of the single byte/char read() methods. This > reduces the number of method invocations by a factor equal to the buffer size > and avoids casting every byte read to an int. > > The following table shows the performance increase over 1000 iterations of > comparing two 1GB InputStreams of binary data (stored in memory to avoid I/O).
> This test was performed on an EC2 M4.4XL host using Java 1.8.0.232 and there > was a forced System.gc() between each iteration to avoid GC as a source of > latency: > Average: 7236 to 858ms (8.43x speedup) > P50: 7224 to 856ms (8.44x speedup) > P90: 7249 to 860ms (8.43x speedup) > P99: 7410 to 913ms (8.12x speedup) > P100: 8330 to 1278ms (6.52x speedup) > > The following table shows the performance increase over 1000 iterations of > comparing two 1GB Readers of character data (stored in memory to avoid I/O). > This test was performed on an EC2 M4.4XL host using Java 1.8.0.232 and there > was a forced System.gc() between each iteration to avoid GC as a source of > latency: > Average: 11281 to 1737ms (6.50x speedup) > P50: 11262 to 1735ms (6.49x speedup) > P90: 11292 to 1741ms (6.49x speedup) > P99: 11707 to 1774ms (6.60x speedup) > P100: 12176 to 1884ms (6.46x speedup) > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
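The batch-comparison idea in the description (fill a byte array from each stream, then compare whole blocks with Arrays.equals instead of calling read() once per byte) can be sketched roughly as follows. This is a minimal illustration, not the code from pull request #101: the class and helper names are hypothetical, and it assumes Java 9+ for the range overload of Arrays.equals (on Java 8 a manual loop over the first n elements would be needed).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch of block-wise stream comparison; not the Commons IO source. */
final class BlockCompare {

    private BlockCompare() {}

    /**
     * Reads until the buffer is full or the stream ends, because a single
     * read(byte[], int, int) call may return fewer bytes than requested.
     * Returns the number of bytes read (0 at end of stream).
     */
    private static int fill(final InputStream in, final byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            final int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    static boolean contentEquals(final InputStream a, final InputStream b, final int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            // A zero-length buffer would make every pair of streams look equal.
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        final byte[] bufA = new byte[bufferSize];
        final byte[] bufB = new byte[bufferSize];
        while (true) {
            final int nA = fill(a, bufA);
            final int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one stream ended before the other
            }
            if (nA == 0) {
                return true; // both streams ended at the same position
            }
            // Compare only the bytes filled in this round; stale bytes past nA are ignored.
            if (!Arrays.equals(bufA, 0, nA, bufB, 0, nB)) {
                return false;
            }
        }
    }
}
```

Because fill() only returns a short count at end of stream, a mismatch in counts implies the streams have different lengths, which keeps the loop to one block comparison per round.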
[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365994005 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: I would rather see an explicit NPE up front. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
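The "explicit NPE up front" style requested here could look roughly like the sketch below, which uses java.util.Objects.requireNonNull so that a null argument fails immediately with the argument's name rather than incidentally inside the read loop. The class name is hypothetical, and the same-reference check is an assumption made to preserve the documented "they both don't exist" case (two nulls compare equal); the byte-at-a-time loop is only a placeholder to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

/** Hypothetical sketch of fail-fast argument checks; not the Commons IO source. */
final class FailFastCompare {

    private FailFastCompare() {}

    /**
     * @throws NullPointerException if exactly one input is null (thrown before any read)
     */
    static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException {
        // Assumption: keep the "both don't exist" contract, so the same reference
        // (including null == null) counts as equal before the null checks run.
        if (input1 == input2) {
            return true;
        }
        // Fail fast, naming the offending argument in the exception message.
        Objects.requireNonNull(input1, "input1");
        Objects.requireNonNull(input2, "input2");
        // Placeholder comparison, one byte at a time (the slow path IO-649 replaces).
        int b1;
        do {
            b1 = input1.read();
            if (b1 != input2.read()) {
                return false;
            }
        } while (b1 != -1);
        return true;
    }
}
```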
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371046 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:45 Start Date: 13/Jan/20 19:45 Worklog Time Spent: 10m Work Description: brettlounsbury commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365993507 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null + * @throws IOException if an I/O error occurs + */ +@SuppressWarnings("resource") +public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize) +throws IOException { +if (bufferSize <= 0) { +throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize); Review comment: Fixed and updated. Done for both methods that throw an IllegalArgumentException. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 371046) Time Spent: 1h 20m (was: 1h 10m)
[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365993507 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null + * @throws IOException if an I/O error occurs + */ +@SuppressWarnings("resource") +public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize) +throws IOException { +if (bufferSize <= 0) { +throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize); Review comment: Fixed and updated. Done for both methods that throw an IllegalArgumentException. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371038 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:41 Start Date: 13/Jan/20 19:41 Worklog Time Spent: 10m Work Description: brettlounsbury commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365991723 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: This documentation is a clone/mutate from the original method's javadoc (the one without the bufferSize). This doesn't look to be explicitly thrown, but if either InputStream is null it throws due to the stream being dereferenced. Check the javadoc for the `public static boolean contentEquals(final InputStream input1, final InputStream input2)` method. Happy to either (a) explicitly throw a NullPointerException up front, (b) remove the comment from both places, or (c) do nothing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 371038) Time Spent: 1h (was: 50m)
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371040 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:41 Start Date: 13/Jan/20 19:41 Worklog Time Spent: 10m Work Description: brettlounsbury commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365991850 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream @SuppressWarnings("resource") public static boolean contentEquals(final Reader input1, final Reader input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Readers to determine if they are equal or not. + * + * This method buffers the input internally. + * + * + * @param input1 the first reader + * @param input2 the second reader + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the readers are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: Same answer as above :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 371040) Time Spent: 1h 10m (was: 1h)
[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365991850 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream @SuppressWarnings("resource") public static boolean contentEquals(final Reader input1, final Reader input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Readers to determine if they are equal or not. + * + * This method buffers the input internally. + * + * + * @param input1 the first reader + * @param input2 the second reader + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the readers are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: Same answer as above :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
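The Reader overload in this hunk follows the same pattern: a two-argument method that delegates to a three-argument one with DEFAULT_BUFFER_SIZE. A rough sketch with char[] buffers is below. The class name is hypothetical, the 4096 default is an assumption taken from the IO-650 description of the value IOUtils used at the time, and the range overload of Arrays.equals requires Java 9+.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

/** Hypothetical sketch of the Reader overload pair; not the Commons IO source. */
final class ReaderCompare {

    /** Assumed default, matching the 4096 value IO-650 describes. */
    static final int DEFAULT_BUFFER_SIZE = 4096;

    private ReaderCompare() {}

    /** Convenience overload that delegates with the default buffer size. */
    static boolean contentEquals(final Reader input1, final Reader input2) throws IOException {
        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
    }

    static boolean contentEquals(final Reader input1, final Reader input2, final int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        final char[] buf1 = new char[bufferSize];
        final char[] buf2 = new char[bufferSize];
        while (true) {
            final int n1 = fill(input1, buf1);
            final int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // different lengths
            }
            if (n1 == 0) {
                return true; // both readers exhausted together
            }
            if (!Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false; // a character in this block differs
            }
        }
    }

    /** Reads until the buffer is full or the reader ends; returns chars read (0 at end). */
    private static int fill(final Reader in, final char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            final int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```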
[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365991723 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: This documentation is a clone/mutate from the original method's javadoc (the one without the bufferSize). This doesn't look to be explicitly thrown, but if either InputStream is null it throws due to the stream being dereferenced. Check the javadoc for the `public static boolean contentEquals(final InputStream input1, final InputStream input2)` method. Happy to either (a) explicitly throw a NullPointerException up front, (b) remove the comment from both places, or (c) do nothing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-io] coveralls commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
coveralls commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811 [![Coverage Status](https://coveralls.io/builds/28077605/badge)](https://coveralls.io/builds/28077605) Coverage decreased (-0.002%) to 89.471% when pulling **6ef0e1cbf741d745bc925e303863cd3902992b28 on brettlounsbury:master** into **11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371023=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371023 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:32 Start Date: 13/Jan/20 19:32 Worklog Time Spent: 10m Work Description: coveralls commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811 [![Coverage Status](https://coveralls.io/builds/28077605/badge)](https://coveralls.io/builds/28077605) Coverage decreased (-0.002%) to 89.471% when pulling **6ef0e1cbf741d745bc925e303863cd3902992b28 on brettlounsbury:master** into **11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 371023) Time Spent: 50m (was: 40m)
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371020 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:30 Start Date: 13/Jan/20 19:30 Worklog Time Spent: 10m Work Description: michael-o commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365985566 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream @SuppressWarnings("resource") public static boolean contentEquals(final Reader input1, final Reader input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Readers to determine if they are equal or not. + * + * This method buffers the input internally. + * + * + * @param input1 the first reader + * @param input2 the second reader + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the readers are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null Review comment: Same here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 371020) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371019 ] ASF GitHub Bot logged work on IO-649: - Author: ASF GitHub Bot Created on: 13/Jan/20 19:30 Start Date: 13/Jan/20 19:30 Worklog Time Spent: 10m Work Description: michael-o commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods. URL: https://github.com/apache/commons-io/pull/101#discussion_r365984225 ## File path: src/main/java/org/apache/commons/io/IOUtils.java ## @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) { @SuppressWarnings("resource") public static boolean contentEquals(final InputStream input1, final InputStream input2) throws IOException { +return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE); +} + +/** + * Compares the contents of two Streams to determine if they are equal or not. + * + * This method buffers the input internally. + * + * @param input1 the first stream + * @param input2 the second stream + * @param bufferSize the size of the internal buffer to use. + * @return true if the content of the streams are equal or they both don't + * exist, false otherwise + * @throws NullPointerException if either input is null + * @throws IOException if an I/O error occurs + */ +@SuppressWarnings("resource") +public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize) +throws IOException { +if (bufferSize <= 0) { +throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize); Review comment: This one is not documented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 371019)
Time Spent: 0.5h (was: 20m)

> IOUtils contentEquals method performance improvements
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 1.0, 1.1
> Reporter: Brett Lounsbury
> Priority: Major
> Fix For: 2.6
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> contentEquals() internally wraps any given InputStream/Reader in a buffered
> version (if it is not already buffered), which avoids a lot of I/O penalties,
> but it then reads each byte/character one at a time. This leads to
> significantly more method calls and a lot of byte -> int casting, since the
> read() method returns an int between 0 and 255 instead of a byte.
>
> I have a change that modifies the contentEquals() methods to buffer content
> internally into a byte/char array and then compare those arrays in batches
> using Arrays.equals, instead of using a BufferedInputStream or BufferedReader
> and its single byte/char read() methods. This reduces the number of method
> invocations by a factor equal to the buffer size and avoids casting every
> byte read to an int.
>
> The following figures show the performance improvement over 1000 iterations
> of comparing two 1 GB InputStreams of binary data (held in memory to avoid
> I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232,
> with a forced System.gc() between iterations to rule out GC as a source of
> latency:
> Average: 7236 to 858ms (8.43x speedup)
> P50: 7224 to 856ms (8.44x speedup)
> P90: 7249 to 860ms (8.43x speedup)
> P99: 7410 to 913ms (8.12x speedup)
> P100: 8330 to 1278ms (6.52x speedup)
>
> The following figures show the performance improvement over 1000 iterations
> of comparing two 1 GB Readers of character data (held in memory to avoid
> I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232,
> with a forced System.gc() between iterations to rule out GC as a source of
> latency:
> Average: 11281 to 1737ms (6.50x speedup)
> P50: 11262 to 1735ms (6.49x speedup)
> P90: 11292 to 1741ms (6.49x speedup)
> P99: 11707 to 1774ms (6.60x speedup)
> P100: 12176 to 1884ms (6.46x speedup)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
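The batch comparison described in IO-649 works like this: fill a byte array from each stream, compare the two arrays, and repeat. A minimal sketch follows. It is not the code from pull request #101. The class name `ChunkedStreamCompare` and the helper `fill()` are hypothetical. The filled prefix is compared with a plain loop because Java 8 has no ranged `Arrays.equals` overload. The Javadoc also lists the `IllegalArgumentException` that the review comment above notes is undocumented.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper illustrating a chunked stream comparison; not the Commons IO code. */
final class ChunkedStreamCompare {

    private ChunkedStreamCompare() {
    }

    /**
     * Compares two streams one buffer-sized chunk at a time.
     *
     * @param input1 the first stream
     * @param input2 the second stream
     * @param bufferSize the size of each of the two internal buffers
     * @return true if both streams yield the same bytes, or both are the same reference (including both null)
     * @throws IllegalArgumentException if bufferSize is not positive
     * @throws NullPointerException if exactly one input is null (raised by the first read on it)
     * @throws IOException if an I/O error occurs
     */
    static boolean contentEquals(InputStream input1, InputStream input2, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        byte[] buf1 = new byte[bufferSize];
        byte[] buf2 = new byte[bufferSize];
        while (true) {
            int n1 = fill(input1, buf1);
            int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one stream ended before the other
            }
            for (int i = 0; i < n1; i++) {
                if (buf1[i] != buf2[i]) {
                    return false;
                }
            }
            if (n1 < bufferSize) {
                return true; // both streams ended at the same offset with equal content
            }
        }
    }

    /**
     * Reads until the buffer is full or the stream ends. A single read() may
     * return fewer bytes than requested, so comparing raw read() counts could
     * wrongly report two equal streams as different.
     */
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

Both buffers are allocated once per call and reused for every chunk. As a result, the number of bulk read() calls per stream is roughly the stream length divided by bufferSize, instead of one call per byte.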
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371018 ]

ASF GitHub Bot logged work on IO-649:

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:30
Start Date: 13/Jan/20 19:30
Worklog Time Spent: 10m
Work Description: michael-o commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984302

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2)
             throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
I fail to see where this one is thrown.
Issue Time Tracking
---

Worklog Id: (was: 371018)
Time Spent: 0.5h (was: 20m)
[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984225

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2)
             throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null
+     * @throws IOException if an I/O error occurs
+     */
+    @SuppressWarnings("resource")
+    public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize)
+            throws IOException {
+        if (bufferSize <= 0) {
+            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);

Review comment:
This one is not documented.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984302

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
     @SuppressWarnings("resource")
     public static boolean contentEquals(final InputStream input1, final InputStream input2)
             throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Streams to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     * @param input1 the first stream
+     * @param input2 the second stream
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the streams are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
I fail to see where this one is thrown.
[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365985566

## File path: src/main/java/org/apache/commons/io/IOUtils.java
##
@@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream
     @SuppressWarnings("resource")
     public static boolean contentEquals(final Reader input1, final Reader input2)
             throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Readers to determine if they are equal or not.
+     *
+     * This method buffers the input internally.
+     *
+     *
+     * @param input1 the first reader
+     * @param input2 the second reader
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the readers are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null

Review comment:
Same here.
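The reviewer's question about where the documented NullPointerException is actually thrown applies equally to this Reader overload. One way to make the contract explicit is sketched below. `ChunkedReaderCompare` is a hypothetical class, not the code from pull request #101. It treats two identical references (including two nulls) as equal. Otherwise it calls `Objects.requireNonNull`, so the exception named in the Javadoc is thrown at a visible point. Whether Commons IO should throw or return false for a single null input is a design decision this sketch does not settle.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Objects;

/** Hypothetical sketch of the Reader overload; not the Commons IO implementation. */
final class ChunkedReaderCompare {

    private ChunkedReaderCompare() {
    }

    /**
     * Compares the contents of two Readers, one buffer-sized chunk at a time.
     *
     * @param input1 the first reader
     * @param input2 the second reader
     * @param bufferSize the size of each internal char buffer
     * @return true if both readers yield the same characters, or both are the same reference (including both null)
     * @throws IllegalArgumentException if bufferSize is not positive
     * @throws NullPointerException if exactly one of the readers is null
     * @throws IOException if an I/O error occurs
     */
    static boolean contentEquals(Reader input1, Reader input2, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        // The documented NullPointerException is thrown here, not deep inside a read() call.
        Objects.requireNonNull(input1, "input1");
        Objects.requireNonNull(input2, "input2");

        char[] buf1 = new char[bufferSize];
        char[] buf2 = new char[bufferSize];
        while (true) {
            int n1 = fill(input1, buf1);
            int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one reader ended before the other
            }
            for (int i = 0; i < n1; i++) {
                if (buf1[i] != buf2[i]) {
                    return false;
                }
            }
            if (n1 < bufferSize) {
                return true; // both readers ended at the same offset
            }
        }
    }

    /** Reads until the buffer is full or the reader is exhausted; returns the number of chars read. */
    private static int fill(Reader in, char[] buf) throws IOException {
        int total = 0;
        int n;
        while (total < buf.length && (n = in.read(buf, total, buf.length - total)) != -1) {
            total += n;
        }
        return total;
    }
}
```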
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=370950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-370950 ]

ASF GitHub Bot logged work on IO-649:

Author: ASF GitHub Bot
Created on: 13/Jan/20 17:49
Start Date: 13/Jan/20 17:49
Worklog Time Spent: 10m
Work Description: brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573786289

https://issues.apache.org/jira/browse/IO-649

Issue Time Tracking
---

Worklog Id: (was: 370950)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements
[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=370948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-370948 ]

ASF GitHub Bot logged work on IO-649:

Author: ASF GitHub Bot
Created on: 13/Jan/20 17:48
Start Date: 13/Jan/20 17:48
Worklog Time Spent: 10m
Work Description: brettlounsbury commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101

This change modifies the contentEquals() methods to buffer content internally into a byte/char array and then compare those arrays in batches using Arrays.equals, instead of using a BufferedInputStream or BufferedReader and its single byte/char read() methods. This reduces the number of method invocations by a factor equal to the buffer size, avoids casting every byte read to an int, and improves performance significantly.

The following figures show the performance improvement over 1000 iterations of comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to rule out GC as a source of latency:
Average: 7236 to 858ms (8.43x speedup)
P50: 7224 to 856ms (8.44x speedup)
P90: 7249 to 860ms (8.43x speedup)
P99: 7410 to 913ms (8.12x speedup)
P100: 8330 to 1278ms (6.52x speedup)

The following figures show the performance improvement over 1000 iterations of comparing two 1 GB Readers of character data (held in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to rule out GC as a source of latency:
Average: 11281 to 1737ms (6.50x speedup)
P50: 11262 to 1735ms (6.49x speedup)
P90: 11292 to 1741ms (6.49x speedup)
P99: 11707 to 1774ms (6.60x speedup)
P100: 12176 to 1884ms (6.46x speedup)
Issue Time Tracking
---

Worklog Id: (was: 370948)
Remaining Estimate: 0h
Time Spent: 10m
[GitHub] [commons-io] brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573786289

https://issues.apache.org/jira/browse/IO-649
[GitHub] [commons-io] brettlounsbury opened a new pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
brettlounsbury opened a new pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101

This change modifies the contentEquals() methods to buffer content internally into a byte/char array and then compare those arrays in batches using Arrays.equals, instead of using a BufferedInputStream or BufferedReader and its single byte/char read() methods. This reduces the number of method invocations by a factor equal to the buffer size, avoids casting every byte read to an int, and improves performance significantly.

The following figures show the performance improvement over 1000 iterations of comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to rule out GC as a source of latency:
Average: 7236 to 858ms (8.43x speedup)
P50: 7224 to 856ms (8.44x speedup)
P90: 7249 to 860ms (8.43x speedup)
P99: 7410 to 913ms (8.12x speedup)
P100: 8330 to 1278ms (6.52x speedup)

The following figures show the performance improvement over 1000 iterations of comparing two 1 GB Readers of character data (held in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to rule out GC as a source of latency:
Average: 11281 to 1737ms (6.50x speedup)
P50: 11262 to 1735ms (6.49x speedup)
P90: 11292 to 1741ms (6.49x speedup)
P99: 11707 to 1774ms (6.60x speedup)
P100: 12176 to 1884ms (6.46x speedup)
[jira] [Created] (IO-649) IOUtils contentEquals method performance improvements
Brett Lounsbury created IO-649:

Summary: IOUtils contentEquals method performance improvements
Key: IO-649
URL: https://issues.apache.org/jira/browse/IO-649
Project: Commons IO
Issue Type: Improvement
Components: Utilities
Affects Versions: 1.1, 1.0
Reporter: Brett Lounsbury
Fix For: 2.6

contentEquals() internally wraps any given InputStream/Reader in a buffered version (if it is not already buffered), which avoids a lot of I/O penalties, but it then reads each byte/character one at a time. This leads to significantly more method calls and a lot of byte -> int casting, since the read() method returns an int between 0 and 255 instead of a byte.

I have a change that modifies the contentEquals() methods to buffer content internally into a byte/char array and then compare those arrays in batches using Arrays.equals, instead of using a BufferedInputStream or BufferedReader and its single byte/char read() methods. This reduces the number of method invocations by a factor equal to the buffer size and avoids casting every byte read to an int.

The following figures show the performance improvement over 1000 iterations of comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to rule out GC as a source of latency:
Average: 7236 to 858ms (8.43x speedup)
P50: 7224 to 856ms (8.44x speedup)
P90: 7249 to 860ms (8.43x speedup)
P99: 7410 to 913ms (8.12x speedup)
P100: 8330 to 1278ms (6.52x speedup)

The following figures show the performance improvement over 1000 iterations of comparing two 1 GB Readers of character data (held in memory to avoid I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to rule out GC as a source of latency:
Average: 11281 to 1737ms (6.50x speedup)
P50: 11262 to 1735ms (6.49x speedup)
P90: 11292 to 1741ms (6.49x speedup)
P99: 11707 to 1774ms (6.60x speedup)
P100: 12176 to 1884ms (6.46x speedup)
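The results above report an average plus P50, P90, P99 and P100 over 1000 iterations. Each speedup is the old time divided by the new one (for example, 7236 / 858 ≈ 8.43). The issue does not say how the percentiles were computed. The hypothetical `LatencySummary` helper below therefore assumes the nearest-rank method. Other definitions, such as linear interpolation, can give slightly different P90 and P99 values for the same samples.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing per-iteration timings; not the harness used in IO-649. */
final class LatencySummary {

    private LatencySummary() {
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least p percent
     * of all samples are less than or equal to it.
     *
     * @param sorted samples in ascending order
     * @param p percentile in the range [0, 100]
     */
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0 || p < 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 <= p <= 100");
        }
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean of the samples. */
    static double average(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Speedup as reported in the issue: old duration divided by new duration. */
    static double speedup(double beforeMillis, double afterMillis) {
        return beforeMillis / afterMillis;
    }

    /** Formats one summary line in the same order as the IO-649 results. */
    static String summarize(long[] samplesMillis) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        return String.format("Average: %.1fms P50: %dms P90: %dms P99: %dms P100: %dms",
                average(sorted), percentile(sorted, 50), percentile(sorted, 90),
                percentile(sorted, 99), percentile(sorted, 100));
    }
}
```

The helper sorts a copy of the samples once and reads every percentile from that copy. This leaves the caller's array untouched and keeps the cost at a single O(n log n) sort, however many percentiles are reported.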
[jira] [Created] (IMAGING-247) crash on reading tiff image
Robin Morier created IMAGING-247:

Summary: crash on reading tiff image
Key: IMAGING-247
URL: https://issues.apache.org/jira/browse/IMAGING-247
Project: Commons Imaging
Issue Type: Bug
Components: Format: TIFF
Affects Versions: 1.0-alpha1
Reporter: Robin Morier
Attachments: neutre.TIFF

I get an index out of bounds exception when trying to load the attached image.

{noformat}
java.lang.ArrayIndexOutOfBoundsException: Index 255 out of bounds for length 2
    at org.apache.commons.imaging.formats.tiff.photometricinterpreters.PhotometricInterpreterPalette.interpretPixel(PhotometricInterpreterPalette.java:53)
    at org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.interpretStrip(DataReaderStrips.java:179)
    at org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.readImageData(DataReaderStrips.java:212)
    at org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:659)
    at org.apache.commons.imaging.formats.tiff.TiffDirectory.getTiffImage(TiffDirectory.java:163)
    at org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:469)
    at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1442)
    at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1404)
{noformat}

I'm calling getBufferedImage without any parameters.
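The trace shows a pixel value of 255 being used as an index into a palette that holds only 2 entries. The report does not establish the cause. A bounds-checked lookup would, however, turn the bare ArrayIndexOutOfBoundsException into a descriptive error. The sketch below is hypothetical: `PaletteLookup` and `colorFor` are not Commons Imaging APIs. The library itself would more likely report such a problem as an `ImageReadException`. `IllegalArgumentException` is used here only to keep the example self-contained.

```java
/** Hypothetical bounds-checked palette lookup; not Commons Imaging's PhotometricInterpreterPalette. */
final class PaletteLookup {

    private final int[] argbPalette;

    PaletteLookup(int[] argbPalette) {
        // Defensive copy so callers cannot change the palette after construction.
        this.argbPalette = argbPalette.clone();
    }

    /**
     * Maps a palette index read from the image data to an ARGB color.
     *
     * @throws IllegalArgumentException if the index is outside the palette,
     *         instead of letting an ArrayIndexOutOfBoundsException escape
     */
    int colorFor(int index) {
        if (index < 0 || index >= argbPalette.length) {
            throw new IllegalArgumentException("Palette index " + index
                    + " is outside the color map, which has " + argbPalette.length + " entries");
        }
        return argbPalette[index];
    }
}
```

With a two-entry palette, `new PaletteLookup(new int[] {0xFF000000, 0xFFFFFFFF}).colorFor(255)` fails with a message that names both the offending index and the palette size. That information helps distinguish a malformed file from a parser bug.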
[jira] [Created] (DAEMON-415) Log is not written to stdout and stderr Logfiles
Alexander Behrend created DAEMON-415:

Summary: Log is not written to stdout and stderr Logfiles
Key: DAEMON-415
URL: https://issues.apache.org/jira/browse/DAEMON-415
Project: Commons Daemon
Issue Type: Bug
Components: Procrun
Affects Versions: 1.2.2
Environment: Windows Server 2016 Standard, 64-bit OS, x64 processor, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70 GHz
Reporter: Alexander Behrend
Fix For: 1.2.3

I updated Procrun from version 1.0.15 to 1.2.2. After installing and starting the service, the log files were created at the desired location but stayed empty while the service was running. The log level is set to 'debug', as it was in the previous version. I rechecked the configuration documentation but did not find any settings that needed to be changed.
[jira] [Commented] (GEOMETRY-71) Investigate Spherical Barycenter Accuracy
[ https://issues.apache.org/jira/browse/GEOMETRY-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014428#comment-17014428 ]

Baljit Singh commented on GEOMETRY-71:

[~mattjuntunen] FYI, I tried the case of a polygon with holes (using apache-math3). The algorithm still holds!

> Investigate Spherical Barycenter Accuracy
>
> Key: GEOMETRY-71
> URL: https://issues.apache.org/jira/browse/GEOMETRY-71
> Project: Apache Commons Geometry
> Issue Type: Bug
> Reporter: Matt Juntunen
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current code for computing spherical barycenters in
> {{ConvexArea2S.getBarycenter()}} seems to suffer from floating-point accuracy
> issues. The {{ConvexArea2STest.checkBarycenterConsistency()}} method checks
> the consistency of the barycenter computation of a region by splitting the
> region into two sections, computing the area and barycenter of each section,
> and then computing the combined barycenter of the sections by adding the
> barycenter of each scaled by its corresponding area. The combined barycenter
> computed this way is expected to equal the barycenter computed for the region
> as a whole. In practice, however, a large epsilon value is needed in the
> comparison for the tests to pass. We need to investigate why this is the case.
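The consistency check described above can be sketched with plain double[3] vectors. This is a hypothetical illustration, not the `ConvexArea2STest` code. It assumes each section's barycenter is a unit vector. The sections are combined as an area-weighted sum, and the result is normalized back onto the sphere before comparison. The normalization step is an assumption of this sketch. The discrepancy is measured as an angle computed with atan2(|a × b|, a · b) rather than acos(a · b), because acos loses precision for nearly parallel vectors, which is exactly the regime such a check operates in.

```java
/** Hypothetical sketch of the GEOMETRY-71 consistency check using plain double[3] vectors. */
final class BarycenterCheck {

    private BarycenterCheck() {
    }

    /**
     * Combines per-section barycenters (unit vectors) by summing each one scaled
     * by its section's area, then normalizing back onto the unit sphere.
     * The final normalization is an assumption of this sketch.
     */
    static double[] combine(double[][] barycenters, double[] areas) {
        if (barycenters.length != areas.length) {
            throw new IllegalArgumentException("one area per barycenter is required");
        }
        double x = 0, y = 0, z = 0;
        for (int i = 0; i < barycenters.length; i++) {
            x += areas[i] * barycenters[i][0];
            y += areas[i] * barycenters[i][1];
            z += areas[i] * barycenters[i][2];
        }
        double norm = Math.sqrt(x * x + y * y + z * z);
        if (norm == 0.0) {
            throw new IllegalArgumentException("weighted sum is zero; the combined barycenter is undefined");
        }
        return new double[] {x / norm, y / norm, z / norm};
    }

    /**
     * Angle in radians between two unit vectors, computed as atan2(|a x b|, a . b).
     * Unlike acos(a . b), this stays accurate when the vectors are nearly parallel.
     */
    static double angle(double[] a, double[] b) {
        double cx = a[1] * b[2] - a[2] * b[1];
        double cy = a[2] * b[0] - a[0] * b[2];
        double cz = a[0] * b[1] - a[1] * b[0];
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        return Math.atan2(Math.sqrt(cx * cx + cy * cy + cz * cz), dot);
    }
}
```

A test would then assert that the angle between the combined barycenter and the whole-region barycenter is below a tolerance. Expressing the tolerance as an angle makes it easier to judge whether a large epsilon reflects real error in the barycenter computation or only precision lost in the comparison itself.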
[jira] [Commented] (IMAGING-246) Invalid Block Size error prevents handling of block 1084, Macintosh NSPrintInfo
[ https://issues.apache.org/jira/browse/IMAGING-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014377#comment-17014377 ]

Liberty Wollerman commented on IMAGING-246:

Thank you [~kinow], I appreciate your speedy response. Also, thanks for reformatting the description. I noticed that the reformatted code block differed from the code I originally pasted, so I updated it in the description.

> Invalid Block Size error prevents handling of block 1084, Macintosh
> NSPrintInfo
>
> Key: IMAGING-246
> URL: https://issues.apache.org/jira/browse/IMAGING-246
> Project: Commons Imaging
> Issue Type: Bug
> Components: Format: JPEG
> Affects Versions: 1.0-alpha1
> Reporter: Liberty Wollerman
> Assignee: Bruno P. Kinoshita
> Priority: Major
> Attachments: FallHarvestKitKat_07610.jpg
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When processing an image created on a Mac with Adobe Photoshop that contains
> embedded metadata with block 1084, an invalid block size error occurs.
> |0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific
> info for Macintosh. NSPrintInfo. It is recommended that you do not interpret
> or use this data.|
>
> Here is some simple test code that replicates what our application is trying
> to do and reproduces the error:
> {code:java}
> import org.apache.commons.imaging.ImageInfo;
> import org.apache.commons.imaging.ImageReadException;
> import org.apache.commons.imaging.Imaging;
> import org.apache.commons.io.FileUtils;
> import java.io.File;
> import java.io.IOException;
> import java.util.Base64;
> public class Main {
>     public static void main(String[] args) throws IOException, ImageReadException {
>         String fileName = "FallHarvestKitKat_07610.jpg";
>         ClassLoader classLoader = ClassLoader.getSystemClassLoader();
>         File file = new File(classLoader.getResource(fileName).getFile());
>         byte[] fileContent = FileUtils.readFileToByteArray(file);
>         String encodedString = Base64.getEncoder().encodeToString(fileContent);
>         byte[] decodedValue = Base64.getDecoder().decode(encodedString);
>         ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
>     }
> }{code}
>
> Here is the resulting error:
> {noformat}
> Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
>     at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
>     at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
>     at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
>     at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
>     at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
>     at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
>     at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
>     at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
>     at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:635)
>     at Main.getMetaData(Main.java:22)
>     at Main.main(Main.java:17){noformat}
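Walking the blocks directly shows which resource blocks a Photoshop segment declares and how large each one claims to be. The sketch below assumes the Photoshop image resource layout:

* an `8BIM` signature,
* a 2-byte resource ID,
* a Pascal-string name padded to an even length,
* a 4-byte big-endian data size,
* data padded to an even length.

`ResourceBlockWalker` is hypothetical and is not Commons Imaging's `IptcParser`. It rejects a block only when the declared size exceeds the bytes actually present. That is a different check from the one that produced "Invalid Block Size : 89562 > 65504" above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walker over Photoshop image resource blocks; not Commons Imaging's IptcParser. */
final class ResourceBlockWalker {

    private ResourceBlockWalker() {
    }

    /** Returns "id:size" for every block, for example "1084:89562". */
    static List<String> walk(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        List<String> blocks = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int start = buf.position();
            byte[] signature = new byte[4];
            buf.get(signature);
            if (!"8BIM".equals(new String(signature, StandardCharsets.US_ASCII))) {
                throw new IllegalArgumentException("No 8BIM signature at offset " + start);
            }
            int id = buf.getShort() & 0xFFFF;
            int nameLength = buf.get() & 0xFF;
            // The name field (length byte + name) is padded to an even number of bytes.
            skip(buf, nameLength + ((1 + nameLength) % 2));
            long size = buf.getInt() & 0xFFFFFFFFL;
            if (size > buf.remaining()) {
                throw new IllegalArgumentException("Block " + id + " declares " + size
                        + " bytes but only " + buf.remaining() + " remain");
            }
            blocks.add(id + ":" + size);
            // Skip the data plus its pad byte, tolerating a missing pad after the last block.
            skip(buf, (int) Math.min(size + (size % 2), buf.remaining()));
        }
        return blocks;
    }

    private static void skip(ByteBuffer buf, int count) {
        if (count > buf.remaining()) {
            throw new IllegalArgumentException("Truncated block at offset " + buf.position());
        }
        buf.position(buf.position() + count);
    }
}
```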
[jira] [Updated] (IMAGING-246) Invalid Block Size error prevents handling of block 1084, Macintosh NSPrintInfo
[ https://issues.apache.org/jira/browse/IMAGING-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liberty Wollerman updated IMAGING-246:
Description:
When processing an image created on a Mac with Adobe Photoshop that contains embedded metadata with block 1084, an invalid block size error occurs.

|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use this data.|

Here is some simple test code that replicates what our application is trying to do and reproduces the error:
{code:java}
import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.ImageReadException;
import org.apache.commons.imaging.Imaging;
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.Base64;
public class Main {
    public static void main(String[] args) throws IOException, ImageReadException {
        String fileName = "FallHarvestKitKat_07610.jpg";
        ClassLoader classLoader = ClassLoader.getSystemClassLoader();
        File file = new File(classLoader.getResource(fileName).getFile());
        byte[] fileContent = FileUtils.readFileToByteArray(file);
        String encodedString = Base64.getEncoder().encodeToString(fileContent);
        byte[] decodedValue = Base64.getDecoder().decode(encodedString);
        ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
    }
}{code}

Here is the resulting error:
{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
    at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:635)
    at Main.getMetaData(Main.java:22)
    at Main.main(Main.java:17){noformat}

was:
When processing an image created on a Mac with Adobe Photoshop that contains embedded metadata with block 1084, an invalid block size error occurs.

|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use this data.|

Here is some simple test code that replicates what our application is trying to do and reproduces the error:
{code:java}
import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.ImageReadException;
import org.apache.commons.imaging.Imaging;
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.Base64;
public class Main {
    public static void main(String[] args) throws IOException, ImageReadException {
        String fileName = "FallHarvestKitKat_07610.jpg";
        ClassLoader classLoader = ClassLoader.getSystemClassLoader();
        File file = new File(classLoader.getResource(fileName).getFile());
        byte[] fileContent = FileUtils.readFileToByteArray(file);
        String encodedString = Base64.getEncoder().encodeToString(fileContent);
        byte[] decodedValue = Base64.getDecoder().decode(encodedString);
        ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
    }
}{code}

Here is the resulting error:
{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
    at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
    at
[jira] [Updated] (IMAGING-246) Invalid Block Size error prevents handling of block 1084, Macintosh NSPrintInfo
[ https://issues.apache.org/jira/browse/IMAGING-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liberty Wollerman updated IMAGING-246:
--
Description: 
When processing an image created on a Mac with Adobe Photoshop that contains embedded metadata with block 1084, an invalid block size error occurs.

|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use this data.|

Here is some simple test code that replicates what our application is trying to do and reproduces the error:

{code:java}
import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.ImageReadException;
import org.apache.commons.imaging.Imaging;
import org.apache.commons.io.FileUtils;

import java.io.File;
import java.io.IOException;
import java.util.Base64;

public class Main {
    public static void main(String[] args) throws IOException, ImageReadException {
        String fileName = "FallHarvestKitKat_07610.jpg";
        ClassLoader classLoader = ClassLoader.getSystemClassLoader();
        File file = new File(classLoader.getResource(fileName).getFile());
        byte[] fileContent = FileUtils.readFileToByteArray(file);
        String encodedString = Base64.getEncoder().encodeToString(fileContent);
        byte[] decodedValue = Base64.getDecoder().decode(encodedString);
        ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
    }
}
{code}

Here is the resulting error:

{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
	at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
	at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
	at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
	at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
	at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
	at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
	at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
	at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
	at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:635)
	at Main.getMetaData(Main.java:22)
	at Main.main(Main.java:17)
{noformat}

 was:
When processing an image created on a Mac with Adobe Photoshop that contains embedded metadata with block 1084, an invalid block size error occurs.

|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use this data.|

Here is some simple test code that replicates what our application is trying to do and reproduces the error:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String args[]) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
{code}

Here is the resulting error:

{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
	at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
	at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
	at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
	at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
	at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
	at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
	at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
	at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
	at
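For context on where the failing check sits: Photoshop metadata stored in a JPEG APP13 segment is a sequence of image resource blocks, each laid out as the ASCII signature `8BIM`, a 2-byte big-endian resource ID (here 0x043C = 1084), a Pascal-string name padded so the name field has an even length, a 4-byte big-endian data size, and the data itself padded to an even length. The sketch below walks such a sequence and, where a strict parser would throw because a declared size is larger than the bytes available (the trace above reports 89562 > 65504), records the block ID and stops instead. The class, its methods (`scan`, `block`, `concat`) and the stop-on-oversize policy are all hypothetical illustrations under these assumptions; this is not the Commons Imaging `IptcParser` code or its fix.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lenient scanner for Photoshop "8BIM" image resource blocks; not Commons Imaging code. */
public class IrbScanner {

    /** IDs of blocks that fit in the buffer, plus the ID of the first block that did not (-1 if none). */
    public static final class ScanResult {
        public final List<Integer> completeIds = new ArrayList<>();
        public int truncatedId = -1;
    }

    public static ScanResult scan(byte[] data) {
        ScanResult result = new ScanResult();
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        // Need at least signature (4) + resource ID (2) + name length byte (1).
        while (buf.remaining() >= 4 + 2 + 1) {
            byte[] sig = new byte[4];
            buf.get(sig);
            if (sig[0] != '8' || sig[1] != 'B' || sig[2] != 'I' || sig[3] != 'M') {
                break; // not a resource block: stop instead of guessing
            }
            int id = buf.getShort() & 0xFFFF;
            int nameLen = buf.get() & 0xFF;
            // The name field (length byte + characters) is padded to an even total length.
            int nameSkip = nameLen + ((nameLen + 1) & 1);
            if (buf.remaining() < nameSkip + 4) {
                result.truncatedId = id;
                break;
            }
            buf.position(buf.position() + nameSkip);
            long size = buf.getInt() & 0xFFFFFFFFL; // unsigned 32-bit data size
            if (size > buf.remaining()) {
                // A strict parser throws "Invalid Block Size" here; this sketch records the ID and stops.
                result.truncatedId = id;
                break;
            }
            result.completeIds.add(id);
            long padded = size + (size & 1); // data is padded to an even length
            buf.position(buf.position() + (int) Math.min(padded, buf.remaining()));
        }
        return result;
    }

    /** Test helper: one block with an empty name, a declared size, and actualBytes zero bytes of data. */
    static byte[] block(int id, int declaredSize, int actualBytes) {
        ByteBuffer b = ByteBuffer.allocate(4 + 2 + 2 + 4 + actualBytes);
        b.put(new byte[] {'8', 'B', 'I', 'M'});
        b.putShort((short) id);
        b.put((byte) 0).put((byte) 0); // empty Pascal-string name padded to two bytes
        b.putInt(declaredSize);
        b.put(new byte[actualBytes]);
        return b.array();
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] ok = block(0x0404, 4, 4);        // IPTC-NAA record (1028) with 4 data bytes
        byte[] big = block(0x043C, 89562, 16);  // block 1084 declaring far more bytes than exist
        ScanResult r = scan(concat(ok, big));
        System.out.println(r.completeIds + " truncated=" + r.truncatedId); // prints "[1028] truncated=1084"
    }
}
```

Stopping at the oversized block is only the simplest non-throwing option; skipping it or surfacing it as a warning would be equally plausible designs, and none of them is taken from the library.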
[GitHub] [commons-csv] kensixx commented on issue #43: CSVFormat#validate() does not account for allowDuplicateHeaderNames
kensixx commented on issue #43: CSVFormat#validate() does not account for allowDuplicateHeaderNames
URL: https://github.com/apache/commons-csv/pull/43#issuecomment-573559810

Hi, and good day @garydgregory, any updates on the 1.8 release? =)

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Created] (JEXL-318) Annotation processing may fail in lexical mode
Dmitri Blinov created JEXL-318:
--

 Summary: Annotation processing may fail in lexical mode
 Key: JEXL-318
 URL: https://issues.apache.org/jira/browse/JEXL-318
 Project: Commons JEXL
 Issue Type: Bug
 Affects Versions: 3.1
 Reporter: Dmitri Blinov

I have found that annotation processing may, under certain conditions, lead to an NPE:

{code:java}
public static class OptAnnotationContext extends JexlEvalContext implements JexlContext.AnnotationProcessor {
    @Override
    public Object processAnnotation(String name, Object[] args, Callable statement) throws Exception {
        JexlOptions options = this.getEngineOptions();
        // transient side effect for strict
        if ("scale".equals(name)) {
            int scale = options.getMathScale();
            int newScale = (Integer) args[0];
            options.setMathScale(newScale);
            try {
                return statement.call();
            } finally {
                options.setMathScale(scale);
            }
        }
        return statement.call();
    }
}

@Test
public void testAnnotation() throws Exception {
    JexlFeatures f = new JexlFeatures();
    f.lexical(true);
    JexlEngine jexl = new JexlBuilder().strict(true).features(f).create();
    JexlScript script = jexl.createScript("@scale(13) @test var i = 42");
    JexlContext jc = new OptAnnotationContext();
    Object result = script.execute(jc);
    // expected value first, actual value second
    Assert.assertEquals(42, result);
}
{code}

This is because, under certain conditions, a new instance of Interpreter is created to process the annotation, and this new instance does not inherit the current lexical block. Furthermore, the constructor of InterpreterBase, {{InterpreterBase(InterpreterBase ii, JexlArithmetic jexla)}}, now silently ignores the JexlArithmetic passed to it, which is possibly another bug.

As a suggestion, could we refactor the code to simply make JexlArithmetic non-final in InterpreterBase? Then there would be no need to create a new instance of Interpreter and complicate the code with state-synchronization logic.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
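The two defects described in JEXL-318 (a derived interpreter that does not inherit the current lexical block, and a copy constructor that ignores the arithmetic argument) and the suggested alternative can be illustrated with a small self-contained model. Everything in it (`MiniInterpreter`, `Arithmetic`, `Block`, `deriveBuggy`, `deriveFixed`, `withArithmetic`) is hypothetical and far simpler than JEXL's real `Interpreter`/`InterpreterBase`; it only contrasts a derivation that drops state, one that carries the block and the new arithmetic forward, and an in-place swap of a non-final arithmetic field restored in a `finally`, which is roughly the shape of the reporter's proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, simplified model of an interpreter holding a lexical block and an arithmetic. */
public class MiniInterpreter {

    /** Hypothetical arithmetic carrying only a math scale. */
    static final class Arithmetic {
        final int scale;
        Arithmetic(int scale) { this.scale = scale; }
    }

    /** Hypothetical lexical block: variables declared in the current scope. */
    static final class Block {
        final Map<String, Object> vars = new HashMap<>();
    }

    Arithmetic arithmetic; // non-final, following the suggestion in the issue
    Block block;

    MiniInterpreter(Arithmetic arithmetic) {
        this.arithmetic = arithmetic;
        this.block = new Block();
    }

    /** Buggy derivation: starts with a fresh block and ignores the arithmetic passed in. */
    static MiniInterpreter deriveBuggy(MiniInterpreter ii, Arithmetic jexla) {
        return new MiniInterpreter(ii.arithmetic); // jexla silently ignored, current block lost
    }

    /** Fixed derivation: shares the current block and honours the arithmetic argument. */
    static MiniInterpreter deriveFixed(MiniInterpreter ii, Arithmetic jexla) {
        MiniInterpreter copy = new MiniInterpreter(jexla);
        copy.block = ii.block;
        return copy;
    }

    /** Alternative without a new instance: swap the arithmetic in place, restore it afterwards. */
    <T> T withArithmetic(Arithmetic temp, Supplier<T> body) {
        Arithmetic saved = this.arithmetic;
        this.arithmetic = temp;
        try {
            return body.get();
        } finally {
            this.arithmetic = saved;
        }
    }

    Object lookup(String name) {
        return block.vars.get(name);
    }

    public static void main(String[] args) {
        MiniInterpreter root = new MiniInterpreter(new Arithmetic(0));
        root.block.vars.put("i", 42); // as if 'var i = 42' had been declared in this block

        MiniInterpreter buggy = deriveBuggy(root, new Arithmetic(13));
        MiniInterpreter fixed = deriveFixed(root, new Arithmetic(13));
        System.out.println(buggy.lookup("i") + " " + buggy.arithmetic.scale); // prints "null 0"
        System.out.println(fixed.lookup("i") + " " + fixed.arithmetic.scale); // prints "42 13"

        int inside = root.withArithmetic(new Arithmetic(13), () -> root.arithmetic.scale);
        System.out.println(inside + " " + root.arithmetic.scale); // prints "13 0"
    }
}
```

The in-place swap avoids copying interpreter state at all, which is why it sidesteps the lost-block problem; the trade-off is that the field can no longer be final, so the restore in `finally` is what keeps the change scoped to the annotated statement.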