[jira] [Updated] (IO-650) Improve IOUtils performance by increasing DEFAULT_BUFFER_SIZE

2020-01-13 Thread Brett Lounsbury (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brett Lounsbury updated IO-650:
---
Description: 
IOUtils has a 4096-byte default buffer size that is used by the copy() methods 
(and will be used by the contentEquals() methods once IO-649 is merged). This 
value should be increased to 8192 for two reasons:
 # It gives a large performance improvement in my micro-benchmark. I tested 
both copy() and contentEquals() with 4K and 8K buffers on a Late 2019 MacBook 
Pro (2.8 GHz i7) with a 128 MB file loaded into the OS buffer cache. The test 
harness used is included below. Beyond 8K, performance still improves, but with 
diminishing returns, and larger buffers could lead to excessive memory 
allocation.
 # It matches the default buffer size of the java.io.Buffered* classes, which 
keeps buffer sizing consistent regardless of whether buffering is done 
internally by the method or externally via a Buffered* wrapper. These classes 
are already used internally in IOUtils, so the buffer size is not unreasonable.
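To illustrate why the buffer size matters for copy(), here is a minimal JDK-only sketch of a buffered copy loop. It assumes, as IOUtils does, that each iteration issues one bulk read() into a fixed-size buffer and writes out whatever was read. The class name `BufferedCopySketch`, the discarding stream, and the read counter are hypothetical additions for illustration, not Commons IO API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopySketch { // hypothetical name, not Commons IO API

    /** Discards everything written, so only the read side does real work. */
    static final class DiscardingOutputStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] b, int off, int len) { }
    }

    /**
     * Copies in to out through a buffer of the given size and returns how many
     * read() calls returned data (the final call that returns -1 is not counted).
     */
    static long copyCountingReads(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long reads = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // one bulk read per iteration
            out.write(buffer, 0, n);
            reads++;
        }
        return reads;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros standing in for a file
        for (int size : new int[] {4096, 8192}) {
            long reads = copyCountingReads(
                    new ByteArrayInputStream(data), new DiscardingOutputStream(), size);
            System.out.println(size + "-byte buffer: " + reads + " reads");
        }
    }
}
```

For the 1 MiB in-memory stream in `main`, the 4096-byte buffer needs 256 bulk reads and the 8192-byte buffer needs 128. Halving the call count is only one plausible contributor to the speedup; the measurements in this issue come from a real file in the OS cache.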

 

For copy():
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Time reduction (1 - 8K/4K)*|
|*AVG*|44.2853417|64.2600679|0.31084197|
|*P50*|42.692406|62.371984|0.31551951|
|*P90*|49.5538826|68.4303876|0.27584975|
|*P99*|62.8831473|89.759114|0.29942326|
|*P100*|102.563615|177.143364|0.42101351|

 

For contentEquals() with IO-649:
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Time reduction (1 - 8K/4K)*|
|*AVG*|81.5009567|128.497828|0.36574059|
|*P50*|78.517749|124.191476|0.36776861|
|*P90*|89.9172708|136.779763|0.34261276|
|*P99*|125.814333|183.881989|0.31578762|
|*P100*|308.936585|559.611217|0.44794426|
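The last column in both tables is the fraction of the 4K-buffer time saved by the 8K buffer, 1 - (8K time / 4K time); for the copy() AVG row that is 1 - 44.2853417 / 64.2600679 ≈ 0.3108. The sketch below (class name hypothetical) recomputes that column and also converts it into a speedup factor, 4K time / 8K time, which is the form used for the IO-649 results.

```java
public class BufferImprovement { // hypothetical helper for reading the tables

    /** Fraction of the 4K-buffer time saved by the 8K buffer: 1 - t8K / t4K. */
    static double timeReduction(double t8kMillis, double t4kMillis) {
        return 1.0 - t8kMillis / t4kMillis;
    }

    /** Equivalent speedup factor: t4K / t8K. */
    static double speedup(double t8kMillis, double t4kMillis) {
        return t4kMillis / t8kMillis;
    }

    public static void main(String[] args) {
        // AVG rows of the copy() and contentEquals() tables
        System.out.printf("copy():          reduction %.4f, speedup %.2fx%n",
                timeReduction(44.2853417, 64.2600679), speedup(44.2853417, 64.2600679));
        System.out.printf("contentEquals(): reduction %.4f, speedup %.2fx%n",
                timeReduction(81.5009567, 128.497828), speedup(81.5009567, 128.497828));
    }
}
```

Read this way, a time reduction of about 0.31 corresponds to roughly a 1.45x speedup for copy(), and about 0.37 to roughly a 1.58x speedup for contentEquals().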

 

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.apache.commons.io.output.NullOutputStream;

public class CopyBenchmark { // class name is illustrative
    public static void main(String[] args) throws Exception {
        NullOutputStream nos = NullOutputStream.NULL_OUTPUT_STREAM;
        File file = new File("/tmp/random_data");
        for (int i = 0; i < 1000; i++) {
            long defaultCopyTime;
            try (InputStream fis = new FileInputStream(file)) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos); // default buffer size (4096 bytes)
                defaultCopyTime = System.nanoTime() - start;
            }

            long bufferSizeSpecifiedCopyTime;
            try (InputStream fis = new FileInputStream(file)) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos, 8192); // explicit 8K buffer
                bufferSizeSpecifiedCopyTime = System.nanoTime() - start;
            }

            // One line per iteration: 8K time <TAB> default time, in nanoseconds
            System.out.println(bufferSizeSpecifiedCopyTime + "\t" + defaultCopyTime);
        }
    }
}
```

 

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;

public class ContentEqualsBenchmark { // class name is illustrative
    public static void main(String[] args) throws Exception {
        File file = new File("/tmp/random_data");
        for (int i = 0; i < 1000; i++) {
            long defaultContentEqualsTime;
            try (InputStream fis = new FileInputStream(file);
                 InputStream fis2 = new FileInputStream(file)) {
                long start = System.nanoTime();
                IOUtils.contentEquals(fis, fis2); // default buffer size
                defaultContentEqualsTime = System.nanoTime() - start;
            }

            long bufferSizeSpecifiedContentEqualsTime;
            try (InputStream fis = new FileInputStream(file);
                 InputStream fis2 = new FileInputStream(file)) {
                long start = System.nanoTime();
                // three-argument overload proposed in the IO-649 pull request
                IOUtils.contentEquals(fis, fis2, 8192);
                bufferSizeSpecifiedContentEqualsTime = System.nanoTime() - start;
            }

            // One line per iteration: 8K time <TAB> default time, in nanoseconds
            System.out.println(bufferSizeSpecifiedContentEqualsTime + "\t"
                    + defaultContentEqualsTime);
        }
    }
}
```
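Both harnesses print one line per iteration with the 8K time and the default-buffer time in nanoseconds, while the tables report milliseconds as AVG and P50 through P100. The report does not say how the percentiles were computed; the sketch below assumes the nearest-rank method and uses made-up sample values, so treat it as one possible way to summarize the output, not the author's script. `LatencySummary` is a hypothetical name.

```java
import java.util.Arrays;

public class LatencySummary { // hypothetical helper, not part of the original harness

    /** Nearest-rank percentile: the value at rank ceil(p * n / 100) of the sorted samples. */
    static long percentile(long[] sorted, int p) {
        int rank = (int) ((p * (long) sorted.length + 99) / 100); // integer ceiling
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        // Made-up nanosecond timings standing in for one column of harness output
        long[] nanos = {44_100_000L, 42_700_000L, 49_500_000L, 62_900_000L, 102_600_000L};
        long[] sorted = nanos.clone();
        Arrays.sort(sorted);
        System.out.printf("AVG  %.3f ms%n", mean(sorted) / 1e6);
        for (int p : new int[] {50, 90, 99, 100}) {
            System.out.printf("P%-3d %.3f ms%n", p, percentile(sorted, p) / 1e6);
        }
    }
}
```

Integer arithmetic is used for the rank so that values such as P99 of 1000 samples land exactly on rank 990 without floating-point rounding surprises.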

  was:
IOUtils has a 4096-byte default buffer size that is used by the copy() methods 
(and will be used by the contentEquals() methods once IO-649 is merged). This 
value should be increased to 8192 for two reasons:
 # It gives a large performance improvement in my micro-benchmark. I tested 
both copy() and contentEquals() with 4K and 8K buffers on a Late 2019 MacBook 
Pro (2.8 GHz i7) with a 128 MB file loaded into the OS buffer cache. The test 
harness used is included below. Beyond 8K, performance still improves, but with 
diminishing returns, and larger buffers could lead to excessive memory 
allocation.
 # It mirrors the default buffer size of 

[jira] [Created] (IO-650) Improve IOUtils performance by increasing DEFAULT_BUFFER_SIZE

2020-01-13 Thread Brett Lounsbury (Jira)
Brett Lounsbury created IO-650:
--

 Summary: Improve IOUtils performance by increasing 
DEFAULT_BUFFER_SIZE
 Key: IO-650
 URL: https://issues.apache.org/jira/browse/IO-650
 Project: Commons IO
  Issue Type: Improvement
Affects Versions: 1.0
Reporter: Brett Lounsbury
 Fix For: 2.6


IOUtils has a 4096-byte default buffer size that is used by the copy() methods 
(and will be used by the contentEquals() methods once IO-649 is merged). This 
value should be increased to 8192 for two reasons:
 # It gives a large performance improvement in my micro-benchmark. I tested 
both copy() and contentEquals() with 4K and 8K buffers on a Late 2019 MacBook 
Pro (2.8 GHz i7) with a 128 MB file loaded into the OS buffer cache. The test 
harness used is included below. Beyond 8K, performance still improves, but with 
diminishing returns, and larger buffers could lead to excessive memory 
allocation.
 # It matches the default buffer size of the java.io.Buffered* classes. These 
classes are already used internally in IOUtils, so the buffer size is not 
unreasonable.

 

For copy():
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Time reduction (1 - 8K/4K)*|
|*AVG*|44.2853417|64.2600679|0.31084197|
|*P50*|42.692406|62.371984|0.31551951|
|*P90*|49.5538826|68.4303876|0.27584975|
|*P99*|62.8831473|89.759114|0.29942326|
|*P100*|102.563615|177.143364|0.42101351|

 

For contentEquals() with IO-649:
|*Metric*|*8K buffer (ms)*|*4K buffer (ms)*|*Time reduction (1 - 8K/4K)*|
|*AVG*|81.5009567|128.497828|0.36574059|
|*P50*|78.517749|124.191476|0.36776861|
|*P90*|89.9172708|136.779763|0.34261276|
|*P99*|125.814333|183.881989|0.31578762|
|*P100*|308.936585|559.611217|0.44794426|

 

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.apache.commons.io.output.NullOutputStream;

public class CopyBenchmark { // class name is illustrative
    public static void main(String[] args) throws Exception {
        NullOutputStream nos = NullOutputStream.NULL_OUTPUT_STREAM;
        File file = new File("/tmp/random_data");
        for (int i = 0; i < 1000; i++) {
            long defaultCopyTime;
            try (InputStream fis = new FileInputStream(file)) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos); // default buffer size (4096 bytes)
                defaultCopyTime = System.nanoTime() - start;
            }

            long bufferSizeSpecifiedCopyTime;
            try (InputStream fis = new FileInputStream(file)) {
                long start = System.nanoTime();
                IOUtils.copy(fis, nos, 8192); // explicit 8K buffer
                bufferSizeSpecifiedCopyTime = System.nanoTime() - start;
            }

            // One line per iteration: 8K time <TAB> default time, in nanoseconds
            System.out.println(bufferSizeSpecifiedCopyTime + "\t" + defaultCopyTime);
        }
    }
}
```

 

```
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;

public class ContentEqualsBenchmark { // class name is illustrative
    public static void main(String[] args) throws Exception {
        File file = new File("/tmp/random_data");
        for (int i = 0; i < 1000; i++) {
            long defaultContentEqualsTime;
            try (InputStream fis = new FileInputStream(file);
                 InputStream fis2 = new FileInputStream(file)) {
                long start = System.nanoTime();
                IOUtils.contentEquals(fis, fis2); // default buffer size
                defaultContentEqualsTime = System.nanoTime() - start;
            }

            long bufferSizeSpecifiedContentEqualsTime;
            try (InputStream fis = new FileInputStream(file);
                 InputStream fis2 = new FileInputStream(file)) {
                long start = System.nanoTime();
                // three-argument overload proposed in the IO-649 pull request
                IOUtils.contentEquals(fis, fis2, 8192);
                bufferSizeSpecifiedContentEqualsTime = System.nanoTime() - start;
            }

            // One line per iteration: 8K time <TAB> default time, in nanoseconds
            System.out.println(bufferSizeSpecifiedContentEqualsTime + "\t"
                    + defaultContentEqualsTime);
        }
    }
}
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371082
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 20:31
Start Date: 13/Jan/20 20:31
Worklog Time Spent: 10m 
  Work Description: coveralls commented on issue #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/28078852/badge)](https://coveralls.io/builds/28078852)
   
   Coverage increased (+0.08%) to 89.552% when pulling 
**836245f1a76094ee9d57d6549953c0606139f532 on brettlounsbury:master** into 
**11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 371082)
Time Spent: 2h 10m  (was: 2h)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a buffered 
> version (if it is not already buffered), which avoids a lot of I/O overhead, 
> but it then reads each byte/character one at a time. This leads to 
> significantly more method calls, plus a byte -> int conversion for every 
> byte, since read() returns an int (0 to 255, or -1 at end of stream) instead 
> of a byte.
>  
> I have a change that modifies the contentEquals() methods to read content 
> into byte/char arrays and compare those arrays in batches with Arrays.equals, 
> instead of wrapping the inputs in a BufferedInputStream or BufferedReader and 
> calling the single byte/char read() methods. This reduces the number of 
> method invocations by roughly a factor of the buffer size and avoids 
> converting every byte read to an int.
>  
> The following results show the performance increase over 1000 iterations of 
> comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). 
> This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
> forced System.gc() between iterations to keep GC out of the measured 
> latency:
> Average: 7236 to 858ms (8.43x speedup)
> P50: 7224 to 856ms (8.44x speedup)
> P90: 7249 to 860ms (8.43x speedup)
> P99: 7410 to 913ms (8.12x speedup)
> P100: 8330 to 1278ms (6.52x speedup)
>  
> The following results show the performance increase over 1000 iterations of 
> comparing two 1 GB Readers of character data (held in memory to avoid I/O). 
> This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
> forced System.gc() between iterations to keep GC out of the measured 
> latency:
> Average: 11281 to 1737ms (6.50x speedup)
> P50: 11262 to 1735ms (6.49x speedup)
> P90: 11292 to 1741ms (6.49x speedup)
> P99: 11707 to 1774ms (6.60x speedup)
> P100: 12176 to 1884ms (6.46x speedup)
>  
>  
>  
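The batch-comparison approach described in the quoted issue can be sketched with the JDK alone. This is not the code from pull request #101: the class name is hypothetical, the bufferSize validation is an assumption, and the ranged `Arrays.equals` overload it uses requires Java 9 or later. Each buffer is filled completely before comparing, because a single `read()` may legally return fewer bytes than requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BatchContentEquals { // hypothetical name, not the PR #101 code

    /** Reads until buf is full or the stream ends; returns the number of bytes read. */
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream reached
            }
            total += n;
        }
        return total;
    }

    static boolean contentEquals(InputStream input1, InputStream input2, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            // assumed bounds check; a zero-length buffer would never make progress
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        if (input1 == null ^ input2 == null) {
            return false;
        }
        byte[] buf1 = new byte[bufferSize];
        byte[] buf2 = new byte[bufferSize];
        while (true) {
            int n1 = fill(input1, buf1);
            int n2 = fill(input2, buf2);
            // Compare a whole chunk at once instead of one read() per byte
            if (n1 != n2 || !Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
            if (n1 < bufferSize) {
                return true; // both streams ended after identical content
            }
        }
    }
}
```

The same structure carries over to Readers by using `char[]` buffers and `Reader.read(char[], int, int)`.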





[GitHub] [commons-io] coveralls edited a comment on issue #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
coveralls edited a comment on issue #101: IO-649 - Improve the performance of 
the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/28078852/badge)](https://coveralls.io/builds/28078852)
   
   Coverage increased (+0.08%) to 89.552% when pulling 
**836245f1a76094ee9d57d6549953c0606139f532 on brettlounsbury:master** into 
**11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**.
   




With regards,
Apache Git Services


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371060
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 20:08
Start Date: 13/Jan/20 20:08
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on issue #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573847055
 
 
   I took a look at the coverage drop. I forgot to add tests that exercise the 
bounds checking of the bufferSize inputs to both contentEquals methods. I've 
pushed a new version with those tests.
 



Issue Time Tracking
---

Worklog Id: (was: 371060)
Time Spent: 2h  (was: 1h 50m)



[GitHub] [commons-io] brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
brettlounsbury commented on issue #101: IO-649 - Improve the performance of the 
contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573847055
 
 
   I took a look at the coverage drop. I forgot to add tests that exercise the 
bounds checking of the bufferSize inputs to both contentEquals methods. I've 
pushed a new version with those tests.






[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371055
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:54
Start Date: 13/Jan/20 19:54
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I looked at the beginning of each method again. Both contentEquals methods 
start with the statements below, so null inputs can never cause a 
NullPointerException. If both streams are null, they are the same reference and 
the method returns true. If exactly one stream is null, the XOR check makes it 
return false. The comparison logic only runs when both streams are non-null and 
are not the same object.
   
   ```
   if (input1 == input2) {
       return true;
   }
   if (input1 == null ^ input2 == null) {
       return false;
   }
   ```
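The two guard clauses in this comment can be isolated to check the reasoning. In the hypothetical sketch below, the `Outcome` enum only makes the three cases explicit: null inputs are always decided by the guards, so the comparison code is reached only with two distinct non-null inputs. Note that `==` binds more tightly than `^` in Java, so the second condition reads as `(input1 == null) ^ (input2 == null)`.

```java
public class NullGuardSketch { // hypothetical name, for illustration only

    /** What the two guard clauses decide, before any stream is read. */
    enum Outcome { EQUAL, NOT_EQUAL, COMPARE_CONTENT }

    static Outcome guard(Object input1, Object input2) {
        if (input1 == input2) {
            return Outcome.EQUAL;          // same reference, including both null
        }
        if (input1 == null ^ input2 == null) {
            return Outcome.NOT_EQUAL;      // exactly one side is null
        }
        return Outcome.COMPARE_CONTENT;    // two distinct, non-null inputs
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        System.out.println(guard(null, null)); // EQUAL
        System.out.println(guard(a, null));    // NOT_EQUAL
        System.out.println(guard(null, b));    // NOT_EQUAL
        System.out.println(guard(a, a));       // EQUAL
        System.out.println(guard(a, b));       // COMPARE_CONTENT
    }
}
```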
 



Issue Time Tracking
---

Worklog Id: (was: 371055)
Time Spent: 1h 40m  (was: 1.5h)


[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I looked at the beginning of each method again. Both contentEquals methods 
start with the statements below, so null inputs can never cause a 
NullPointerException. If both streams are null, they are the same reference and 
the method returns true. If exactly one stream is null, the XOR check makes it 
return false. The comparison logic only runs when both streams are non-null and 
are not the same object. I pushed a new version that removes 
`@throws NullPointerException` from the javadoc.
   
   ```
   if (input1 == input2) {
       return true;
   }
   if (input1 == null ^ input2 == null) {
       return false;
   }
   ```






[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371056
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:54
Start Date: 13/Jan/20 19:54
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I looked at the beginning of each method again. Both contentEquals methods 
start with the statements below, so null inputs can never cause a 
NullPointerException. If both streams are null, they are the same reference and 
the method returns true. If exactly one stream is null, the XOR check makes it 
return false. The comparison logic only runs when both streams are non-null and 
are not the same object. I pushed a new version that removes 
`@throws NullPointerException` from the javadoc.
   
   ```
   if (input1 == input2) {
       return true;
   }
   if (input1 == null ^ input2 == null) {
       return false;
   }
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 371056)
Time Spent: 1h 50m  (was: 1h 40m)


[GitHub] [commons-io] brettlounsbury commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
brettlounsbury commented on a change in pull request #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365997580
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I looked at the beginning of each method again. Both contentEquals methods 
start with the statements below, so null inputs can never cause a 
NullPointerException. If both streams are null, they are the same reference and 
the method returns true. If exactly one stream is null, the XOR check makes it 
return false. The comparison logic only runs when both streams are non-null and 
are not the same object.
   
   ```
   if (input1 == input2) {
       return true;
   }
   if (input1 == null ^ input2 == null) {
       return false;
   }
   ```






[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371047
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:46
Start Date: 13/Jan/20 19:46
Worklog Time Spent: 10m 
  Work Description: michael-o commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365994005
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I would rather see an explicit NPE up front.
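For comparison, a hypothetical sketch of what an explicit NPE up front could look like, using java.util.Objects.requireNonNull. The names ExplicitNpeDemo and contentEqualsStrict are made up for illustration, and the byte-at-a-time loop is only there to keep the demo short. Note the trade-off: this variant would change the documented contract, because two null inputs would throw instead of returning true.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

public class ExplicitNpeDemo {

    /**
     * Hypothetical "strict" variant: rejects null before anything else,
     * unlike the identity/XOR guard clauses used in the PR.
     */
    static boolean contentEqualsStrict(final InputStream input1, final InputStream input2)
            throws IOException {
        Objects.requireNonNull(input1, "input1");
        Objects.requireNonNull(input2, "input2");
        if (input1 == input2) {
            return true;
        }
        // Simple byte-at-a-time comparison; stops at the first mismatch
        // or when both streams report end of stream (-1) together.
        int b1;
        do {
            b1 = input1.read();
            if (b1 != input2.read()) {
                return false;
            }
        } while (b1 != -1);
        return true;
    }

    public static void main(String[] args) throws IOException {
        try {
            contentEqualsStrict(null, null);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage()); // NPE: input1
        }
        InputStream a = new ByteArrayInputStream(new byte[] {1, 2, 3});
        InputStream b = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(contentEqualsStrict(a, b)); // true
    }
}
```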
 



Issue Time Tracking
---

Worklog Id: (was: 371047)
Time Spent: 1.5h  (was: 1h 20m)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in its Buffered 
> counterpart (if it is not already buffered), which avoids many I/O penalties, 
> but it then reads each byte/character one at a time.  This leads to 
> significantly more method calls and a lot of byte -> int conversion, since 
> InputStream.read() returns an int between 0 and 255 (or -1 at end of stream) 
> instead of a byte.
>  
> I have a change that modifies the contentEquals() methods to internally 
> buffer content into a byte/char array and to then do batch comparisons of 
> those arrays using Arrays.equals instead of using a BufferedInputStream or 
> BufferedReader and making use of the single byte/char read() methods.  This 
> reduces the number of method invocations by a factor equal to the buffer size 
> and avoids casting every byte read to an int.
>  
> The following figures show the improvement over 1000 iterations of comparing 
> two 1 GB InputStreams of binary data (held in memory to avoid I/O).  This test 
> was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced 
> System.gc() between iterations so that GC was not a source of latency:
> Average: 7236 ms to 858 ms (8.43x speedup)
> P50: 7224 ms to 856 ms (8.44x speedup)
> P90: 7249 ms to 860 ms (8.43x speedup)
> P99: 7410 ms to 913 ms (8.12x speedup)
> P100: 8330 ms to 1278 ms (6.52x speedup)
>  
> The following figures show the improvement over 1000 iterations of comparing 
> two 1 GB Readers of character data (held in memory to avoid I/O).  This test 
> was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced 
> System.gc() between iterations so that GC was not a source of latency:
> Average: 11281 ms to 1737 ms (6.50x speedup)
> P50: 11262 ms to 1735 ms (6.49x speedup)
> P90: 11292 ms to 1741 ms (6.49x speedup)
> P99: 11707 ms to 1774 ms (6.60x speedup)
> P100: 12176 ms to 1884 ms (6.46x speedup)
>  
>  
>  
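The batch comparison described in the issue can be sketched as follows. This is an illustration under stated assumptions, not the code merged for IO-649: the class BatchContentEquals and the helper fill are hypothetical, and the ranged Arrays.equals(byte[], int, int, byte[], int, int) overload requires Java 9 or later (the benchmark above ran on Java 8).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BatchContentEquals {

    /** Reads until buf is full or the stream ends; returns the byte count (0 at EOF). */
    static int fill(final InputStream in, final byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            final int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        if (input1 == null ^ input2 == null) {
            return false;
        }
        final byte[] buf1 = new byte[bufferSize];
        final byte[] buf2 = new byte[bufferSize];
        while (true) {
            final int n1 = fill(input1, buf1);
            final int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one stream ended before the other
            }
            if (n1 == 0) {
                return true;  // both streams ended at the same position
            }
            // One bulk comparison per chunk instead of one read() call per byte.
            if (!Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        final byte[] a = "hello, world".getBytes(StandardCharsets.UTF_8);
        final byte[] b = "hello, World".getBytes(StandardCharsets.UTF_8);
        // A 4-byte buffer forces several chunks for this 12-byte input.
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(a.clone()), 4)); // true
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(b), 4));         // false
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(Arrays.copyOf(a, 5)), 4)); // false
    }
}
```

Each buffer is filled completely before comparing because InputStream.read(byte[], int, int) may return fewer bytes than requested; comparing raw read results chunk by chunk could otherwise misalign the two streams.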



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371046
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:45
Start Date: 13/Jan/20 19:45
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365993507
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
+ * @throws IOException  if an I/O error occurs
+ */
+@SuppressWarnings("resource")
+public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize)
+throws IOException {
+if (bufferSize <= 0) {
+throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
 
 Review comment:
   Fixed and updated.  Done for both methods that throw an 
IllegalArgumentException.
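The overload pattern in the diff above (a two-argument method delegating to a three-argument one that validates bufferSize) carries over to the Reader variant. A minimal sketch, assuming Java 9+ for the ranged Arrays.equals(char[], ...) overload; ReaderContentEquals and fill are hypothetical names, and DEFAULT_BUFFER_SIZE = 4096 is an assumed value matching the 4 KB default discussed in IO-650.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class ReaderContentEquals {

    // Assumed default (4 KB), mirroring the value discussed in IO-650.
    static final int DEFAULT_BUFFER_SIZE = 4096;

    static boolean contentEquals(final Reader input1, final Reader input2) throws IOException {
        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
    }

    static boolean contentEquals(final Reader input1, final Reader input2, final int bufferSize)
            throws IOException {
        // Validate the argument before any other work, as in the reviewed change.
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        if (input1 == null ^ input2 == null) {
            return false;
        }
        final char[] buf1 = new char[bufferSize];
        final char[] buf2 = new char[bufferSize];
        while (true) {
            final int n1 = fill(input1, buf1);
            final int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one reader ended before the other
            }
            if (n1 == 0) {
                return true;  // both readers ended together
            }
            if (!Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
        }
    }

    /** Reads until buf is full or the reader ends; returns the char count. */
    static int fill(final Reader in, final char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            final int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contentEquals(new StringReader("abc"), new StringReader("abc")));    // true
        System.out.println(contentEquals(new StringReader("abc"), new StringReader("abd"), 2)); // false
        try {
            contentEquals(new StringReader("a"), new StringReader("a"), 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Buffer size must be positive: 0
        }
    }
}
```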
 



Issue Time Tracking
---

Worklog Id: (was: 371046)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371038
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:41
Start Date: 13/Jan/20 19:41
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365991723
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   This documentation was copied and adapted from the original method's javadoc 
(the one without the bufferSize parameter).  The exception doesn't appear to be 
thrown explicitly, but if either InputStream is null it is thrown when the 
stream is dereferenced.
   
   Check the javadoc for the `public static boolean contentEquals(final InputStream input1, final InputStream input2)` method.
   
   Happy to either (a) explicitly throw a NullPointerException up front, (b) 
remove the comment from both places, or (c) do nothing.
 



Issue Time Tracking
---

Worklog Id: (was: 371038)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371040
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:41
Start Date: 13/Jan/20 19:41
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365991850
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream
 @SuppressWarnings("resource")
 public static boolean contentEquals(final Reader input1, final Reader input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Readers to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ * 
+ *
+ * @param input1 the first reader
+ * @param input2 the second reader
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the readers are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   Same answer as above :-)
 



Issue Time Tracking
---

Worklog Id: (was: 371040)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371023=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371023
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:32
Start Date: 13/Jan/20 19:32
Worklog Time Spent: 10m 
  Work Description: coveralls commented on issue #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573831811
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/28077605/badge)](https://coveralls.io/builds/28077605)
   
   Coverage decreased (-0.002%) to 89.471% when pulling 
**6ef0e1cbf741d745bc925e303863cd3902992b28 on brettlounsbury:master** into 
**11f0abe7a3fb6954b2985ca4ab0697b2fb489e84 on apache:master**.
   
 



Issue Time Tracking
---

Worklog Id: (was: 371023)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371020
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:30
Start Date: 13/Jan/20 19:30
Worklog Time Spent: 10m 
  Work Description: michael-o commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365985566
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream
 @SuppressWarnings("resource")
 public static boolean contentEquals(final Reader input1, final Reader input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Readers to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ * 
+ *
+ * @param input1 the first reader
+ * @param input2 the second reader
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the readers are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   Same here.
 



Issue Time Tracking
---

Worklog Id: (was: 371020)
Time Spent: 40m  (was: 0.5h)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a Buffered 
> version (if it is not already buffered) which avoids a lot of IO penalties, 
> but then it proceeds to read each byte/character one at a time.  This leads 
> to significantly more method calls and also a lot of byte -> int casting 
> since the read() method returns an int between 0 and 255 instead of returning 
> a byte.
>  
> I have a change that modifies the contentEquals() methods to internally 
> buffer content into a byte/char array and to then do batch comparisons of 
> those arrays using Arrays.equals instead of using a BufferedInputStream or 
> BufferedReader and making use of the single byte/char read() methods.  This 
> reduces the number of method invocations by a factor equal to the buffer size 
> and avoids casting every byte read to an int.
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing 2 1GB InputStream of binary data (stored in memory to avoid I/O). 
> This test was performed on an EC2 M4.4XL host using Java 1.8.0.232 and there 
> was a forced System.gc() between each iteration to avoid GC as a source of 
> latency:
> Average: 7236 to 858ms (8.43x speedup)
> P50: 7224 to 856ms (8.44x speedup)
> P90: 7249 to 860ms (8.43x speedup)
> P99: 7410 to 913ms (8.12x speedup)
> P100: 8330 to 1278ms (6.52x speedup)
>  
> The following results show the performance increase over 1000 iterations of 
> comparing two 1 GB Readers of character data (stored in memory to avoid 
> I/O). This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, 
> with a forced System.gc() between iterations to avoid GC as a source of 
> latency:
> Average: 11281 to 1737ms (6.50x speedup)
> P50: 11262 to 1735ms (6.49x speedup)
> P90: 11292 to 1741ms (6.49x speedup)
> P99: 11707 to 1774ms (6.60x speedup)
> P100: 12176 to 1884ms (6.46x speedup)
>  
>  
>  
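For contrast, the byte-at-a-time pattern described above can be sketched as follows. This is an illustration only (the class `ByteAtATimeCompare` is hypothetical), approximating the pre-change behavior rather than reproducing the Commons IO source: every byte costs one `read()` call on each stream, and each result arrives widened to an `int`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class ByteAtATimeCompare {

    private ByteAtATimeCompare() {
    }

    // Hypothetical approximation of the approach described in IO-649: wrap both
    // streams in BufferedInputStream, then compare one read() call at a time.
    public static boolean contentEquals(final InputStream input1, final InputStream input2)
            throws IOException {
        if (input1 == input2) {
            return true;
        }
        final InputStream in1 = input1 instanceof BufferedInputStream ? input1 : new BufferedInputStream(input1);
        final InputStream in2 = input2 instanceof BufferedInputStream ? input2 : new BufferedInputStream(input2);

        // read() returns each byte as an int in 0..255, or -1 at end of stream.
        int ch = in1.read();
        while (ch != -1) {
            if (ch != in2.read()) {
                return false;
            }
            ch = in1.read();
        }
        // Equal only if the second stream is exhausted at the same point.
        return in2.read() == -1;
    }

    public static void main(String[] args) throws IOException {
        final byte[] a = {1, 2, 3};
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(new byte[] {1, 2, 3}))); // true
        System.out.println(contentEquals(new ByteArrayInputStream(a), new ByteArrayInputStream(new byte[] {1, 2})));    // false
    }
}
```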



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371019
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:30
Start Date: 13/Jan/20 19:30
Worklog Time Spent: 10m 
  Work Description: michael-o commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984225
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
+ * @throws IOException  if an I/O error occurs
+ */
+@SuppressWarnings("resource")
+public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize)
+throws IOException {
+if (bufferSize <= 0) {
+throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
 
 Review comment:
   This one is not documented.
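Addressing this review point mostly means adding a matching `@throws IllegalArgumentException` tag next to the existing `@throws` tags. A minimal sketch, assuming the check is factored into a helper; `BufferSizeCheck.checkBufferSize` is a hypothetical name and not part of Commons IO:

```java
public final class BufferSizeCheck {

    private BufferSizeCheck() {
    }

    /**
     * Validates a caller-supplied internal buffer size (hypothetical helper).
     *
     * @param bufferSize the size of the internal buffer to use
     * @return {@code bufferSize}, unchanged, if it is positive
     * @throws IllegalArgumentException if {@code bufferSize} is zero or negative
     */
    static int checkBufferSize(final int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        return bufferSize;
    }

    public static void main(String[] args) {
        System.out.println(checkBufferSize(8192)); // 8192
        try {
            checkBufferSize(0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Buffer size must be positive: 0
        }
    }
}
```

The same `@throws IllegalArgumentException` line would apply to both the `InputStream` and `Reader` overloads, since both perform this check.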
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 371019)
Time Spent: 0.5h  (was: 20m)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371018
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 19:30
Start Date: 13/Jan/20 19:30
Worklog Time Spent: 10m 
  Work Description: michael-o commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984302
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I fail to see where this one is thrown.
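One way to make the documented `NullPointerException` real, rather than relying on an incidental dereference later in the method, is an explicit check after the identity short-circuit. This is a sketch with a hypothetical helper name (`NullInputCheck.sameReference`). With the identity check first, two `null` arguments return `true`, consistent with the "they both don't exist" wording, and only a single `null` throws.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Objects;

public final class NullInputCheck {

    private NullInputCheck() {
    }

    /**
     * Hypothetical prologue for a contentEquals(InputStream, InputStream, int) overload.
     *
     * @return true if both arguments are the same reference (including both null),
     *         false if they are distinct non-null streams that still need comparing
     * @throws NullPointerException if exactly one input is null
     */
    static boolean sameReference(final InputStream input1, final InputStream input2) {
        if (input1 == input2) {
            return true; // covers "or they both don't exist" in the proposed javadoc
        }
        Objects.requireNonNull(input1, "input1");
        Objects.requireNonNull(input2, "input2");
        return false;
    }

    public static void main(String[] args) {
        final InputStream a = new ByteArrayInputStream(new byte[] {1});
        final InputStream b = new ByteArrayInputStream(new byte[] {1});
        System.out.println(sameReference(null, null)); // true
        System.out.println(sameReference(a, b));       // false
        try {
            sameReference(a, null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());        // input2
        }
    }
}
```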
 



Issue Time Tracking
---

Worklog Id: (was: 371018)
Time Spent: 0.5h  (was: 20m)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>


[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
michael-o commented on a change in pull request #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984225
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
+ * @throws IOException  if an I/O error occurs
+ */
+@SuppressWarnings("resource")
+public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize)
+throws IOException {
+if (bufferSize <= 0) {
+throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
 
 Review comment:
   This one is not documented.




With regards,
Apache Git Services


[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
michael-o commented on a change in pull request #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365984302
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -708,31 +706,79 @@ public static void closeQuietly(final Writer output) {
 @SuppressWarnings("resource")
 public static boolean contentEquals(final InputStream input1, final InputStream input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Streams to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ *
+ * @param input1 the first stream
+ * @param input2 the second stream
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the streams are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   I fail to see where this one is thrown.




[GitHub] [commons-io] michael-o commented on a change in pull request #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
michael-o commented on a change in pull request #101: IO-649 - Improve the 
performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365985566
 
 

 ##
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##
 @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream
 @SuppressWarnings("resource")
 public static boolean contentEquals(final Reader input1, final Reader input2)
 throws IOException {
+return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+}
+
+/**
+ * Compares the contents of two Readers to determine if they are equal or not.
+ * 
+ * This method buffers the input internally.
+ * 
+ *
+ * @param input1 the first reader
+ * @param input2 the second reader
+ * @param bufferSize the size of the internal buffer to use.
+ * @return true if the content of the readers are equal or they both don't
+ * exist, false otherwise
+ * @throws NullPointerException if either input is null
 
 Review comment:
   Same here.
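For the `Reader` overload shown in this hunk, the batch comparison can be sketched as below. Assumptions: the class and helper names are hypothetical; each chunk is filled completely before comparing, because a single `read(char[], int, int)` call may return fewer characters than requested; and the range overload of `Arrays.equals` requires Java 9 or later (on Java 8, the version used in the benchmarks, a manual loop over the first `n` elements would replace it).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public final class ChunkedReaderEquals {

    private ChunkedReaderEquals() {
    }

    // Reads until buf is full or the reader is exhausted; returns the number of chars read.
    private static int fill(final Reader reader, final char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            final int n = reader.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    public static boolean contentEquals(final Reader input1, final Reader input2, final int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        final char[] buf1 = new char[bufferSize];
        final char[] buf2 = new char[bufferSize];
        while (true) {
            final int n1 = fill(input1, buf1);
            final int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one reader ended before the other
            }
            if (n1 == 0) {
                return true; // both exhausted at the same position
            }
            // Batch comparison of the filled prefix (range overload requires Java 9+).
            if (!Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contentEquals(new StringReader("hello world"), new StringReader("hello world"), 4)); // true
        System.out.println(contentEquals(new StringReader("hello world"), new StringReader("hello"), 4));       // false
    }
}
```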




[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=370950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-370950
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 17:49
Start Date: 13/Jan/20 17:49
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on issue #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573786289
 
 
   https://issues.apache.org/jira/browse/IO-649
 



Issue Time Tracking
---

Worklog Id: (was: 370950)
Time Spent: 20m  (was: 10m)

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=370948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-370948
 ]

ASF GitHub Bot logged work on IO-649:
-

Author: ASF GitHub Bot
Created on: 13/Jan/20 17:48
Start Date: 13/Jan/20 17:48
Worklog Time Spent: 10m 
  Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101
 
 
   This change modifies the contentEquals() methods to internally buffer 
content into a byte/char array and to then do batch comparisons of those arrays 
using Arrays.equals instead of using a BufferedInputStream or BufferedReader 
and making use of the single byte/char read() methods.
   
   This reduces the number of method invocations by a factor equal to the 
buffer size, avoids casting every byte read to an int, and significantly 
improves performance.
   
   The following results show the performance increase over 1000 iterations of 
comparing two 1 GB InputStreams of binary data (stored in memory to avoid I/O).  
This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations to avoid GC as a source of latency:
   
   Average: 7236 to 858ms (8.43x speedup)
   P50: 7224 to 856ms (8.44x speedup)
   P90: 7249 to 860ms (8.43x speedup)
   P99: 7410 to 913ms (8.12x speedup)
   P100: 8330 to 1278ms (6.52x speedup)
   
   The following results show the performance increase over 1000 iterations of 
comparing two 1 GB Readers of character data (stored in memory to avoid I/O).  
This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations to avoid GC as a source of latency:
   
   Average: 11281 to 1737ms (6.50x speedup)
   P50: 11262 to 1735ms (6.49x speedup)
   P90: 11292 to 1741ms (6.49x speedup)
   P99: 11707 to 1774ms (6.60x speedup)
   P100: 12176 to 1884ms (6.46x speedup)
 



Issue Time Tracking
---

Worklog Id: (was: 370948)
Remaining Estimate: 0h
Time Spent: 10m

> IOUtils contentEquals method performance improvements
> -
>
> Key: IO-649
> URL: https://issues.apache.org/jira/browse/IO-649
> Project: Commons IO
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.0, 1.1
>Reporter: Brett Lounsbury
>Priority: Major
> Fix For: 2.6
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[GitHub] [commons-io] brettlounsbury commented on issue #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
brettlounsbury commented on issue #101: IO-649 - Improve the performance of the 
contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-573786289
 
 
   https://issues.apache.org/jira/browse/IO-649




[GitHub] [commons-io] brettlounsbury opened a new pull request #101: IO-649 - Improve the performance of the contentEquals() methods.

2020-01-13 Thread GitBox
brettlounsbury opened a new pull request #101: IO-649 - Improve the performance 
of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101
 
 
   This change modifies the contentEquals() methods to internally buffer 
content into a byte/char array and to then do batch comparisons of those arrays 
using Arrays.equals instead of using a BufferedInputStream or BufferedReader 
and making use of the single byte/char read() methods.
   
   This reduces the number of method invocations by a factor equal to the 
buffer size, avoids casting every byte read to an int, and significantly 
improves performance.
   
   The following results show the performance increase over 1000 iterations of 
comparing two 1 GB InputStreams of binary data (stored in memory to avoid I/O).  
This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations to avoid GC as a source of latency:
   
   Average: 7236 to 858ms (8.43x speedup)
   P50: 7224 to 856ms (8.44x speedup)
   P90: 7249 to 860ms (8.43x speedup)
   P99: 7410 to 913ms (8.12x speedup)
   P100: 8330 to 1278ms (6.52x speedup)
   
   The following results show the performance increase over 1000 iterations of 
comparing two 1 GB Readers of character data (stored in memory to avoid I/O).  
This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations to avoid GC as a source of latency:
   
   Average: 11281 to 1737ms (6.50x speedup)
   P50: 11262 to 1735ms (6.49x speedup)
   P90: 11292 to 1741ms (6.49x speedup)
   P99: 11707 to 1774ms (6.60x speedup)
   P100: 12176 to 1884ms (6.46x speedup)




[jira] [Created] (IO-649) IOUtils contentEquals method performance improvements

2020-01-13 Thread Brett Lounsbury (Jira)
Brett Lounsbury created IO-649:
--

 Summary: IOUtils contentEquals method performance improvements
 Key: IO-649
 URL: https://issues.apache.org/jira/browse/IO-649
 Project: Commons IO
  Issue Type: Improvement
  Components: Utilities
Affects Versions: 1.1, 1.0
Reporter: Brett Lounsbury
 Fix For: 2.6


 

contentEquals() internally wraps any given InputStream/Reader in a buffered 
version (if it is not already buffered), which avoids a lot of I/O overhead, but 
it then reads each byte/character one at a time.  This leads to significantly 
more method calls and a lot of byte -> int casting, since read() returns an int 
(0 to 255, or -1 at end of stream) instead of a byte.

 

I have a change that modifies the contentEquals() methods to internally buffer 
content into a byte/char array and to then do batch comparisons of those arrays 
using Arrays.equals instead of using a BufferedInputStream or BufferedReader 
and making use of the single byte/char read() methods.  This reduces the number 
of method invocations by a factor equal to the buffer size and avoids casting 
every byte read to an int.
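A minimal sketch of the chunked approach for InputStreams, under stated assumptions: the class name `ChunkedStreamEquals` and its helpers are hypothetical, not the code in the pull request; each buffer is filled completely before comparing, because a single `read(byte[], int, int)` may legally return fewer bytes than requested and the two buffers would otherwise fall out of alignment; and the range overload of `Arrays.equals` needs Java 9 or later. The `OneByteAtATime` test stream exists only to exercise that short-read case.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class ChunkedStreamEquals {

    private ChunkedStreamEquals() {
    }

    // Reads until buf is full or the stream is exhausted; returns the number of bytes read.
    private static int fill(final InputStream in, final byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            final int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    public static boolean contentEquals(final InputStream input1, final InputStream input2, final int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true;
        }
        final byte[] buf1 = new byte[bufferSize];
        final byte[] buf2 = new byte[bufferSize];
        while (true) {
            final int n1 = fill(input1, buf1);
            final int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one stream ended before the other
            }
            if (n1 == 0) {
                return true; // both exhausted at the same position
            }
            // One bulk comparison per chunk instead of one read() call per byte.
            if (!Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
        }
    }

    // Test helper: hands out at most one byte per bulk read, simulating short reads.
    static final class OneByteAtATime extends FilterInputStream {
        OneByteAtATime(final InputStream in) {
            super(in);
        }

        @Override
        public int read(final byte[] b, final int off, final int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    public static void main(String[] args) throws IOException {
        final byte[] data = "commons-io".getBytes(StandardCharsets.UTF_8);
        System.out.println(contentEquals(new ByteArrayInputStream(data),
                new OneByteAtATime(new ByteArrayInputStream(data)), 4)); // true
        System.out.println(contentEquals(new ByteArrayInputStream(data),
                new ByteArrayInputStream(Arrays.copyOf(data, 9)), 4));   // false
    }
}
```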

 

The following results show the performance increase over 1000 iterations of 
comparing two 1 GB InputStreams of binary data (stored in memory to avoid I/O). 
This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations to avoid GC as a source of latency:

Average: 7236 to 858ms (8.43x speedup)
P50: 7224 to 856ms (8.44x speedup)
P90: 7249 to 860ms (8.43x speedup)
P99: 7410 to 913ms (8.12x speedup)
P100: 8330 to 1278ms (6.52x speedup)

 

The following results show the performance increase over 1000 iterations of 
comparing two 1 GB Readers of character data (stored in memory to avoid I/O). 
This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations to avoid GC as a source of latency:

Average: 11281 to 1737ms (6.50x speedup)
P50: 11262 to 1735ms (6.49x speedup)
P90: 11292 to 1741ms (6.49x speedup)
P99: 11707 to 1774ms (6.60x speedup)
P100: 12176 to 1884ms (6.46x speedup)
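The measurement setup described above (1000 iterations, a forced `System.gc()` between runs, latencies summarized as an average and percentiles, speedup as old time divided by new time) can be sketched as a small harness. This is not the author's harness, which the issue does not include. The class name, the nearest-rank percentile definition, and the 1 MiB payload are all assumptions. As a sanity check, 7236 / 858 ≈ 8.43, matching the reported average speedup for InputStreams.

```java
import java.util.Arrays;
import java.util.Locale;

public final class LatencyHarness {

    private LatencyHarness() {
    }

    // Runs the task `iterations` times, requesting a GC before each run as the
    // issue describes, and returns the sorted per-iteration latencies in ms.
    static double[] measure(final Runnable task, final int iterations) {
        final double[] millis = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            System.gc(); // only a hint; the JVM is free to ignore it
            final long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(millis);
        return millis;
    }

    // Nearest-rank percentile of a sorted array, for p in (0, 100].
    static double percentile(final double[] sorted, final double p) {
        final int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    static double average(final double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Speedup as reported in the issue: old latency divided by new latency.
    static double speedup(final double oldMillis, final double newMillis) {
        return oldMillis / newMillis;
    }

    public static void main(String[] args) {
        final byte[] a = new byte[1 << 20]; // 1 MiB for illustration, not the 1 GB used above
        final byte[] b = a.clone();
        final double[] lat = measure(() -> Arrays.equals(a, b), 100);
        System.out.printf(Locale.ROOT, "avg=%.3f p50=%.3f p90=%.3f p99=%.3f p100=%.3f%n",
                average(lat), percentile(lat, 50), percentile(lat, 90),
                percentile(lat, 99), percentile(lat, 100));
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(7236, 858)); // 8.43
    }
}
```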

 

 

 





[jira] [Created] (IMAGING-247) crash on reading tiff image

2020-01-13 Thread Robin Morier (Jira)
Robin Morier created IMAGING-247:


 Summary: crash on reading tiff image
 Key: IMAGING-247
 URL: https://issues.apache.org/jira/browse/IMAGING-247
 Project: Commons Imaging
  Issue Type: Bug
  Components: Format: TIFF
Affects Versions: 1.0-alpha1
Reporter: Robin Morier
 Attachments: neutre.TIFF

I get an index out of bounds exception trying to load the attached image.


{noformat}
java.lang.ArrayIndexOutOfBoundsException: Index 255 out of bounds for length 2
    at org.apache.commons.imaging.formats.tiff.photometricinterpreters.PhotometricInterpreterPalette.interpretPixel(PhotometricInterpreterPalette.java:53)
    at org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.interpretStrip(DataReaderStrips.java:179)
    at org.apache.commons.imaging.formats.tiff.datareaders.DataReaderStrips.readImageData(DataReaderStrips.java:212)
    at org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:659)
    at org.apache.commons.imaging.formats.tiff.TiffDirectory.getTiffImage(TiffDirectory.java:163)
    at org.apache.commons.imaging.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:469)
    at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1442)
    at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1404){noformat}
 

I'm calling getBufferedImage without any parameters.





[jira] [Created] (DAEMON-415) Log is not written to stdout and stderr Logfiles

2020-01-13 Thread Alexander Behrend (Jira)
Alexander Behrend created DAEMON-415:


 Summary: Log is not written to stdout and stderr Logfiles
 Key: DAEMON-415
 URL: https://issues.apache.org/jira/browse/DAEMON-415
 Project: Commons Daemon
  Issue Type: Bug
  Components: Procrun
Affects Versions: 1.2.2
 Environment: Windows Server 2016 Standard

64 Bit - OS

x64 processor

Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70 GHz

 
Reporter: Alexander Behrend
 Fix For: 1.2.3


I updated my Procrun version from 1.0.15 to 1.2.2.

After installing and starting the service, the log files were created at the 
desired location but stayed empty while the service ran. The log level is set to 
'debug', as in the previous version. I rechecked the configuration documentation 
but did not find any setting that needed to change.





[jira] [Commented] (GEOMETRY-71) Investigate Spherical Barycenter Accuracy

2020-01-13 Thread Baljit Singh (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014428#comment-17014428
 ] 

Baljit Singh commented on GEOMETRY-71:
--

[~mattjuntunen] FYI, I tried the case of a polygon with holes (using 
apache-math3). The algorithm still holds!

> Investigate Spherical Barycenter Accuracy
> -
>
> Key: GEOMETRY-71
> URL: https://issues.apache.org/jira/browse/GEOMETRY-71
> Project: Apache Commons Geometry
>  Issue Type: Bug
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current code for computing spherical barycenters in 
> {{ConvexArea2S.getBarycenter()}} seems to suffer from floating point accuracy 
> issues. The {{ConvexArea2STest.checkBarycenterConsistency()}} method checks 
> the consistency of the barycenter computation of a region by splitting the 
> region into two sections, computing the area and barycenter of each section, 
> and then computing the combined barycenter of the sections by adding the 
> barycenter of each scaled by its corresponding area. It is expected that the 
> combined barycenter computed in this way should equal the barycenter computed 
> for the region as a whole. However, in practice, a large epsilon value is 
> needed in the comparison in order for the tests to pass. We need to 
> investigate why this is the case.
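The consistency check described above can be written out as follows. The symbols are introduced here for illustration, and renormalizing the weighted sum back onto the unit sphere before comparing is an assumption about how the test compares points, not something the issue states.

```latex
% Region R split into R_1 and R_2 with areas A_1, A_2 and unit-vector barycenters b_1, b_2
\mathbf{c} = A_1\,\mathbf{b}_1 + A_2\,\mathbf{b}_2, \qquad
\hat{\mathbf{c}} = \frac{\mathbf{c}}{\lVert \mathbf{c} \rVert}, \qquad
\lVert \hat{\mathbf{c}} - \mathbf{b}_R \rVert \le \varepsilon
```

If, as a further assumption, each barycenter is the normalized first moment $\mathbf{m}_i = \int_{R_i} \mathbf{x}\,dA$, then the moments add exactly ($\mathbf{m}_R = \mathbf{m}_1 + \mathbf{m}_2$), whereas weighting the unit barycenters by $A_i$ recovers the same direction only when $\lVert\mathbf{m}_1\rVert / A_1 = \lVert\mathbf{m}_2\rVert / A_2$ (for non-parallel $\mathbf{m}_1, \mathbf{m}_2$).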





[jira] [Commented] (IMAGING-246) Invalid Block Size error prevents handling of block 1084, Macintosh NSPrintInfo

2020-01-13 Thread Liberty Wollerman (Jira)


[ 
https://issues.apache.org/jira/browse/IMAGING-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014377#comment-17014377
 ] 

Liberty Wollerman commented on IMAGING-246:
---

Thank you [~kinow], I appreciate your speedy response. Also, thanks for 
reformatting the description. I noticed that the reformatted code block 
contained different content from the code I originally pasted, so I have updated 
it in the description.

> Invalid Block Size error prevents handling of block 1084, Macintosh 
> NSPrintInfo
> ---
>
> Key: IMAGING-246
> URL: https://issues.apache.org/jira/browse/IMAGING-246
> Project: Commons Imaging
>  Issue Type: Bug
>  Components: Format: JPEG
>Affects Versions: 1.0-alpha1
>Reporter: Liberty Wollerman
>Assignee: Bruno P. Kinoshita
>Priority: Major
> Attachments: FallHarvestKitKat_07610.jpg
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When processing an image created on a Mac with Adobe Photoshop which contains 
> embedded metadata having block 1084, an invalid block size error occurs.
> |0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific 
> info for Macintosh. NSPrintInfo. It is recommended that you do not interpret 
> or use this data.|
>  
> Here is some simple test code that replicates what our application is trying 
> to do, and recreates the error:
> {code:java}
> import org.apache.commons.imaging.ImageInfo;
> import org.apache.commons.imaging.ImageReadException;
> import org.apache.commons.imaging.Imaging;
> import org.apache.commons.io.FileUtils;
> import java.io.File;
> import java.io.IOException;
> import java.util.Base64;
> public class Main {
>  public static void main(String[] args) throws IOException, 
> ImageReadException { 
>   String fileName = "FallHarvestKitKat_07610.jpg";
>   ClassLoader classLoader = ClassLoader.getSystemClassLoader(); 
>   File file = new File(classLoader.getResource(fileName).getFile()); 
>   byte[] fileContent = FileUtils.readFileToByteArray(file); 
>   String encodedString = Base64.getEncoder().encodeToString(fileContent); 
>   byte[] decodedValue = Base64.getDecoder().decode(encodedString); 
>   ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
>  }
> }{code}
>  
> Here is the resulting error:
> {noformat}
> Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
>     at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
>     at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
>     at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
>     at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
>     at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
>     at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
>     at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
>     at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
>     at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:635)
>     at Main.getMetaData(Main.java:22)
>     at Main.main(Main.java:17){noformat}
>  
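One way to see which Photoshop image resource declares an oversized length is to walk the resource block headers directly. The following is a minimal sketch, not part of Commons Imaging, and it does not reproduce how `IptcParser` applies its 65504-byte limit. It assumes the standard Photoshop image resource layout: an `8BIM` signature, a 2-byte big-endian resource ID, a Pascal-string name padded to an even length, a 4-byte big-endian data size, and data padded to an even length. It also assumes the resource bytes have already been extracted from the APP13 segment. `ResourceBlockWalker` and `BlockHeader` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walker over Photoshop image resource blocks (not the Commons Imaging API). */
public final class ResourceBlockWalker {

    /** Resource ID and declared data size of one block. */
    public static final class BlockHeader {
        public final int id;
        public final long size;

        BlockHeader(int id, long size) {
            this.id = id;
            this.size = size;
        }
    }

    private ResourceBlockWalker() {
    }

    /**
     * Returns the headers of the blocks in {@code resources}. Assumes well-formed
     * headers; a truncated header throws BufferUnderflowException.
     */
    public static List<BlockHeader> walk(byte[] resources) {
        ByteBuffer buf = ByteBuffer.wrap(resources).order(ByteOrder.BIG_ENDIAN);
        List<BlockHeader> headers = new ArrayList<>();
        byte[] signature = new byte[4];
        while (buf.remaining() >= 4) {
            buf.get(signature);
            if (!"8BIM".equals(new String(signature, StandardCharsets.US_ASCII))) {
                throw new IllegalStateException("missing 8BIM signature");
            }
            int id = buf.getShort() & 0xFFFF;
            int nameLength = buf.get() & 0xFF;
            // The Pascal name (length byte + characters) is padded to an even total length.
            int nameBytes = 1 + nameLength;
            buf.position(buf.position() + nameLength + (nameBytes % 2));
            long size = buf.getInt() & 0xFFFFFFFFL;
            headers.add(new BlockHeader(id, size));
            // Resource data is padded to an even length.
            long skip = size + (size % 2);
            if (skip > buf.remaining()) {
                break; // declared size exceeds the available bytes: stop instead of reading past the end
            }
            buf.position(buf.position() + (int) skip);
        }
        return headers;
    }
}
```

Printing each header's `id` and `size` would show whether block 1084 (0x043C) is the one whose declared size exceeds the parser's limit.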





[jira] [Updated] (IMAGING-246) Invalid Block Size error prevents handling of block 1084, Macintosh NSPrintInfo

2020-01-13 Thread Liberty Wollerman (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liberty Wollerman updated IMAGING-246:
--
Description: 
When processing an image created on a Mac with Adobe Photoshop which contains 
embedded metadata having block 1084, an invalid block size error occurs.
|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info 
for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use 
this data.|

 

Here is some simple test code that replicates what our application is trying to 
do, and recreates the error:
{code:java}
import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.ImageReadException;
import org.apache.commons.imaging.Imaging;
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.Base64;
public class Main {
 public static void main(String[] args) throws IOException, ImageReadException 
{ 
  String fileName = "FallHarvestKitKat_07610.jpg";
  ClassLoader classLoader = ClassLoader.getSystemClassLoader(); 
  File file = new File(classLoader.getResource(fileName).getFile()); 
  byte[] fileContent = FileUtils.readFileToByteArray(file); 
  String encodedString = Base64.getEncoder().encodeToString(fileContent); 
  byte[] decodedValue = Base64.getDecoder().decode(encodedString); 
  ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
 }
}{code}
 

Here is the resulting error:
{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
    at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:635)
    at Main.getMetaData(Main.java:22)
    at Main.main(Main.java:17){noformat}
 

  was:
When processing an image created on a Mac with Adobe Photoshop which contains 
embedded metadata having block 1084, an invalid block size error occurs.
|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info 
for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use 
this data.|

 

Here is some simple test code that replicates what our application is trying to 
do, and recreates the error:
{code:java}
import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.ImageReadException;
import org.apache.commons.imaging.Imaging;
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.Base64;
public class Main {
 public static void main(String[] args) throws IOException, ImageReadException 
{ 
  String fileName = "FallHarvestKitKat_07610.jpg"; ClassLoader classLoader = 
ClassLoader.getSystemClassLoader(); 
  File file = new File(classLoader.getResource(fileName).getFile()); 
  byte[] fileContent = FileUtils.readFileToByteArray(file); 
  String encodedString = Base64.getEncoder().encodeToString(fileContent); 
  byte[] decodedValue = Base64.getDecoder().decode(encodedString); 
  ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
 }
}{code}
 

Here is the resulting error:
{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
    at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
    at 

[jira] [Updated] (IMAGING-246) Invalid Block Size error prevents handling of block 1084, Macintosh NSPrintInfo

2020-01-13 Thread Liberty Wollerman (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liberty Wollerman updated IMAGING-246:
--
Description: 
When processing an image created on a Mac with Adobe Photoshop which contains 
embedded metadata having block 1084, an invalid block size error occurs.
|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info 
for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use 
this data.|

 

Here is some simple test code that replicates what our application is trying to 
do, and recreates the error:
{code:java}
import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.ImageReadException;
import org.apache.commons.imaging.Imaging;
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.Base64;
public class Main {
 public static void main(String[] args) throws IOException, ImageReadException 
{ 
  String fileName = "FallHarvestKitKat_07610.jpg"; ClassLoader classLoader = 
ClassLoader.getSystemClassLoader(); 
  File file = new File(classLoader.getResource(fileName).getFile()); 
  byte[] fileContent = FileUtils.readFileToByteArray(file); 
  String encodedString = Base64.getEncoder().encodeToString(fileContent); 
  byte[] decodedValue = Base64.getDecoder().decode(encodedString); 
  ImageInfo imageInfo = Imaging.getImageInfo(decodedValue);
 }
}{code}
 

Here is the resulting error:
{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
    at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:635)
    at Main.getMetaData(Main.java:22)
    at Main.main(Main.java:17){noformat}
 

  was:
When processing an image created on a Mac with Adobe Photoshop which contains 
embedded metadata having block 1084, an invalid block size error occurs.
|0x043C|1084|_(Photoshop CS5)_ Macintosh NSPrintInfo. Variable OS specific info 
for Macintosh. NSPrintInfo. It is recommended that you do not interpret or use 
this data.|

 

Here is some simple test code that replicates what our application is trying to 
do, and recreates the error:
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);
        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}{code}
 

Here is the resulting error:
{noformat}
Exception in thread "main" org.apache.commons.imaging.ImageReadException: Invalid Block Size : 89562 > 65504
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parseAllBlocks(IptcParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:119)
    at org.apache.commons.imaging.formats.jpeg.iptc.IptcParser.parsePhotoshopSegment(IptcParser.java:112)
    at org.apache.commons.imaging.formats.jpeg.segments.App13Segment.parsePhotoshopSegment(App13Segment.java:71)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getPhotoshopMetadata(JpegImageParser.java:599)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getMetadata(JpegImageParser.java:318)
    at org.apache.commons.imaging.formats.jpeg.JpegImageParser.getImageInfo(JpegImageParser.java:739)
    at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:701)
    at 
 

[GitHub] [commons-csv] kensixx commented on issue #43: CSVFormat#validate() does not account for allowDuplicateHeaderNames

2020-01-13 Thread GitBox
kensixx commented on issue #43: CSVFormat#validate() does not account for 
allowDuplicateHeaderNames
URL: https://github.com/apache/commons-csv/pull/43#issuecomment-573559810
 
 
   Hi, and good day @garydgregory, any updates on the 1.8 release? =)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (JEXL-318) Annotation processing may fail in lexical mode

2020-01-13 Thread Dmitri Blinov (Jira)
Dmitri Blinov created JEXL-318:
--

 Summary: Annotation processing may fail in lexical mode
 Key: JEXL-318
 URL: https://issues.apache.org/jira/browse/JEXL-318
 Project: Commons JEXL
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Dmitri Blinov


I have found that annotation processing may, under certain conditions, lead to 
an NPE:
{code:java}
public static class OptAnnotationContext extends JexlEvalContext implements JexlContext.AnnotationProcessor {
    @Override
    public Object processAnnotation(String name, Object[] args, Callable<Object> statement) throws Exception {
        JexlOptions options = this.getEngineOptions();
        // transient side effect: temporarily override the math scale
        if ("scale".equals(name)) {
            int scale = options.getMathScale();
            int newScale = (Integer) args[0];
            options.setMathScale(newScale);
            try {
                return statement.call();
            } finally {
                // restore the previous scale even if the statement throws
                options.setMathScale(scale);
            }
        }
        return statement.call();
    }
}

@Test
public void testAnnotation() throws Exception {
    JexlFeatures f = new JexlFeatures();
    f.lexical(true);
    JexlEngine jexl = new JexlBuilder().strict(true).features(f).create();
    JexlScript script = jexl.createScript("@scale(13) @test var i = 42");
    JexlContext jc = new OptAnnotationContext();
    Object result = script.execute(jc);
    // JUnit expects (expected, actual)
    Assert.assertEquals(42, result);
}
 {code}
This is because, under certain conditions, a new Interpreter instance is created 
to process the annotation, and this new instance does not inherit the current 
lexical block. Furthermore, the InterpreterBase constructor 
{{InterpreterBase(InterpreterBase ii, JexlArithmetic jexla)}} now silently 
ignores the JexlArithmetic passed to it, which is possibly another bug.

As a suggestion, could we refactor the code to simply make JexlArithmetic 
non-final in InterpreterBase? Then there would be no need to create a new 
Interpreter instance and complicate the code with state-synchronization logic.
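The refactoring suggested above can be sketched with hypothetical stand-in classes. Instead of constructing a second interpreter for an annotation, the interpreter temporarily swaps a non-final arithmetic field on the same instance and restores it in a `finally` block, the same pattern the test's `OptAnnotationContext` uses for the math scale. `ArithmeticSwapSketch`, `Interpreter`, `Arithmetic` and `withArithmetic` are illustrative names, not the Commons JEXL API, and `Supplier` stands in for the `Callable` JEXL uses so the sketch has no checked exceptions.

```java
import java.util.function.Supplier;

/**
 * Hypothetical stand-ins illustrating the suggested refactoring; these are
 * not the Commons JEXL classes or API.
 */
public final class ArithmeticSwapSketch {

    private ArithmeticSwapSketch() {
    }

    /** Stand-in for an arithmetic configuration, reduced here to a math scale. */
    public static final class Arithmetic {
        public final int scale;

        public Arithmetic(int scale) {
            this.scale = scale;
        }
    }

    /** Stand-in for an interpreter whose arithmetic field is deliberately non-final. */
    public static final class Interpreter {
        private Arithmetic arithmetic;
        // State such as the current lexical block, which a freshly created
        // interpreter would not inherit.
        private final Object lexicalBlock;

        public Interpreter(Arithmetic arithmetic, Object lexicalBlock) {
            this.arithmetic = arithmetic;
            this.lexicalBlock = lexicalBlock;
        }

        /**
         * Evaluates the statement with a temporary arithmetic on this same
         * instance, restoring the previous arithmetic even if the statement throws.
         */
        public <T> T withArithmetic(Arithmetic temporary, Supplier<T> statement) {
            Arithmetic previous = arithmetic;
            arithmetic = temporary;
            try {
                return statement.get();
            } finally {
                arithmetic = previous;
            }
        }

        public Arithmetic arithmetic() {
            return arithmetic;
        }

        public Object lexicalBlock() {
            return lexicalBlock;
        }
    }
}
```

Because the field is mutated in place, this sketch assumes an interpreter instance is not shared between threads while a swap is in progress; in exchange, the lexical block and any other per-instance state stay available to the annotated statement.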


