[ https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=371020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371020 ]

ASF GitHub Bot logged work on IO-649:
-------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jan/20 19:30
            Start Date: 13/Jan/20 19:30
    Worklog Time Spent: 10m 
      Work Description: michael-o commented on pull request #101: IO-649 - Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#discussion_r365985566
 
 

 ##########
 File path: src/main/java/org/apache/commons/io/IOUtils.java
 ##########
 @@ -746,25 +792,74 @@ public static boolean contentEquals(final InputStream input1, final InputStream
     @SuppressWarnings("resource")
     public static boolean contentEquals(final Reader input1, final Reader input2)
             throws IOException {
+        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
+    }
+
+    /**
+     * Compares the contents of two Readers to determine if they are equal or not.
+     * <p>
+     * This method buffers the input internally.
+     * </p>
+     *
+     * @param input1 the first reader
+     * @param input2 the second reader
+     * @param bufferSize the size of the internal buffer to use.
+     * @return true if the content of the readers are equal or they both don't
+     * exist, false otherwise
+     * @throws NullPointerException if either input is null
 
 Review comment:
   Same here.
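The quoted diff adds a three-argument contentEquals(Reader, Reader, int bufferSize) overload and has the two-argument version delegate to it with DEFAULT_BUFFER_SIZE. What follows is a minimal sketch of how such an overload could compare two Readers chunk by chunk. It is not the code from pull request #101. The class name ReaderContentEqualsSketch, the fill helper, and the value 4096 for DEFAULT_BUFFER_SIZE are assumptions. Null handling follows the @throws NullPointerException tag in the quoted javadoc. The range overload of Arrays.equals requires Java 9 or later, whereas the benchmarks in the issue used Java 8.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;
import java.util.Objects;

final class ReaderContentEqualsSketch {

    // Assumed value, standing in for the DEFAULT_BUFFER_SIZE constant the diff refers to.
    static final int DEFAULT_BUFFER_SIZE = 4096;

    // Two-argument form delegates to the buffered form, as in the quoted diff.
    static boolean contentEquals(Reader input1, Reader input2) throws IOException {
        return contentEquals(input1, input2, DEFAULT_BUFFER_SIZE);
    }

    static boolean contentEquals(Reader input1, Reader input2, int bufferSize) throws IOException {
        Objects.requireNonNull(input1, "input1");
        Objects.requireNonNull(input2, "input2");
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        if (input1 == input2) {
            return true; // same Reader: reading it "twice" would interleave chunks
        }
        char[] buf1 = new char[bufferSize];
        char[] buf2 = new char[bufferSize];
        while (true) {
            int n1 = fill(input1, buf1);
            int n2 = fill(input2, buf2);
            if (n1 != n2) {
                return false; // one Reader ended before the other
            }
            if (n1 == 0) {
                return true; // both Readers exhausted at the same position
            }
            // Compare only the chars actually read in this chunk (Java 9+ overload).
            if (!Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                return false;
            }
        }
    }

    // Reads until buf is full or the Reader reaches end of stream; returns the count read.
    private static int fill(Reader in, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

fill() keeps reading until the buffer is full or the Reader is exhausted. As a result, a short read from one Reader cannot cause two equal inputs to be compared out of step.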
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 371020)
    Time Spent: 40m  (was: 0.5h)

> IOUtils contentEquals method performance improvements
> -----------------------------------------------------
>
>                 Key: IO-649
>                 URL: https://issues.apache.org/jira/browse/IO-649
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.0, 1.1
>            Reporter: Brett Lounsbury
>            Priority: Major
>             Fix For: 2.6
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a
> BufferedInputStream/BufferedReader (if it is not already buffered), which
> avoids a lot of I/O overhead, but it then reads each byte/character one at a
> time. This leads to significantly more method calls and a lot of byte -> int
> conversion, since InputStream.read() returns an int between 0 and 255 (or -1
> at end of stream) instead of a byte.
>  
> I have a change that modifies the contentEquals() methods to buffer content
> internally into a byte/char array and then compare those arrays in batches
> using Arrays.equals, instead of wrapping the inputs in a BufferedInputStream
> or BufferedReader and calling the single-byte/char read() methods. This
> reduces the number of read method invocations by roughly a factor equal to
> the buffer size and avoids converting every byte read to an int.
>  
> The following table shows the performance improvement over 1000 iterations of
> comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O).
> The test was run on an EC2 m4.4xlarge host using Java 1.8.0_232, with a
> forced System.gc() between iterations so that garbage collection would not
> add latency to the measurements:
> Metric    Before (ms)  After (ms)  Speedup
> Average          7236         858    8.43x
> P50              7224         856    8.44x
> P90              7249         860    8.43x
> P99              7410         913    8.12x
> P100             8330        1278    6.52x
>  
> The following table shows the performance improvement over 1000 iterations of
> comparing two 1 GB Readers of character data (held in memory to avoid I/O).
> The test was run on an EC2 m4.4xlarge host using Java 1.8.0_232, with a
> forced System.gc() between iterations so that garbage collection would not
> add latency to the measurements:
> Metric    Before (ms)  After (ms)  Speedup
> Average         11281        1737    6.50x
> P50             11262        1735    6.49x
> P90             11292        1741    6.49x
> P99             11707        1774    6.60x
> P100            12176        1884    6.46x
>  
>  
>  
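The issue description says the existing contentEquals() wraps each InputStream in a BufferedInputStream and then compares one byte at a time through the single-byte read() method. Below is a simplified sketch of that per-byte loop, written for illustration rather than copied from the Commons IO source. The class name ByteAtATimeSketch is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ByteAtATimeSketch {

    // Per-byte comparison: one read() call per byte on each stream.
    static boolean contentEquals(InputStream input1, InputStream input2) throws IOException {
        // Wrap only if not already buffered, as the issue describes.
        InputStream in1 = input1 instanceof BufferedInputStream ? input1 : new BufferedInputStream(input1);
        InputStream in2 = input2 instanceof BufferedInputStream ? input2 : new BufferedInputStream(input2);

        int ch = in1.read(); // an int in 0..255, or -1 at end of stream
        while (ch != -1) {
            if (ch != in2.read()) {
                return false;
            }
            ch = in1.read();
        }
        // input1 is exhausted; the contents are equal only if input2 is too.
        return in2.read() == -1;
    }
}
```

Every byte costs at least two method calls here, one on each stream, and each result is widened to an int before the comparison.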

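The issue states that batching cuts the number of method invocations by roughly a factor of the buffer size. One way to check that claim on an in-memory stream is to count read calls for both access patterns. The sketch below does this with a hypothetical CountingInputStream. The class names and the 8192-byte buffer are assumptions, and the counts cover only the calls made by the draining loop on the stream it reads directly.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCounter {

    // Hypothetical wrapper that counts invocations of each read method.
    static final class CountingInputStream extends FilterInputStream {
        long singleByteCalls;
        long bulkCalls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            singleByteCalls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkCalls++;
            return super.read(b, off, len);
        }
    }

    // Drains the stream one byte at a time, like a per-byte comparison loop.
    static long countSingleByteReads(byte[] data) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return in.singleByteCalls;
    }

    // Drains the stream in bufferSize chunks, like a batch comparison loop.
    static long countBulkReads(byte[] data, int bufferSize) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        byte[] buffer = new byte[bufferSize];
        while (in.read(buffer, 0, buffer.length) != -1) {
            // discard the chunk; only the call count matters here
        }
        return in.bulkCalls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1_000_000];
        System.out.println("single-byte read() calls: " + countSingleByteReads(data));
        System.out.println("bulk read(byte[], int, int) calls: " + countBulkReads(data, 8192));
    }
}
```

For a 1,000,000-byte stream and an 8192-byte buffer, this counts 1,000,001 single-byte calls versus 124 bulk calls. In each case the final call returns -1.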

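The benchmark methodology in the issue runs many iterations, forces System.gc() between them, and reports the average plus P50, P90, P99, and P100 latencies. A minimal harness along those lines might look like the sketch below. The class and method names are hypothetical. The issue does not say which percentile definition was used, so the nearest-rank method here is an assumption.

```java
import java.util.Arrays;

final class LatencyStatsSketch {

    // Nearest-rank percentile on an ascending-sorted array (assumed definition).
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    // Times each run of task in milliseconds, requesting a GC before each run.
    static long[] measure(Runnable task, int iterations) {
        long[] millis = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            System.gc(); // only a request; the JVM may not actually collect
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    // Formats the statistics the issue reports: average, P50, P90, P99, P100.
    static String summarize(long[] millis) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        double avg = Arrays.stream(sorted).average().orElse(Double.NaN);
        return String.format("Average: %.0f ms, P50: %d ms, P90: %d ms, P99: %d ms, P100: %d ms",
                avg, percentile(sorted, 50), percentile(sorted, 90),
                percentile(sorted, 99), percentile(sorted, 100));
    }
}
```

A run would look like `LatencyStatsSketch.summarize(LatencyStatsSketch.measure(task, 1000))`, where `task` performs one comparison. With 1000 samples, P99 under this definition is the 990th-smallest latency and P100 is the maximum.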

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
