Brett Lounsbury created IO-649:
----------------------------------

             Summary: IOUtils contentEquals method performance improvements
                 Key: IO-649
                 URL: https://issues.apache.org/jira/browse/IO-649
             Project: Commons IO
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 1.1, 1.0
            Reporter: Brett Lounsbury
             Fix For: 2.6


 

contentEquals() internally wraps any given InputStream/Reader in a buffered 
version (if it is not already buffered), which avoids much of the I/O penalty, 
but it then reads each byte/character one at a time.  This leads to 
significantly more method calls and a lot of byte -> int conversion, since 
InputStream.read() returns an int (0 to 255, or -1 at end of stream) instead of 
a byte.
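
For reference, the byte-at-a-time pattern described above looks roughly like 
the sketch below. This is an illustration of the approach, not the exact 
Commons IO source; the class name PerByteCompare is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

final class PerByteCompare {

    // Wraps each stream in a BufferedInputStream when needed, then compares
    // the content using one read() call per byte on each stream.
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        InputStream in1 = a instanceof BufferedInputStream ? a : new BufferedInputStream(a);
        InputStream in2 = b instanceof BufferedInputStream ? b : new BufferedInputStream(b);

        int ch = in1.read();          // int in 0..255, or -1 at end of stream
        while (ch != -1) {
            if (ch != in2.read()) {   // mismatch, or second stream ended early
                return false;
            }
            ch = in1.read();
        }
        return in2.read() == -1;      // equal only if both streams end together
    }
}
```

Each byte compared costs two read() invocations, which is the overhead this 
issue is about.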

 

I have a change that modifies the contentEquals() methods to read content into 
byte/char arrays and compare those arrays in bulk using Arrays.equals, instead 
of wrapping the input in a BufferedInputStream or BufferedReader and calling 
the single byte/char read() methods.  This reduces the number of read 
invocations by a factor roughly equal to the buffer size and avoids converting 
every byte read to an int.
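
A minimal sketch of the block-comparison idea follows. It is not the proposed 
patch itself: the class name BlockCompare, the fill() helper, and the 8192-byte 
buffer size are assumptions made for illustration. It sticks to Java 8 APIs, so 
the final partial block is compared with a plain loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class BlockCompare {

    private static final int BUFFER_SIZE = 8192; // assumed size, not from the patch

    // Compares two streams block by block instead of byte by byte.
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] buf1 = new byte[BUFFER_SIZE];
        byte[] buf2 = new byte[BUFFER_SIZE];
        while (true) {
            int n1 = fill(a, buf1);
            int n2 = fill(b, buf2);
            if (n1 != n2) {
                return false;                 // one stream ended before the other
            }
            if (n1 == BUFFER_SIZE) {
                if (!Arrays.equals(buf1, buf2)) {
                    return false;             // full blocks differ
                }
            } else {
                // Final partial (possibly empty) block: compare only the valid prefix.
                for (int i = 0; i < n1; i++) {
                    if (buf1[i] != buf2[i]) {
                        return false;
                    }
                }
                return true;
            }
        }
    }

    // Reads until the buffer is full or the stream ends; returns the byte count.
    // Needed because read(byte[], int, int) may return fewer bytes than requested.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

Filling each buffer completely before comparing matters: without it, two 
streams with identical content could return different short reads and be 
reported as unequal.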

 

The following table shows the performance increase over 1000 iterations of 
comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). 
The test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations so that GC would not be a source of 
latency:

Average: 7236 ms to 858 ms (8.43x speedup)
P50: 7224 ms to 856 ms (8.44x speedup)
P90: 7249 ms to 860 ms (8.43x speedup)
P99: 7410 ms to 913 ms (8.12x speedup)
P100: 8330 ms to 1278 ms (6.52x speedup)
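
A measurement loop along these lines could produce figures in this form. This 
is a hypothetical sketch, not the harness actually used for the numbers above: 
the class name LatencyHarness, both method names, and the nearest-rank 
percentile definition are all assumptions.

```java
import java.util.Arrays;
import java.util.function.BooleanSupplier;

final class LatencyHarness {

    // Runs the task the given number of times, requesting a GC before each
    // timed run, and returns the elapsed times in milliseconds, sorted ascending.
    static long[] measureMillis(BooleanSupplier task, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            System.gc();                          // a request only; the JVM may ignore it
            long start = System.nanoTime();
            task.getAsBoolean();
            samples[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        Arrays.sort(samples);
        return samples;
    }

    // Nearest-rank percentile over an ascending-sorted, non-empty array; p in 1..100.
    static long percentile(long[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100; // integer ceiling of p * n / 100
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

The task passed in would need to build fresh streams over the in-memory data on 
every call, since a stream cannot be compared again once it has been consumed.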

 

The following table shows the performance increase over 1000 iterations of 
comparing two 1 GB Readers of character data (held in memory to avoid I/O). 
The test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a 
forced System.gc() between iterations so that GC would not be a source of 
latency:

Average: 11281 ms to 1737 ms (6.50x speedup)
P50: 11262 ms to 1735 ms (6.49x speedup)
P90: 11292 ms to 1741 ms (6.49x speedup)
P99: 11707 ms to 1774 ms (6.60x speedup)
P100: 12176 ms to 1884 ms (6.46x speedup)


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
