Brett Lounsbury created IO-649:
----------------------------------

             Summary: IOUtils contentEquals method performance improvements
                 Key: IO-649
                 URL: https://issues.apache.org/jira/browse/IO-649
             Project: Commons IO
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 1.1, 1.0
            Reporter: Brett Lounsbury
             Fix For: 2.6

contentEquals() internally wraps any given InputStream/Reader in a buffered version (if it is not already buffered), which avoids a lot of I/O penalties, but it then reads each byte/character one at a time. This leads to significantly more method calls and a lot of byte -> int conversion, since the single-byte read() method returns an int (0 to 255, or -1 at end of stream) instead of a byte.

I have a change that modifies the contentEquals() methods to buffer content internally into a byte/char array and then compare those arrays in batches using Arrays.equals, instead of wrapping the input in a BufferedInputStream or BufferedReader and calling the single byte/char read() methods. This reduces the number of method invocations by a factor roughly equal to the buffer size and avoids converting every byte read to an int.

The tables below show the performance increase over 1000 iterations of comparing two 1GB inputs held in memory (to avoid I/O). Both tests were performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to avoid GC as a source of latency.

Comparing two 1GB InputStreams of binary data:

    Metric    Before (ms)   After (ms)   Speedup
    Average   7236          858          8.43x
    P50       7224          856          8.44x
    P90       7249          860          8.43x
    P99       7410          913          8.12x
    P100      8330          1278         6.52x

Comparing two 1GB Readers of character data:

    Metric    Before (ms)   After (ms)   Speedup
    Average   11281         1737         6.50x
    P50       11262         1735         6.49x
    P90       11292         1741         6.49x
    P99       11707         1774         6.60x
    P100      12176         1884         6.46x

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
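
For illustration, the batch-comparison approach described in this issue could look roughly like the sketch below. This is not the submitted patch: the class name ContentEqualsSketch, the fill() helper, and the 8192-byte buffer size are assumptions made for the example, and only the InputStream variant is shown. It sticks to APIs available in Java 8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class ContentEqualsSketch {

    // Assumed buffer size; the issue does not say what size the patch uses.
    private static final int BUFFER_SIZE = 8192;

    private ContentEqualsSketch() {
    }

    // Hypothetical helper: reads until buf is full or the stream ends,
    // and returns how many bytes were stored in buf.
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Compares two streams chunk by chunk instead of byte by byte.
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] bufA = new byte[BUFFER_SIZE];
        byte[] bufB = new byte[BUFFER_SIZE];
        while (true) {
            int nA = fill(a, bufA);
            int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one stream ended before the other
            }
            if (nA < BUFFER_SIZE) {
                // Final (possibly empty) chunk: compare only the bytes read,
                // so stale data from earlier chunks is ignored.
                return Arrays.equals(Arrays.copyOf(bufA, nA), Arrays.copyOf(bufB, nA));
            }
            if (!Arrays.equals(bufA, bufB)) {
                return false;
            }
        }
    }
}
```

A Reader variant would follow the same shape with char[] buffers and Reader.read(char[], int, int). Filling each buffer completely before comparing keeps the two streams aligned even when an individual read() call returns fewer bytes than requested.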