[ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=370948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-370948
 ]

ASF GitHub Bot logged work on IO-649:
-------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jan/20 17:48
            Start Date: 13/Jan/20 17:48
    Worklog Time Spent: 10m 
      Work Description: brettlounsbury commented on pull request #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101
 
 
   This change modifies the contentEquals() methods to read content into 
internal byte/char arrays and to compare those arrays in batches using 
Arrays.equals, instead of wrapping the input in a BufferedInputStream or 
BufferedReader and calling the single byte/char read() methods.
   
   This reduces the number of method invocations by a factor equal to the 
buffer size, avoids casting every byte read to an int, and improves 
performance significantly.
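
   For readers who want to see the idea in code, here is a minimal sketch of a chunked comparison for InputStreams. It is not the code from the pull request: the class name ChunkedContentEquals, the fill() helper, and the 8192-byte buffer size are all assumptions made for illustration. Each buffer is filled completely before comparing, because a single read(byte[], int, int) call may return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ChunkedContentEquals {
    // Assumed buffer size for this sketch; not taken from the pull request.
    private static final int BUFFER_SIZE = 8192;

    // Reads until the buffer is full or the stream ends.
    // Returns the number of bytes actually placed in the buffer.
    public static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] bufA = new byte[BUFFER_SIZE];
        byte[] bufB = new byte[BUFFER_SIZE];
        while (true) {
            int nA = fill(a, bufA);
            int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one stream ended before the other
            }
            if (nA < BUFFER_SIZE) {
                // Final (possibly empty) chunk: compare only the bytes that were read.
                return Arrays.equals(Arrays.copyOf(bufA, nA), Arrays.copyOf(bufB, nB));
            }
            // Full chunk: one batch comparison instead of BUFFER_SIZE read() calls.
            if (!Arrays.equals(bufA, bufB)) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] changed = data.clone();
        changed[changed.length - 1] ^= 1; // flip one bit in the last byte

        System.out.println(contentEquals(
                new ByteArrayInputStream(data), new ByteArrayInputStream(data.clone())));
        System.out.println(contentEquals(
                new ByteArrayInputStream(data), new ByteArrayInputStream(changed)));
    }
}
```

   A Reader variant would presumably follow the same shape with char[] buffers and Reader.read(char[], int, int).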
   
   The following table shows the performance increase over 1000 iterations of 
comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O).  
This test was run on an EC2 M4.4XL host using Java 1.8.0.232, with a forced 
System.gc() between iterations to keep GC from contributing to the measured 
latency:
   
   Metric    Before (ms)   After (ms)   Speedup
   Average   7236          858          8.43x
   P50       7224          856          8.44x
   P90       7249          860          8.43x
   P99       7410          913          8.12x
   P100      8330          1278         6.52x
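
   The measurement approach described above (repeated timed iterations, a forced System.gc() before each one, then average and percentile latencies) could be sketched roughly as follows. This is not the benchmark that produced the numbers in this message: the class name LatencySummary, the nearest-rank percentile definition, the 16 MB payload, and the 20 iterations are assumptions chosen so the sketch runs quickly.

```java
import java.util.Arrays;

public class LatencySummary {
    // Nearest-rank percentile over an ascending-sorted array, percent in 1..100.
    // The percentile definition is an assumption; the message does not state one.
    public static long percentile(long[] sorted, int percent) {
        int rank = (percent * sorted.length + 99) / 100; // ceil(percent * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    public static double average(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    // Runs the task repeatedly, requesting a GC before each timed run,
    // and returns the per-iteration latencies in milliseconds, sorted ascending.
    public static long[] timeIterations(int iterations, Runnable task) {
        long[] millis = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            System.gc(); // keep collection work out of the timed region where possible
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(millis);
        return millis;
    }

    public static void main(String[] args) {
        // Much smaller than the 1 GB inputs described above, so the sketch finishes quickly.
        byte[] left = new byte[16 * 1024 * 1024];
        byte[] right = left.clone();

        long[] sorted = timeIterations(20, () -> Arrays.equals(left, right));

        System.out.println("Average: " + average(sorted) + "ms");
        System.out.println("P50: " + percentile(sorted, 50) + "ms");
        System.out.println("P90: " + percentile(sorted, 90) + "ms");
        System.out.println("P99: " + percentile(sorted, 99) + "ms");
        System.out.println("P100: " + percentile(sorted, 100) + "ms");
    }
}
```

   To time an actual stream comparison, the Runnable would wrap fresh in-memory streams on every iteration, since a consumed stream cannot be compared again.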
   
   The following table shows the performance increase over 1000 iterations of 
comparing two 1 GB Readers of character data (held in memory to avoid I/O).  
This test was run on an EC2 M4.4XL host using Java 1.8.0.232, with a forced 
System.gc() between iterations to keep GC from contributing to the measured 
latency:
   
   Metric    Before (ms)   After (ms)   Speedup
   Average   11281         1737         6.50x
   P50       11262         1735         6.49x
   P90       11292         1741         6.49x
   P99       11707         1774         6.60x
   P100      12176         1884         6.46x
 


Issue Time Tracking
-------------------

            Worklog Id:     (was: 370948)
    Remaining Estimate: 0h
            Time Spent: 10m

> IOUtils contentEquals method performance improvements
> -----------------------------------------------------
>
>                 Key: IO-649
>                 URL: https://issues.apache.org/jira/browse/IO-649
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.0, 1.1
>            Reporter: Brett Lounsbury
>            Priority: Major
>             Fix For: 2.6
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a Buffered 
> version (if it is not already buffered), which avoids a lot of I/O penalties, 
> but it then reads each byte/character one at a time.  This leads to 
> significantly more method calls and a lot of byte -> int casting, since 
> read() returns an int between 0 and 255 (or -1 at end of stream) instead of 
> a byte.
>  
> I have a change that modifies the contentEquals() methods to read content 
> into internal byte/char arrays and to compare those arrays in batches using 
> Arrays.equals, instead of wrapping the input in a BufferedInputStream or 
> BufferedReader and calling the single byte/char read() methods.  This 
> reduces the number of method invocations by a factor equal to the buffer size 
> and avoids casting every byte read to an int.
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). 
> This test was run on an EC2 M4.4XL host using Java 1.8.0.232, with a forced 
> System.gc() between iterations to keep GC from contributing to the measured 
> latency:
> Metric    Before (ms)   After (ms)   Speedup
> Average   7236          858          8.43x
> P50       7224          856          8.44x
> P90       7249          860          8.43x
> P99       7410          913          8.12x
> P100      8330          1278         6.52x
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing two 1 GB Readers of character data (held in memory to avoid I/O). 
> This test was run on an EC2 M4.4XL host using Java 1.8.0.232, with a forced 
> System.gc() between iterations to keep GC from contributing to the measured 
> latency:
> Metric    Before (ms)   After (ms)   Speedup
> Average   11281         1737         6.50x
> P50       11262         1735         6.49x
> P90       11292         1741         6.49x
> P99       11707         1774         6.60x
> P100      12176         1884         6.46x
>  
>  
>  


