[ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=377646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377646
 ]

ASF GitHub Bot logged work on IO-649:
-------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/20 14:12
            Start Date: 27/Jan/20 14:12
    Worklog Time Spent: 10m 
      Work Description: brettlounsbury commented on issue #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-578764319
 
 
   Hi @garydgregory,
   
   I don't see any comments from you other than the request for a benchmark. 
   
   I have updated IOUtils to include a test dependency on JMH and to 
include a benchmark that compares BufferedInputStream.read() performance 
with InputStream.read(byte[]) performance.  This test creates an 8MB file on 
disk once before the benchmark is run and reads it back in.  It then keeps a 
running sum of the bytes in the file (ignoring any overflow/underflow) and 
passes the sum to a Blackhole at the end so the compiler cannot optimize the 
reads away.
   
   Results look something like this (multi-byte reads are about 3.5x faster):
   Benchmark                                                        Mode  Cnt         Score        Error  Units
   SingleByteReadVsMultiByteReadPerformanceTest.testReadMultiBytes  avgt   25   6220618.290 ± 204562.812  ns/op
   SingleByteReadVsMultiByteReadPerformanceTest.testReadSingleByte  avgt   25  21740409.146 ± 711294.924  ns/op
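
   For readers without the PR open, the two read patterns being compared can 
be sketched roughly as below. This is not the benchmark from the PR: the JMH 
harness, the 8MB file setup, and the Blackhole are left out, and the class 
name ReadLoops, the method names, and the use of a long accumulator are 
assumptions made for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; not part of Commons IO or the PR.
final class ReadLoops {

    /** Sums all bytes with one BufferedInputStream.read() call per byte. */
    static long sumSingleByte(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        long sum = 0;
        int b;
        // read() returns the next byte as an int in 0..255, or -1 at end of stream.
        while ((b = buffered.read()) != -1) {
            sum += b;
        }
        return sum;
    }

    /** Sums all bytes with one InputStream.read(byte[]) call per block. */
    static long sumMultiByte(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long sum = 0;
        int n;
        // read(byte[]) returns the number of bytes placed in buf, or -1 at end of stream.
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                sum += buf[i] & 0xFF; // mask so the byte is treated as unsigned
            }
        }
        return sum;
    }
}
```

   Both methods return the same sum for the same input; the difference is 
only in how many read calls are made to get there.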
   
   
   
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 377646)
    Time Spent: 2h 50m  (was: 2h 40m)

> IOUtils contentEquals method performance improvements
> -----------------------------------------------------
>
>                 Key: IO-649
>                 URL: https://issues.apache.org/jira/browse/IO-649
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.0, 1.1
>            Reporter: Brett Lounsbury
>            Priority: Major
>             Fix For: 2.6
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a Buffered 
> version (if it is not already buffered), which avoids a lot of IO penalties, 
> but then it proceeds to read each byte/character one at a time.  This leads 
> to significantly more method calls and also a lot of byte -> int casting, 
> since the read() method returns an int between 0 and 255 (or -1 at end of 
> stream) instead of returning a byte.
>  
> I have a change that modifies the contentEquals() methods to internally 
> buffer content into a byte/char array and to then do batch comparisons of 
> those arrays using Arrays.equals instead of using a BufferedInputStream or 
> BufferedReader and making use of the single byte/char read() methods.  This 
> reduces the number of method invocations by a factor equal to the buffer size 
> and avoids casting every byte read to an int.
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing two 1GB InputStreams of binary data (stored in memory to avoid 
> I/O).  This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, 
> with a forced System.gc() between iterations to avoid GC as a source of 
> latency:
> Average: 7236 to 858ms (8.43x speedup)
> P50: 7224 to 856ms (8.44x speedup)
> P90: 7249 to 860ms (8.43x speedup)
> P99: 7410 to 913ms (8.12x speedup)
> P100: 8330 to 1278ms (6.52x speedup)
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing two 1GB Readers of character data (stored in memory to avoid 
> I/O).  This test was performed on an EC2 M4.4XL host using Java 1.8.0.232, 
> with a forced System.gc() between iterations to avoid GC as a source of 
> latency:
> Average: 11281 to 1737ms (6.50x speedup)
> P50: 11262 to 1735ms (6.49x speedup)
> P90: 11292 to 1741ms (6.49x speedup)
> P99: 11707 to 1774ms (6.60x speedup)
> P100: 12176 to 1884ms (6.46x speedup)
>  
>  
>  
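
The block-wise comparison described in the issue text above could look 
roughly like the following. This is a minimal sketch under stated 
assumptions, not the code from the PR: the class name BlockContentEquals, the 
fill() helper, and the 8192-byte buffer size are all hypothetical. It only 
illustrates reading into byte arrays and comparing whole blocks with 
Arrays.equals.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch; names and buffer size are assumptions, not taken from the PR.
final class BlockContentEquals {

    private static final int BUFFER_SIZE = 8192; // assumed block size

    /**
     * Reads until buf is full or the stream ends.
     * Returns the number of bytes stored, which is less than buf.length only at end of stream.
     */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Compares two streams block by block instead of byte by byte. */
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] bufA = new byte[BUFFER_SIZE];
        byte[] bufB = new byte[BUFFER_SIZE];
        while (true) {
            int readA = fill(a, bufA);
            int readB = fill(b, bufB);
            if (readA != readB) {
                return false; // one stream ended before the other
            }
            if (readA < BUFFER_SIZE) {
                // Final (possibly empty) block: compare only the bytes actually read.
                for (int i = 0; i < readA; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
                return true;
            }
            // Full block: one Arrays.equals call covers BUFFER_SIZE bytes.
            if (!Arrays.equals(bufA, bufB)) {
                return false;
            }
        }
    }
}
```

The fill() loop is there because InputStream.read(byte[], int, int) may 
return fewer bytes than requested; without it, two equal streams that happen 
to deliver data in different chunk sizes could be reported as different.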



