Hi, For the case of reading 2^N bytes i believe you can avoid doing a last copy by checking if “n < 0" within the “nread > 0” block when “nread == DEAFULT_BUFFER_SIZE”. That might close the perf gap for smaller cases. You can also move "nread = 0” to the same block e.g.:
var copy = (n < 0 && nread == DEAFULT_BUFFER_SIZE) ? buf : Arrays.copyOf(buf, nread); list.add(copy) nread = 0; 262 byte[] output = new byte[total]; 263 int offset = 0; 264 int numCached = list.size(); 265 for (int i = 0; i < numCached; i++) { 266 byte[] b = list.get(i); 267 System.arraycopy(b, 0, output, offset, b.length); 268 offset += b.length; 269 } You can simplify to: var result = new byte[total]; int offset = 0; for (buf : list) { System.arraycopy(buf, 0, result, offset, buf.length); offset += buf.length; } s/list/bufs and then you can use var for the declarations at the start of the method. Paul. > On 19 Dec 2017, at 11:57, Brian Burkhalter <brian.burkhal...@oracle.com> > wrote: > > https://bugs.openjdk.java.net/browse/JDK-8193832 > http://cr.openjdk.java.net/~bpb/8193832/webrev.00/ > > The implementation of InputStream.readAllBytes() can be modified to be in > general faster and to require less memory. The current implementation starts > with an internal buffer of size 8192 bytes and doubles the size of the buffer > each time more capacity is required resulting in a geometric progression in > the buffer size. At the end if the buffer size is not exactly equal to the > total number of bytes read, then another allocation equal to the total number > read is performed. The amount of memory N required can be calculated for > initial buffer size B and total bytes read L as > > M for L == B*(2^n) > N = > M + L for L != B*(2^n) > > where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)). > > An alternative implementation is not to increase the size of the internal > buffer each time more capacity is needed, but to instead maintain a list of > buffers of up to B bytes each and gather those into an output buffer at the > end. If there is only a single buffer in the list then gathering is not > needed and it can be returned directly. The amount of memory N required in > this case is > > B + L for L <= B > N = > B + 2*L for L > B > > A comparison of memory usage for a number of lengths L with a buffer size B > of 8192 is: > > L N_old N_new > 8191 16383 16383 > 8192 8192 16384 > 8193 32769 24578 > 16383 40959 40958 > 16384 24576 40960 > 16385 73729 40962 > 32767 90111 73726 > 32768 57344 73728 > 32769 155649 73730 > 65535 188415 139262 > 65536 122880 139264 > 65537 319489 139266 > 131071 385023 270334 > 131072 253952 270336 > 131073 647169 270338 > 262143 778239 532478 > 262144 516096 532480 > 262145 1302529 532482 > > In general if the size of the last intermediate buffer in the old version > does not equal the total number of bytes read, then the old version will > require more memory thaw the new. The performance for the same set of lengths > was measured using JMH: > > Benchmark (base) (offset) Mode Cnt > Score Error Units > BenchReadAllBytesParam.readAllBytes 8192 -1 thrpt 5 > 578064.253 ± 18955.667 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 8192 -1 thrpt 5 > 530963.634 ± 4799.923 ops/s > BenchReadAllBytesParam.readAllBytes 8192 0 thrpt 5 > 1414243.593 ± 68520.245 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 8192 0 thrpt 5 > 532019.149 ± 7051.738 ops/s > BenchReadAllBytesParam.readAllBytes 8192 1 thrpt 5 > 293586.365 ± 8196.939 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 8192 1 thrpt 5 > 361682.265 ± 27081.111 ops/s > BenchReadAllBytesParam.readAllBytes 16384 -1 thrpt 5 > 216809.517 ± 4853.852 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 16384 -1 thrpt 5 > 212346.244 ± 2980.916 ops/s > BenchReadAllBytesParam.readAllBytes 16384 0 thrpt 5 > 379757.470 ± 14524.249 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 16384 0 thrpt 5 > 218802.256 ± 4361.080 ops/s > BenchReadAllBytesParam.readAllBytes 16384 1 thrpt 5 > 123570.712 ± 1665.661 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 16384 1 thrpt 5 > 208801.577 ± 8005.196 ops/s > BenchReadAllBytesParam.readAllBytes 32768 -1 thrpt 5 > 93083.146 ± 1024.309 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 32768 -1 thrpt 5 > 104398.304 ± 1310.509 ops/s > BenchReadAllBytesParam.readAllBytes 32768 0 thrpt 5 > 151104.878 ± 3461.359 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 32768 0 thrpt 5 > 104849.088 ± 3841.224 ops/s > BenchReadAllBytesParam.readAllBytes 32768 1 thrpt 5 > 58039.827 ± 398.908 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 32768 1 thrpt 5 > 104489.685 ± 2118.496 ops/s > BenchReadAllBytesParam.readAllBytes 65536 -1 thrpt 5 > 43144.695 ± 440.349 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 65536 -1 thrpt 5 > 55583.338 ± 2115.162 ops/s > BenchReadAllBytesParam.readAllBytes 65536 0 thrpt 5 > 67216.536 ± 2055.057 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 65536 0 thrpt 5 > 55680.238 ± 1235.571 ops/s > BenchReadAllBytesParam.readAllBytes 65536 1 thrpt 5 > 27908.000 ± 568.473 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 65536 1 thrpt 5 > 55938.779 ± 813.991 ops/s > BenchReadAllBytesParam.readAllBytes 131072 -1 thrpt 5 > 20299.014 ± 568.616 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 131072 -1 thrpt 5 > 28115.036 ± 842.392 ops/s > BenchReadAllBytesParam.readAllBytes 131072 0 thrpt 5 > 31617.475 ± 467.378 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 131072 0 thrpt 5 > 28274.289 ± 259.699 ops/s > BenchReadAllBytesParam.readAllBytes 131072 1 thrpt 5 > 13640.406 ± 303.125 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 131072 1 thrpt 5 > 28221.298 ± 515.030 ops/s > BenchReadAllBytesParam.readAllBytes 262144 -1 thrpt 5 > 9995.183 ± 249.368 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 262144 -1 thrpt 5 > 14043.194 ± 138.026 ops/s > BenchReadAllBytesParam.readAllBytes 262144 0 thrpt 5 > 15309.238 ± 353.752 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 262144 0 thrpt 5 > 14048.569 ± 348.699 ops/s > BenchReadAllBytesParam.readAllBytes 262144 1 thrpt 5 > 5940.442 ± 206.855 ops/s > BenchReadAllBytesParam.readAllBytesArrayList 262144 1 thrpt 5 > 14055.357 ± 412.470 ops/s > > In each case the number of bytes read is the sum of the base and offset > parameters. Similar behavior with respect to data length is observed for > performance as for memory usage: if the data length is not equal to the size > of the last intermediate buffer used, then the performance of the old version > is in general worse than that of the new. > > Thanks, > > Brian