On Thu, 31 Dec 2020 10:10:56 GMT, Philippe Marschall 
<github.com+471021+marsch...@openjdk.org> wrote:

> Implement three optimiztations for Reader.read(CharBuffer)
> 
> * Add a code path for heap buffers in Reader#read to use the backing array 
> instead of allocating a new one.
> * Change the code path for direct buffers in Reader#read to limit the 
> intermediate allocation to `TRANSFER_BUFFER_SIZE`.
> * Implement `InputStreamReader#read(CharBuffer)` and delegate to 
> `StreamDecoder`.
> * Implement `StreamDecoder#read(CharBuffer)` and avoid buffer allocation.

A couple of implementation notes:

Reader#read(CharBuffer)

on-heap case

I introduced a dedicated path for the on-heap case and directly read into the 
backing array. This completely avoids the intermediate allocation and copy. 
Compared to the initial proposal the buffer position is updated. This has to be 
done because we bypass the buffer and directly read into the array. This also 
handles the case that #read returns -1.

I am using a c-style array declaration because the rest of the class uses it.

off-heap case

I limit the intermadiate allocation to TRANSFER_BUFFER_SIZE. In order to reduce 
the total intermediate allocation we now call #read multiple times until the 
buffer is full or EOF is reached. This changes the behavior slightly as now 
possibly more data is read. However this should contribute to reduce the 
overall intermediate allocations.

A lock is acquired to keep the the read atomic. This is consistent with #skip 
which also holds a lock over multiple #read calls. This is inconsistent with 
#transferTo which does not acquire a lock over multiple #read calls. 

The implementation took inspiration from java.io.InputStream#readNBytes(int).

InputStreamReader#read(CharBuffer)

Since StreamDecoder is a Reader as well we can simply delegate.

StreamDecoder#read(CharBuffer)

Interestingly this was not implemented even though StreamDecoder internally 
works on NIO buffers.

on-heap case

We see a performance improvement and the elimination of all intermediate 
allocation.

StreamDecoder#read(char[], int, int) and InputStreamReader#read(char[], int, 
int) may get a small improvement as we no longer #slice the buffer.

off-heap case

We see the elimination of all intermediate allocation but a performance penalty 
because we hit the slow path in #decodeLoop. There is a trade-off here, we 
could improve the performance to previous levels by introducing intermediate 
allocation again. I don't know how much the users of off-heap buffers care 
about introducing intermediate allocation vs maximum throughput.

I was struggling to come up with microbenchmarks because the built in 
InputStream and Reader implementations are finite but JMH calls the benchmark 
methods repeatably. I ended up implementing custom infinite InputStream and 
Reader implementations. The code can be found there:

https://github.com/marschall/reader-benchmarks/tree/master/src/main/java/com/github/marschall/readerbenchmarks

An Excel with charts can be found here:

https://github.com/marschall/reader-benchmarks/raw/master/src/main/resources/4926314.xlsx

I looked for java.io.Reader#read(CharBuffer) users in the JDK and only found 
java.util.Scanner#readInput(). I wrote a microbenchmark for this but only found 
minuscule improvements due to regex dominating the runtime.

There seem to be no direct Reader tests in the tier1 suite, the initial code 
was wrong because I forgot to update the buffer code position but I only found 
out because some Scanner tests failed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1915

Reply via email to