On Tue, 5 Jan 2021 00:48:54 GMT, Brian Burkhalter <b...@openjdk.org> wrote:
>> A couple of implementation notes: >> >> Reader#read(CharBuffer) >> >> on-heap case >> >> I introduced a dedicated path for the on-heap case and directly read into >> the backing array. This completely avoids the intermediate allocation and >> copy. Compared to the initial proposal the buffer position is updated. This >> has to be done because we bypass the buffer and directly read into the >> array. This also handles the case that #read returns -1. >> >> I am using a c-style array declaration because the rest of the class uses it. >> >> off-heap case >> >> I limit the intermadiate allocation to TRANSFER_BUFFER_SIZE. In order to >> reduce the total intermediate allocation we now call #read multiple times >> until the buffer is full or EOF is reached. This changes the behavior >> slightly as now possibly more data is read. However this should contribute >> to reduce the overall intermediate allocations. >> >> A lock is acquired to keep the the read atomic. This is consistent with >> #skip which also holds a lock over multiple #read calls. This is >> inconsistent with #transferTo which does not acquire a lock over multiple >> #read calls. >> >> The implementation took inspiration from java.io.InputStream#readNBytes(int). >> >> InputStreamReader#read(CharBuffer) >> >> Since StreamDecoder is a Reader as well we can simply delegate. >> >> StreamDecoder#read(CharBuffer) >> >> Interestingly this was not implemented even though StreamDecoder internally >> works on NIO buffers. >> >> on-heap case >> >> We see a performance improvement and the elimination of all intermediate >> allocation. >> >> StreamDecoder#read(char[], int, int) and InputStreamReader#read(char[], int, >> int) may get a small improvement as we no longer #slice the buffer. >> >> off-heap case >> >> We see the elimination of all intermediate allocation but a performance >> penalty because we hit the slow path in #decodeLoop. There is a trade-off >> here, we could improve the performance to previous levels by introducing >> intermediate allocation again. I don't know how much the users of off-heap >> buffers care about introducing intermediate allocation vs maximum throughput. >> >> I was struggling to come up with microbenchmarks because the built in >> InputStream and Reader implementations are finite but JMH calls the >> benchmark methods repeatably. I ended up implementing custom infinite >> InputStream and Reader implementations. The code can be found there: >> >> https://github.com/marschall/reader-benchmarks/tree/master/src/main/java/com/github/marschall/readerbenchmarks >> >> An Excel with charts can be found here: >> >> https://github.com/marschall/reader-benchmarks/raw/master/src/main/resources/4926314.xlsx >> >> I looked for java.io.Reader#read(CharBuffer) users in the JDK and only found >> java.util.Scanner#readInput(). I wrote a microbenchmark for this but only >> found minuscule improvements due to regex dominating the runtime. >> >> There seem to be no direct Reader tests in the tier1 suite, the initial code >> was wrong because I forgot to update the buffer code position but I only >> found out because some Scanner tests failed. > > I changed the JBS issue summary to match the title of this PR so that > integration blocker is removed. > > How does the off-heap performance of `InputStreamReader.read(CharBuffer)` > compare for the case where only the change to `Reader` is made versus if all > your proposed changes are applied? > > Some kind of specific test should likely be included. > > Note that the more recent copyright year is now 2021. I think the implementation changes here look good. I don't know however whether there is enough coverage in the tests. These should verify that the `Reader`, `CharArrayReader`, and `InputStreamReader` implementations of `read(CharBuffer)` are accurate. If there is already sufficient coverage in the tests in `test/jdk/java/io` then that is good enough and nothing need be added. ------------- PR: https://git.openjdk.java.net/jdk/pull/1915