Roger,
thank you for this helpful hint!
As it is (unfortunately) not possible to ask a CharsetDecoder whether it
currently caches any bytes, I simply added an explicit flag for that in
the read(...) methods. The flag switches between two optimized code
paths, the "fast path" and the "slow path":
    @Override
    public String readAllAsString() throws IOException {
        synchronized (lock) {
            ensureOpen();
            if (in == null)
                return super.readAllAsString();
            byte[] remaining = in.readAllBytes();
            if (!havePulledFromInputStream)
                return new String(remaining, cs); // fast-path
            int estimateSize = (haveLeftoverChar ? 1 : 0) + (int) Math.ceil(
                    (bb.remaining() + remaining.length) * decoder.maxCharsPerByte());
            int initialSize = Math.max(estimateSize, DEFAULT_BYTE_BUFFER_SIZE);
            CharBuffer cb = CharBuffer.allocate(initialSize);
            if (haveLeftoverChar) {
                cb.put(leftoverChar);
                haveLeftoverChar = false;
            }
            while (bb.hasRemaining()) {
                CoderResult cr = decoder.decode(bb, cb, false);
                if (cr.isError())
                    cr.throwException();
                if (cr.isOverflow())
                    cb = ensureFree(cb, bb.remaining());
            }
            ByteBuffer bbuf = ByteBuffer.wrap(remaining);
            while (bbuf.hasRemaining()) {
                CoderResult cr = decoder.decode(bbuf, cb, false);
                if (cr.isError())
                    cr.throwException();
                if (cr.isOverflow())
                    cb = ensureFree(cb, bbuf.remaining());
            }
            return cb.flip().toString();
        }
    }
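For illustration, the slow-path idea can also be sketched outside the JDK. This is only a minimal sketch under stated assumptions, not the proposed patch: SlowPathSketch and decodeAll are hypothetical names, the already-pulled bytes are passed in as a plain array instead of living in StreamDecoder's internal buffer, and everything is copied into one buffer for clarity rather than speed. It relies on the fact that a decoder called with endOfInput=false leaves an incomplete trailing sequence unconsumed, so pending bytes have to be decoded together with the bytes that follow them.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class SlowPathSketch {

    /**
     * Decodes the bytes that were already pulled from the stream ("pending")
     * followed by the rest of the stream as one contiguous input.
     */
    static String decodeAll(byte[] pending, byte[] rest, Charset cs) {
        // Join both parts so a multi-byte character split across them is seen whole.
        ByteBuffer in = ByteBuffer.allocate(pending.length + rest.length);
        in.put(pending).put(rest);
        in.flip();

        // Assumption: replace malformed input instead of throwing,
        // to mimic the lenient behaviour of a reader.
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            // decode(ByteBuffer) runs a complete decode: reset, decode to end of input, flush.
            return decoder.decode(in).toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pending = {(byte) 0xC3};          // lead byte of U+00E9
        byte[] rest = {(byte) 0xA9, (byte) 'x'}; // continuation byte, then 'x'
        System.out.println(decodeAll(pending, rest, StandardCharsets.UTF_8));
    }
}
```

In this sketch the split character U+00E9 is decoded correctly because its two bytes end up adjacent in the same buffer before decoding starts.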
I gave both new code paths several tries using JMH, and the results are
rather impressive when compared to the original code (i.e. the default
implementation found in Reader::readAllAsString):
* The "fast path" is approx. 40% faster than the original code!
* The "slow path" is approx. 20% faster than the original code!
I think it is worth adopting this solution in JDK 27 and would like to
open a PR for review.
-Markus
On 24.03.2026 at 19:23, Roger Riggs wrote:
Hi,
That looks like a fairly localized change.
It will need to deal with the risk that the StreamDecoder has cached
byte(s) that have already been pulled from the input stream and not
yet been flushed through, say for example a byte of a multi-byte
character.
This might happen if some reads have been done before calling
readAllAsString.
Regards, Roger
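The risk described above can be reproduced with a plain CharsetDecoder. The following is a minimal sketch, not code from StreamDecoder: PartialByteDemo and its method names are hypothetical, and UTF-8 with the two-byte character U+00E9 is chosen only as a convenient example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class PartialByteDemo {

    // U+00E9 encodes to two bytes in UTF-8: 0xC3 0xA9.
    static final byte[] E_ACUTE = "\u00e9".getBytes(StandardCharsets.UTF_8);

    /**
     * Feeds only the lead byte to a decoder and returns how many bytes stay
     * undecoded in the input buffer (or -1 if the decoder behaved unexpectedly).
     */
    static int pendingBytesAfterFirstByte() {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(E_ACUTE, 0, 1);
        CharBuffer out = CharBuffer.allocate(4);
        // endOfInput=false: the caller promises that more bytes may follow.
        CoderResult cr = decoder.decode(in, out, false);
        // Expected: underflow, no char produced, lead byte still in the buffer.
        return (cr.isUnderflow() && out.position() == 0) ? in.remaining() : -1;
    }

    /** Naive shortcut: decodes the rest of the stream alone, ignoring the pending byte. */
    static String naiveDecodeOfRest() {
        return new String(E_ACUTE, 1, 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("pending bytes: " + pendingBytesAfterFirstByte());
        System.out.println("naive result is U+FFFD: " + naiveDecodeOfRest().equals("\uFFFD"));
    }
}
```

The lead byte stays behind in the caller's buffer, and decoding the remaining bytes on their own produces a replacement character instead of the intended one. A readAllAsString implementation that bypasses the decoder therefore has to know whether any bytes were pulled earlier.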
On 3/22/26 6:43 AM, Markus KARG wrote:
Dear Core-Lib Developers,
I would like to contribute an explicit implementation of
InputStreamReader.readAllAsString() for performance reasons.
I benchmarked InputStreamReader.readAllAsString() against simplistic
custom code, and the result is pretty clear: even the most trivial custom
code outperforms the original code by more than 30%!
Benchmark         Mode  Cnt      Score     Error  Units
Demo.jdk         thrpt   25  10931.806 ± 535.177  ops/s
Demo.customCode  thrpt   25  14775.102 ± 343.829  ops/s
    @Benchmark
    public String jdk(MyState ms) throws IOException {
        return new InputStreamReader(ms.in, UTF_8).readAllAsString();
    }

    @Benchmark
    public String customCode(MyState ms) throws IOException {
        return new String(ms.in.readAllBytes(), UTF_8);
    }
IMHO gaining 30%+ justifies the provision of a custom implementation
like the following one...
    public String readAllAsString() throws IOException {
        return sd.readAllAsString();
    }
and in turn let sd.readAllAsString() return e.g. new
String(in.readAllBytes(), cs)
...to override Reader's default implementation, which currently
performs several (possibly many) buffer copies inside the StringBuilder:
    public String readAllAsString() throws IOException {
        StringBuilder result = new StringBuilder();
        char[] cbuf = new char[TRANSFER_BUFFER_SIZE];
        int nread;
        while ((nread = read(cbuf, 0, cbuf.length)) != -1) {
            result.append(cbuf, 0, nread);
        }
        return result.toString();
    }
This minimal change should provide approx. the benchmarked 30%
performance gain, should never be slower, and should not need more
memory than the original implementation.
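As a quick sanity check of the idea, the two approaches can be compared on a stream that has not been read from yet. This is a minimal sketch, not the benchmark code: ReadAllComparison, viaLoop and direct are hypothetical names, the 8192-char buffer is an arbitrary stand-in for TRANSFER_BUFFER_SIZE, and the loop is written out by hand so the sketch also runs on JDKs that do not have readAllAsString().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison class, for illustration only.
final class ReadAllComparison {

    /** Chunked read into a StringBuilder, following the shape of the default loop. */
    static String viaLoop(byte[] data) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            StringBuilder result = new StringBuilder();
            char[] cbuf = new char[8192]; // arbitrary size, assumed for this sketch
            int nread;
            while ((nread = r.read(cbuf, 0, cbuf.length)) != -1) {
                result.append(cbuf, 0, nread);
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads all bytes first and decodes them in a single step. */
    static String direct(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "gr\u00fc\u00df dich, 5 \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(viaLoop(data).equals(direct(data)));
    }
}
```

Both methods return the same string for well-formed input on a fresh stream. The sketch says nothing about performance and does not cover streams from which some characters were already read.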
If the core-libs team is fine with me doing so, I would like to file
a JBS issue and a PR. Comments are very welcome!
-Markus