On Thu, 20 Jul 2023 17:27:58 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:

> Is there any benchmark for DataInput/Output stream that can be used? I mean, 
> it would be interesting to understand how these numbers translate when 
> running the stuff that is built on top.

I've tried to run the benchmark in test/micro/java/io/DataInputStream.java. 
This is the baseline:


Benchmark                     Mode  Cnt  Score   Error  Units
DataInputStreamTest.readChar  avgt   20  7.583 ± 0.026  us/op
DataInputStreamTest.readInt   avgt   20  3.804 ± 0.045  us/op


And this is with a patch similar to the one I shared above, to use ByteBuffer 
internally:


Benchmark                     Mode  Cnt  Score   Error  Units
DataInputStreamTest.readChar  avgt   20  7.594 ± 0.106  us/op
DataInputStreamTest.readInt   avgt   20  3.795 ± 0.030  us/op


There does not seem to be any extra overhead. That said, access occurs in a 
counted loop, and in these cases we know buffer/segment access is optimized 
quite well.
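
To make the "counted loop" point concrete, here is a minimal sketch of the two access patterns being compared. It is not the actual patch: the class and method names are hypothetical, and it assumes big-endian reads from a plain byte[] as DataInputStream does. Both loops have a trip count known up front, which is the shape where buffer access tends to be optimized well.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not code from the JDK or from the patch.
final class CountedLoopRead {

    // Reads `count` big-endian ints with explicit shifts and sums them.
    static long sumIntsWithShifts(byte[] data, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            int off = i * Integer.BYTES;
            sum += ((data[off]     & 0xFF) << 24)
                 | ((data[off + 1] & 0xFF) << 16)
                 | ((data[off + 2] & 0xFF) << 8)
                 |  (data[off + 3] & 0xFF);
        }
        return sum;
    }

    // Same reads, but through a ByteBuffer view created once outside the loop.
    static long sumIntsWithBuffer(byte[] data, int count) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(i * Integer.BYTES); // absolute get, no position update
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024 * Integer.BYTES];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // Both paths must agree on the result; only their cost may differ.
        System.out.println(sumIntsWithShifts(data, 1024) == sumIntsWithBuffer(data, 1024));
    }
}
```

This only checks that the two paths are equivalent; any timing claim would still need a proper JMH run.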

I believe the question here is: do we have benchmarks that are representative 
of the kind of gain that micro-optimizing ByteArray would bring? It can be 
quite tricky to estimate real benefits from synthetic benchmarks on the 
ByteArray class, especially when they fetch a single element outside of a 
loop, as that is not representative of how clients will use the class. I note 
that the original benchmark made by Per used a loop with two iterations to 
assess the cost of the ByteArray operations:

http://minborgsjavapot.blogspot.com/2023/01/java-21-performance-improvements.html

If I change the benchmark to do 2 iterations, I see this:


Benchmark                      Mode  Cnt       Score       Error   Units
ByteArray.readByte            thrpt    5  704199.172 ± 34101.508  ops/ms
ByteArray.readByteFromBuffer  thrpt    5  474321.828 ±  6588.471  ops/ms
ByteArray.readInt             thrpt    5  662411.181 ±  4470.951  ops/ms
ByteArray.readIntFromBuffer   thrpt    5  496900.429 ±  3705.737  ops/ms
ByteArray.readLong            thrpt    5  665138.063 ±  5944.814  ops/ms
ByteArray.readLongFromBuffer  thrpt    5  517781.548 ± 27106.331  ops/ms

The more iterations there are, the smaller the relative cost (and you don't 
need many iterations to break even). This probably explains why the 
DataInputStream benchmark doesn't change - there are 1024 iterations in there.

I guess all this is to say that focussing excessively on microbenchmarks of a 
simple class such as ByteArray, under conditions that are likely unrealistic 
(e.g. single access), is IMHO the wrong way to look at things, as ByteArray is 
mostly used by classes that will most definitely read more than one value at a 
time (including the classfile API).

So, also IMHO, we should try to measure the use cases we care about in the 
higher-level APIs (I/O streams, classfile) and then see whether adding 
Unsafe/VarHandle/ByteBuffer access in here leads to any benefit at all.
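
For reference, this is roughly what the VarHandle flavour of such access looks like. It is a sketch under assumptions, not the ByteArray implementation: the class name and helper methods are made up for illustration, and only the standard MethodHandles.byteArrayViewVarHandle API is used.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper; names are illustrative and not taken from the JDK sources.
final class VarHandleRead {

    // Views a byte[] as big-endian ints / longs at arbitrary byte offsets.
    private static final VarHandle INT_BE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    private static final VarHandle LONG_BE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    static int getInt(byte[] array, int offset) {
        return (int) INT_BE.get(array, offset);
    }

    static long getLong(byte[] array, int offset) {
        return (long) LONG_BE.get(array, offset);
    }

    static void setInt(byte[] array, int offset, int value) {
        INT_BE.set(array, offset, value);
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[8];
        setInt(bytes, 0, 0x12345678);
        setInt(bytes, 4, 1);
        System.out.println(Integer.toHexString(getInt(bytes, 0)));
        System.out.println(Long.toHexString(getLong(bytes, 0)));
    }
}
```

Whether this beats plain shifts in a stream or classfile workload is exactly what the higher-level benchmarks would have to show.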

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14636#discussion_r1269993992
