Including build-dev, since this patch also addresses the new issue 8248135:
https://bugs.openjdk.java.net/browse/JDK-8248135
So here's a new webrev that also includes the patch for building benchmarks
with --enable-preview:
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.08/
Regards, Peter
On 6/23/20 10:23 AM, Claes Redestad wrote:
On 2020-06-23 10:06, Claes Redestad wrote:
Hi,
On 2020-06-23 09:49, Peter Levart wrote:
Hi Chris, Claes,
Ok then, here's with benchmark included:
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/
I haven't been able to run the benchmark with "make test" though. I
have tried various ways to pass javac options to the build, like:
make test
TEST='micro:org.openjdk.bench.java.io.RecordDeserialization'
TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"
...but javac doesn't seem to get them. Is there some secret option
to achieve that?
Hmm, we might as well have the microbenchmarks build with
--enable-preview on by default. Try this:
Fixed:
diff -r f2e1cd498381 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:08:35 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:33:17 2020 +0200
@@ -90,10 +90,11 @@
TARGET_RELEASE := $(TARGET_RELEASE_NEWJDK_UPGRADED), \
SMALL_JAVA := false, \
CLASSPATH := $(MICROBENCHMARK_CLASSPATH), \
- DISABLED_WARNINGS := processing rawtypes cast serial, \
+ DISABLED_WARNINGS := processing rawtypes cast serial preview, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+ JAVAC_FLAGS := --enable-preview, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
I verified this works with your micro, and doesn't seem to cause
any issues elsewhere:
make test TEST=micro:RecordDeserialization
I can shepherd this as a separate fix for documentation purposes, but
feel free to include it in your patch and ping build-dev@.
/Claes
diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:54:42 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:59:29 2020 +0200
@@ -93,7 +93,7 @@
DISABLED_WARNINGS := processing rawtypes cast serial, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
- JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+ JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported --limit-modules java.management, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
Otherwise, I simulated what would happen when there are more than 10
ObjectStreamClass layouts for the same class rapidly interchanging, so
that they push each other out of the cache, by temporarily setting the
cache's MAX_SIZE = 0. Note that this is the worst-case scenario:
Benchmark                                      (length)  Mode  Cnt    Score    Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10    9.393 ±  0.287  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10   35.642 ±  0.977  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10  293.769 ±  7.321  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10   15.335 ±  0.496  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  211.427 ± 11.908  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  990.398 ± 26.681  us/op
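For reference, the bound that was zeroed for this measurement is
conceptually something like the following. This is just a simplified
sketch with hypothetical names, not the actual ObjectStreamClass code:

import java.lang.invoke.MethodHandle;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A size-bounded cache of adapted method handles, keyed by stream shape.
// When more than MAX_SIZE distinct shapes interleave, entries keep getting
// evicted; with MAX_SIZE = 0 nothing is ever cached, so every record pays
// for a fresh (expensive) adaptation.
final class AdapterCache {
    static final int MAX_SIZE = 10; // set to 0 to simulate the worst case

    private final ConcurrentHashMap<Object, MethodHandle> cache =
            new ConcurrentHashMap<>();

    MethodHandle get(Object shapeKey, Function<Object, MethodHandle> adapt) {
        MethodHandle mh = cache.get(shapeKey);
        if (mh == null) {
            mh = adapt.apply(shapeKey);  // expensive: builds the adapted handle
            if (cache.size() >= MAX_SIZE) {
                cache.clear();           // crude eviction, enough for a sketch
            }
            if (MAX_SIZE > 0) {
                cache.put(shapeKey, mh);
            }
        }
        return mh;
    }
}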
This is using the JMH option '-gc true' to force GC after each iteration
of the benchmark. Without it, I get a big (~4s) full-GC pause right in
the middle of a run with length=100:
Iteration 1: 528.577 us/op
Iteration 2: 580.404 us/op
Iteration 3: 4438.228 us/op
Iteration 4: 644.532 us/op
Iteration 5: 698.493 us/op
Iteration 6: 800.738 us/op
Iteration 7: 929.791 us/op
Iteration 8: 870.946 us/op
Iteration 9: 863.416 us/op
Iteration 10: 916.508 us/op
...so the results are a bit off because of that:
Benchmark                                      (length)  Mode  Cnt     Score      Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10     8.263 ±    0.043  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10    33.406 ±    0.160  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10   287.595 ±    0.960  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10    15.270 ±    0.080  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  1127.163 ± 1771.892  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  2003.235 ±  227.159  us/op
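(For anyone wanting to reproduce these runs: the '-gc true' switch is
also exposed through JMH's Java API. A minimal sketch, assuming the
benchmark class name used in this thread:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunWithForcedGC {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("RecordDeserializationBench")
                .shouldDoGC(true) // same as '-gc true': GC between iterations
                .build();
        new Runner(opts).run();
    }
}
)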
Yes, there is quite a bit of GCing going on when the cache is thrashing.
Note that I haven't tuned GC in any way, and I'm running this on a
machine with 64 GiB of RAM, so the heap is allowed to grow quite big;
G1 is used by default, I think.
This is still no worse than before the patch:
Benchmark                                 (length)  Mode  Cnt     Score   Error  Units
RecordDeserialization.deserializeClasses        10  avgt   10     8.382 ±  0.013  us/op
RecordDeserialization.deserializeClasses       100  avgt   10    33.736 ±  0.171  us/op
RecordDeserialization.deserializeClasses      1000  avgt   10   271.224 ±  0.953  us/op
RecordDeserialization.deserializeRecords        10  avgt   10    58.606 ±  0.446  us/op
RecordDeserialization.deserializeRecords       100  avgt   10   530.044 ±  1.752  us/op
RecordDeserialization.deserializeRecords      1000  avgt   10  5335.624 ± 44.942  us/op
...since caching of the adapted method handle for multiple objects
within a single stream is still effective.
I doubt there will ever be more than 10 variants/versions of the same
record class deserialized by the same VM in rapid succession, so I
think this should not be an issue. We could add a system property to
control the MAX_SIZE of the cache if you think it is needed.
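If it ever turns out to be needed, it could be as small as reading an
integer system property at class initialization time. A hypothetical
sketch; the property name is invented, not an actual JDK switch:

// Hypothetical: bound the adapter cache via a system property,
// defaulting to the current constant of 10.
final class CacheConfig {
    static final int MAX_SIZE =
            Integer.getInteger("jdk.serialization.adapterCacheSize", 10);
}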
Thanks for running the numbers on this, and I agree - it seems
outlandishly improbable (most will only see one, and if you have to
maintain 10+ different serialized shapes of some record you likely
have bigger problems).
I'd say let's keep it constant unless someone actually asks for it.
/Claes
Regards, Peter
On 6/22/20 1:04 AM, Claes Redestad wrote:
Hi Peter,
patch and results look great!
My only real comment on this is that I think the microbenchmark
would be
a valuable contribution, too.
It'd also be interesting to explore how poor performance would become
if we hit the (artificial) 11-layout limit, e.g., by cycling through
10, 11, or 12 different shapes.
/Claes
On 2020-06-21 19:16, Peter Levart wrote:
Hi,
When re-running the benchmark [1] with different lengths of serialized
arrays of records, I found that, compared to classical classes, the
lookup into the cache of adapted method handles starts to show when the
length of the array is larger (i.e. the number of instances of the same
record type deserialized in a single stream). Each record deserialized
must look up the method handle in a ConcurrentHashMap:
Benchmark                                    (length)  Mode  Cnt    Score   Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.088 ± 0.013  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.171 ± 0.324  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.762 ± 3.072  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    9.011 ± 0.027  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   33.206 ± 0.514  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  325.137 ± 0.969  us/op
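(To make that cost concrete: the per-record hot path boils down to a
single ConcurrentHashMap lookup of roughly this shape. A sketch with
hypothetical names, not the actual JDK code:

import java.lang.invoke.MethodHandle;
import java.util.concurrent.ConcurrentHashMap;

final class AdaptedHandles {
    private static final ConcurrentHashMap<Object, MethodHandle> CACHE =
            new ConcurrentHashMap<>();

    // Called once per deserialized record: even on a hit, the hashing
    // and probing add up once thousands of records share a stream.
    static MethodHandle forShape(Object shapeKey) {
        return CACHE.computeIfAbsent(shapeKey, AdaptedHandles::adaptConstructor);
    }

    private static MethodHandle adaptConstructor(Object shapeKey) {
        // the expensive method-handle adaptation is elided in this sketch
        throw new UnsupportedOperationException("elided in this sketch");
    }
}
)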
...so keeping the correctly shaped adapted method handle in the
per-serialization-session ObjectStreamClass instance [2] starts to
make sense:
Benchmark                                    (length)  Mode  Cnt    Score   Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.681 ± 0.155  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.496 ± 0.087  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.014 ± 1.189  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    8.537 ± 0.032  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   31.451 ± 0.083  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  250.854 ± 2.772  us/op
With that, more objects mean an advantage over classical classes
instead of a disadvantage.
[1]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java
[2]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/
Regards, Peter