On 2020-06-23 10:06, Claes Redestad wrote:
Hi,
On 2020-06-23 09:49, Peter Levart wrote:
Hi Chris, Claes,
Ok then, here's with benchmark included:
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/
I haven't been able to run the benchmark with "make test" though.
I have tried various ways to pass javac options to build like:
make test
TEST='micro:org.openjdk.bench.java.io.RecordDeserialization'
TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"
...but javac doesn't seem to get them. Is there some secret option
to achieve that?
Hmm, we might as well have the microbenchmarks build with
--enable-preview on by default. Try this:
Fixed:
diff -r f2e1cd498381 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:08:35 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:33:17 2020 +0200
@@ -90,10 +90,11 @@
TARGET_RELEASE := $(TARGET_RELEASE_NEWJDK_UPGRADED), \
SMALL_JAVA := false, \
CLASSPATH := $(MICROBENCHMARK_CLASSPATH), \
- DISABLED_WARNINGS := processing rawtypes cast serial, \
+ DISABLED_WARNINGS := processing rawtypes cast serial preview, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules
java.management, \
+ JAVAC_FLAGS := --enable-preview, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
I verified this works with your micro, and doesn't seem to cause
any issues elsewhere:
make test TEST=micro:RecordDeserialization
I can shepherd this as a separate fix for documentation purposes, but
feel free to include it in your patch and ping build-dev@..
/Claes
diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:54:42 2020
+0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:59:29 2020
+0200
@@ -93,7 +93,7 @@
DISABLED_WARNINGS := processing rawtypes cast serial, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
- JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules
java.management, \
+ JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported
--limit-modules java.management, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
Otherwise, I simulated what would happen when there are more then
10 ObjectStreamClass layouts for same class rapidly interchanging,
so that they push each other out of the cache, by temporarily
setting cache's MAX_SIZE = 0. Note that this is worst case scenario:
Benchmark (length) Mode
Cnt Score Error Units
RecordDeserializationBench.deserializeClasses 10 avgt
10 9.393 ± 0.287 us/op
RecordDeserializationBench.deserializeClasses 100 avgt
10 35.642 ± 0.977 us/op
RecordDeserializationBench.deserializeClasses 1000 avgt 10
293.769 ± 7.321 us/op
RecordDeserializationBench.deserializeRecords 10 avgt
10 15.335 ± 0.496 us/op
RecordDeserializationBench.deserializeRecords 100 avgt 10
211.427 ± 11.908 us/op
RecordDeserializationBench.deserializeRecords 1000 avgt 10
990.398 ± 26.681 us/op
This is using JMH option '-gc true' to force GC after each
iteration of benchmark. Without it, I get a big (~4s) full-GC
pause just in the middle of a run with length=100:
Iteration 1: 528.577 us/op
Iteration 2: 580.404 us/op
Iteration 3: 4438.228 us/op
Iteration 4: 644.532 us/op
Iteration 5: 698.493 us/op
Iteration 6: 800.738 us/op
Iteration 7: 929.791 us/op
Iteration 8: 870.946 us/op
Iteration 9: 863.416 us/op
Iteration 10: 916.508 us/op
...so results are a bit off because of that:
Benchmark (length) Mode
Cnt Score Error Units
RecordDeserializationBench.deserializeClasses 10 avgt
10 8.263 ± 0.043 us/op
RecordDeserializationBench.deserializeClasses 100 avgt
10 33.406 ± 0.160 us/op
RecordDeserializationBench.deserializeClasses 1000 avgt
10 287.595 ± 0.960 us/op
RecordDeserializationBench.deserializeRecords 10 avgt
10 15.270 ± 0.080 us/op
RecordDeserializationBench.deserializeRecords 100 avgt 10
1127.163 ± 1771.892 us/op
RecordDeserializationBench.deserializeRecords 1000 avgt 10
2003.235 ± 227.159 us/op
Yes, there is quite a bit of GCing going on when cache is
thrashing. Note that I haven't tuned GC in any way and I'm running
this on a machine with 64GiB of RAM so heap is allowed to grow
quite big and G1 is used by default I think.
This is still no worse than before the patch:
Benchmark (length) Mode Cnt
Score Error Units
RecordDeserialization.deserializeClasses 10 avgt 10
8.382 : 0.013 us/op
RecordDeserialization.deserializeClasses 100 avgt 10
33.736 : 0.171 us/op
RecordDeserialization.deserializeClasses 1000 avgt 10
271.224 : 0.953 us/op
RecordDeserialization.deserializeRecords 10 avgt 10
58.606 : 0.446 us/op
RecordDeserialization.deserializeRecords 100 avgt 10
530.044 : 1.752 us/op
RecordDeserialization.deserializeRecords 1000 avgt 10
5335.624 : 44.942 us/op
... since caching of adapted method handle for multiple objects
withing single stream is still effective.
I doubt there will ever be more than 10 variants/versions of the
same record class deserialized by the same VM and in rapid
succession, so I think this should not be an issue. We could add a
system property to control the MAX_SIZE of cache if you think it
is needed.
Thanks for running the numbers on this, and I agree - it seems
outlandishly improbable (most will only see one, and if you have to
maintain 10+ different serialized shapes of some record you likely
have bigger problems).
I'd say let's keep it constant unless someone actually asks for it.
/Claes
Regards, Peter
On 6/22/20 1:04 AM, Claes Redestad wrote:
Hi Peter,
patch and results look great!
My only real comment on this is that I think the microbenchmark
would be
a valuable contribution, too.
It'd also be interesting to explore how poor performance would
become if
we'd hit the (artificial) 11 layouts limit, e.g, by cycling through
10, 11, or 12 different shapes.
/Claes
On 2020-06-21 19:16, Peter Levart wrote:
Hi,
When re-running the benchmark [1] with different lengths of
serialized arrays of records, I found that, compared to
classical classes, lookup into the cache of adapted method
handles starts to show when the length of array is larger (# of
instances of same record type deserialized in single stream).
Each record deserialized must lookup the method handle in a
ConcurrentHashMap:
Benchmark (length) Mode Cnt
Score Error Units
RecordSerializationBench.deserializeClasses 10 avgt 10
8.088 ± 0.013 us/op
RecordSerializationBench.deserializeClasses 100 avgt 10
32.171 ± 0.324 us/op
RecordSerializationBench.deserializeClasses 1000 avgt 10
279.762 ± 3.072 us/op
RecordSerializationBench.deserializeRecords 10 avgt 10
9.011 ± 0.027 us/op
RecordSerializationBench.deserializeRecords 100 avgt 10
33.206 ± 0.514 us/op
RecordSerializationBench.deserializeRecords 1000 avgt 10
325.137 ± 0.969 us/op
...so keeping the correctly shaped adapted method handle in the
per-serialization-session ObjectStreamClass instance [2] starts
to make sense:
Benchmark (length) Mode Cnt
Score Error Units
RecordSerializationBench.deserializeClasses 10 avgt 10
8.681 ± 0.155 us/op
RecordSerializationBench.deserializeClasses 100 avgt 10
32.496 ± 0.087 us/op
RecordSerializationBench.deserializeClasses 1000 avgt 10
279.014 ± 1.189 us/op
RecordSerializationBench.deserializeRecords 10 avgt 10
8.537 ± 0.032 us/op
RecordSerializationBench.deserializeRecords 100 avgt 10
31.451 ± 0.083 us/op
RecordSerializationBench.deserializeRecords 1000 avgt 10
250.854 ± 2.772 us/op
With that, more objects means advantage over classical classes
instead of disadvantage.
[1]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java
[2]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/
Regards, Peter