Including build-dev, since this patch also addresses the new issue 8248135:
https://bugs.openjdk.java.net/browse/JDK-8248135
So here's a new webrev that also includes the patch for building benchmarks
with --enable-preview:
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.08/
Regards, Peter
On 6/23/20 10:23 AM, Claes Redestad wrote:
On 2020-06-23 10:06, Claes Redestad wrote:
Hi,
On 2020-06-23 09:49, Peter Levart wrote:
Hi Chris, Claes,
Ok then, here's with benchmark included:
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/
I haven't been able to run the benchmark with "make test" though. I
have tried various ways to pass javac options to the build, like:
make test
TEST='micro:org.openjdk.bench.java.io.RecordDeserialization'
TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"
...but javac doesn't seem to get them. Is there some secret option
to achieve that?
Hmm, we might as well have the microbenchmarks build with
--enable-preview on by default. Try this:
Fixed:
diff -r f2e1cd498381 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:08:35 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:33:17 2020 +0200
@@ -90,10 +90,11 @@
TARGET_RELEASE := $(TARGET_RELEASE_NEWJDK_UPGRADED), \
SMALL_JAVA := false, \
CLASSPATH := $(MICROBENCHMARK_CLASSPATH), \
- DISABLED_WARNINGS := processing rawtypes cast serial, \
+ DISABLED_WARNINGS := processing rawtypes cast serial preview, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+ JAVAC_FLAGS := --enable-preview, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
I verified this works with your micro, and doesn't seem to cause
any issues elsewhere:
make test TEST=micro:RecordDeserialization
I can shepherd this as a separate fix for documentation purposes, but
feel free to include it in your patch and ping build-dev@.
/Claes
diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:54:42 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:59:29 2020 +0200
@@ -93,7 +93,7 @@
DISABLED_WARNINGS := processing rawtypes cast serial, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
- JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+ JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported --limit-modules java.management, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
Otherwise, I simulated what would happen when there are more than 10
ObjectStreamClass layouts for the same class rapidly interchanging, so
that they push each other out of the cache, by temporarily setting the
cache's MAX_SIZE = 0. Note that this is the worst-case scenario:
Benchmark                                      (length)  Mode  Cnt    Score    Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10    9.393 ±  0.287  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10   35.642 ±  0.977  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10  293.769 ±  7.321  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10   15.335 ±  0.496  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  211.427 ± 11.908  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  990.398 ± 26.681  us/op
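For reference, the bound that was zeroed for this measurement is
conceptually something like the following. This is just a simplified
sketch with hypothetical names, not the actual ObjectStreamClass code:

import java.lang.invoke.MethodHandle;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A size-bounded cache of adapted method handles, keyed by stream shape.
// When more than MAX_SIZE distinct shapes interleave, entries keep getting
// evicted; with MAX_SIZE = 0 nothing is ever cached, so every record pays
// for a fresh (expensive) adaptation.
final class AdapterCache {
    static final int MAX_SIZE = 10; // set to 0 to simulate the worst case

    private final ConcurrentHashMap<Object, MethodHandle> cache =
            new ConcurrentHashMap<>();

    MethodHandle get(Object shapeKey, Function<Object, MethodHandle> adapt) {
        MethodHandle mh = cache.get(shapeKey);
        if (mh == null) {
            mh = adapt.apply(shapeKey);  // expensive: builds the adapted handle
            if (cache.size() >= MAX_SIZE) {
                cache.clear();           // crude eviction, enough for a sketch
            }
            if (MAX_SIZE > 0) {
                cache.put(shapeKey, mh);
            }
        }
        return mh;
    }
}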
This is using the JMH option '-gc true' to force GC after each iteration
of the benchmark. Without it, I get a big (~4s) full-GC pause right in
the middle of a run with length=100:
Iteration 1: 528.577 us/op
Iteration 2: 580.404 us/op
Iteration 3: 4438.228 us/op
Iteration 4: 644.532 us/op
Iteration 5: 698.493 us/op
Iteration 6: 800.738 us/op
Iteration 7: 929.791 us/op
Iteration 8: 870.946 us/op
Iteration 9: 863.416 us/op
Iteration 10: 916.508 us/op
...so the results are a bit off because of that:
Benchmark                                      (length)  Mode  Cnt     Score      Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10     8.263 ±    0.043  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10    33.406 ±    0.160  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10   287.595 ±    0.960  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10    15.270 ±    0.080  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  1127.163 ± 1771.892  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  2003.235 ±  227.159  us/op
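(For anyone wanting to reproduce these runs: the '-gc true' switch is
also exposed through JMH's Java API. A minimal sketch, assuming the
benchmark class name used in this thread:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunWithForcedGC {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("RecordDeserializationBench")
                .shouldDoGC(true) // same as '-gc true': GC between iterations
                .build();
        new Runner(opts).run();
    }
}
)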
Yes, there is quite a bit of GCing going on when the cache is thrashing.
Note that I haven't tuned GC in any way, and I'm running this on a
machine with 64 GiB of RAM, so the heap is allowed to grow quite big;
G1 is used by default, I think.
This is still no worse than before the patch:
Benchmark                                 (length)  Mode  Cnt     Score   Error  Units
RecordDeserialization.deserializeClasses        10  avgt   10     8.382 ±  0.013  us/op
RecordDeserialization.deserializeClasses       100  avgt   10    33.736 ±  0.171  us/op
RecordDeserialization.deserializeClasses      1000  avgt   10   271.224 ±  0.953  us/op
RecordDeserialization.deserializeRecords        10  avgt   10    58.606 ±  0.446  us/op
RecordDeserialization.deserializeRecords       100  avgt   10   530.044 ±  1.752  us/op
RecordDeserialization.deserializeRecords      1000  avgt   10  5335.624 ± 44.942  us/op
...since caching of the adapted method handle for multiple objects
within a single stream is still effective.
I doubt there will ever be more than 10 variants/versions of the same
record class deserialized by the same VM in rapid succession, so I
think this should not be an issue. We could add a system property to
control the MAX_SIZE of the cache if you think it is needed.
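If it ever turns out to be needed, it could be as small as reading an
integer system property at class initialization time. A hypothetical
sketch; the property name is invented, not an actual JDK switch:

// Hypothetical: bound the adapter cache via a system property,
// defaulting to the current constant of 10.
final class CacheConfig {
    static final int MAX_SIZE =
            Integer.getInteger("jdk.serialization.adapterCacheSize", 10);
}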
Thanks for running the numbers on this, and I agree - it seems
outlandishly improbable (most will only see one, and if you have to
maintain 10+ different serialized shapes of some record you likely
have bigger problems).
I'd say let's keep it constant unless someone actually asks for it.
/Claes
Regards, Peter
On 6/22/20 1:04 AM, Claes Redestad wrote:
Hi Peter,
patch and results look great!
My only real comment on this is that I think the microbenchmark
would be
a valuable contribution, too.
It'd also be interesting to explore how poor performance would become
if we hit the (artificial) 11-layout limit, e.g., by cycling through
10, 11, or 12 different shapes.
/Claes
On 2020-06-21 19:16, Peter Levart wrote:
Hi,
When re-running the benchmark [1] with different lengths of serialized
arrays of records, I found that, compared to classical classes, the
lookup into the cache of adapted method handles starts to show when the
length of the array is larger (i.e. the number of instances of the same
record type deserialized in a single stream). Each record deserialized
must look up the method handle in a ConcurrentHashMap:
Benchmark                                    (length)  Mode  Cnt    Score   Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.088 ± 0.013  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.171 ± 0.324  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.762 ± 3.072  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    9.011 ± 0.027  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   33.206 ± 0.514  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  325.137 ± 0.969  us/op
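(To make that cost concrete: the per-record hot path boils down to a
single ConcurrentHashMap lookup of roughly this shape. A sketch with
hypothetical names, not the actual JDK code:

import java.lang.invoke.MethodHandle;
import java.util.concurrent.ConcurrentHashMap;

final class AdaptedHandles {
    private static final ConcurrentHashMap<Object, MethodHandle> CACHE =
            new ConcurrentHashMap<>();

    // Called once per deserialized record: even on a hit, the hashing
    // and probing add up once thousands of records share a stream.
    static MethodHandle forShape(Object shapeKey) {
        return CACHE.computeIfAbsent(shapeKey, AdaptedHandles::adaptConstructor);
    }

    private static MethodHandle adaptConstructor(Object shapeKey) {
        // the expensive method-handle adaptation is elided in this sketch
        throw new UnsupportedOperationException("elided in this sketch");
    }
}
)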
...so keeping the correctly shaped adapted method handle in the
per-serialization-session ObjectStreamClass instance [2] starts to
make sense:
Benchmark                                    (length)  Mode  Cnt    Score   Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.681 ± 0.155  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.496 ± 0.087  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.014 ± 1.189  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    8.537 ± 0.032  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   31.451 ± 0.083  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  250.854 ± 2.772  us/op
With that, more objects mean an advantage over classical classes
instead of a disadvantage.
[1]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java
[2]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/
Regards, Peter