On Tue, 30 Apr 2024 18:54:09 GMT, Alex Menkov <amen...@openjdk.org> wrote:
>> The fix makes VM heap dumping parallel by default.
>> `jcmd GC.heap_dump` and `jmap -dump` had parallel dumping by default; the fix affects `HotSpotDiagnosticMXBean.dumpHeap()`, `-XX:+HeapDumpBeforeFullGC`, `-XX:+HeapDumpAfterFullGC` and `-XX:+HeapDumpOnOutOfMemoryError`.
>>
>> Testing:
>> - manually tested different heap dump scenarios with `-Xlog:heapdump`;
>> - tier1, tier2, hs-tier5-svc;
>> - all reg.tests that use heap dump.
>
> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
>
>   removed space-only change in diagnosticCommand.cpp

Late to the party here, sorry. Are there motivating performance improvements that we get from enabling parallel heap dumps by default? Asking because [JDK-8319650](https://bugs.openjdk.org/browse/JDK-8319650) and [JDK-8320924](https://bugs.openjdk.org/browse/JDK-8320924) improved single-threaded heap dump performance drastically, and I/O looked to be the bottleneck going forward. Would a parallel heap dump take _more_ I/O, since it writes the chunks first and then combines them into one large file? Do we know that the parallel heap dump still wins a lot?

I don't think it does all that much. Here is a simple run using the workload from [JDK-8319650](https://bugs.openjdk.org/browse/JDK-8319650) on a 10-core machine:

```
$ for I in `seq 1 5`; do java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done

# jdk mainline with this fix (parallel)
Heap dump file created [1897668110 bytes in 1.401 secs]
Heap dump file created [1897667841 bytes in 1.354 secs]
Heap dump file created [1897668050 bytes in 1.440 secs]
Heap dump file created [1897668101 bytes in 1.366 secs]
Heap dump file created [1897668101 bytes in 1.345 secs]

# jdk mainline without this fix (sequential)
Heap dump file created [1897668092 bytes in 2.314 secs]
Heap dump file created [1897668092 bytes in 2.384 secs]
Heap dump file created [1897668092 bytes in 2.269 secs]
Heap dump file created [1897668092 bytes in 2.274 secs]
Heap dump file created [1897667816 bytes in 2.282 secs]
```

This is less than a 2x improvement, even though we used 3 threads for the heap dump:

```
[1.645s][info][heapdump] Requested dump threads 3, active dump threads 9, actual dump threads 3, parallelism true
[1.649s][info][heapdump] Dump non-objects, 0.0046221 secs
[1.651s][info][heapdump] Dump non-objects (part 2), 0.0019995 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5850964 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5852571 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5790543 secs
[2.558s][info][heapdump] Merge segmented heap file, 0.3268282 secs
[2.863s][info][heapdump] Merge segmented heap file, 0.3047630 secs
[3.307s][info][heapdump] Merge segmented heap file, 0.4436261 secs
[3.308s][info][heapdump] Merge heap files complete, 1.0766620 secs
Heap dump file created [1897667959 bytes in 1.664 secs]
```

And this is on a fast SSD, where I/O is abundant and there is plenty of space.

The sequential heap dump also seems to be regressing against jdk21u-dev, which does:

```
Heap dump file created [1897840374 bytes in 1.071 secs]
Heap dump file created [1897840481 bytes in 1.070 secs]
Heap dump file created [1897840490 bytes in 1.069 secs]
Heap dump file created [1897840481 bytes in 1.073 secs]
Heap dump file created [1897840481 bytes in 1.134 secs]
```

I believe that is because the 2-phase heap dump makes excess work for a single-threaded heap dump.
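To make that "excess work" concrete, here is a minimal sketch of what the merge phase of a 2-phase (segmented) dump amounts to. This is not the HotSpot implementation (the real code is C++ in `heapDumper.cpp`); the class name, the `merge` helper and the file handling are all hypothetical. The point it illustrates is that every byte of the dump crosses the I/O path twice: once when a writer thread emits its segment file, and again when the segments are stitched into the final `.hprof` file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch of a "merge segmented heap file" step: each per-thread
// segment is copied verbatim into the final dump file, so the whole dump is
// read back and written out a second time.
public class MergeSegmentsSketch {
    static void merge(List<Path> segments, Path finalDump) throws IOException {
        try (FileChannel out = FileChannel.open(finalDump,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path segment : segments) {
                try (FileChannel in = FileChannel.open(segment, StandardOpenOption.READ)) {
                    long size = in.size();
                    long transferred = 0;
                    while (transferred < size) {
                        // transferTo may copy less than requested; loop until the segment is done.
                        transferred += in.transferTo(transferred, size - transferred, out);
                    }
                }
                Files.delete(segment); // segment files are temporary in this sketch
            }
        }
    }
}
```

That second read-and-write pass is roughly what the `Merge segmented heap file` lines above appear to account for, and it is work a single-pass sequential dump never had to do.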
Note that the _parallel_ heap dump in current mainline is not even able to catch up with *what we already had* with the _sequential_ heap dump. I propose we revert this switch to parallel, fix the sequential heap dump performance, and then reconsider -- with benchmarks -- whether we want to switch to parallel.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2089957067