On Tue, 30 Apr 2024 18:54:09 GMT, Alex Menkov <amen...@openjdk.org> wrote:

>> The fix makes VM heap dumping parallel by default.
>> `jcmd GC.heap_dump` and `jmap -dump` had parallel dumping by default, the 
>> fix affects `HotSpotDiagnosticMXBean.dumpHeap()`, 
>> `-XX:+HeapDumpBeforeFullGC`, `-XX:+HeapDumpAfterFullGC` and 
>> `-XX:+HeapDumpOnOutOfMemoryError`.
>> 
>> Testing:
>>   - manually tested different heap dump scenarios with `-Xlog:heapdump`;
>>   - tier1,tier2,hs-tier5-svc;
>>   - all reg.tests that use heap dump.
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   removed space-only change in diagnosticCommand.cpp

Late to the party here, sorry. 

Are there motivational performance improvements that we get from enabling 
parallel heap dumps by default? Asking because 
[JDK-8319650](https://bugs.openjdk.org/browse/JDK-8319650) and 
[JDK-8320924](https://bugs.openjdk.org/browse/JDK-8320924) improved 
single-threaded heap dump performance drastically, and the I/O looked to be a 
bottleneck going forward. 

Would parallel heap dump take _more_ I/O for writing chunks first, and then 
combining them into large file? Do we know that parallel heap dump still wins a 
lot? I don't think it does all that much. Here is a simple run using the 
workload from [JDK-8319650](https://bugs.openjdk.org/browse/JDK-8319650) on 
10-core machine:


$ for I in `seq 1 5`; do java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC 
-Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done

# jdk mainline with this fix (parallel)
Heap dump file created [1897668110 bytes in 1.401 secs]
Heap dump file created [1897667841 bytes in 1.354 secs]
Heap dump file created [1897668050 bytes in 1.440 secs]
Heap dump file created [1897668101 bytes in 1.366 secs]
Heap dump file created [1897668101 bytes in 1.345 secs]

# jdk mainline without this fix (sequential)
Heap dump file created [1897668092 bytes in 2.314 secs]
Heap dump file created [1897668092 bytes in 2.384 secs]
Heap dump file created [1897668092 bytes in 2.269 secs]
Heap dump file created [1897668092 bytes in 2.274 secs]
Heap dump file created [1897667816 bytes in 2.282 secs]


This is less than 2x improvement, even though we took 3 threads to heap dump:


[1.645s][info][heapdump] Requested dump threads 3, active dump threads 9, 
actual dump threads 3, parallelism true
[1.649s][info][heapdump] Dump non-objects, 0.0046221 secs
[1.651s][info][heapdump] Dump non-objects (part 2), 0.0019995 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5850964 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5852571 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5790543 secs
[2.558s][info][heapdump] Merge segmented heap file, 0.3268282 secs
[2.863s][info][heapdump] Merge segmented heap file, 0.3047630 secs
[3.307s][info][heapdump] Merge segmented heap file, 0.4436261 secs
[3.308s][info][heapdump] Merge heap files complete, 1.0766620 secs
Heap dump file created [1897667959 bytes in 1.664 secs]


And this is on a fast SSD, where I/O is abundant, and there is plenty of space.
The sequential heap dump also seems to be regressing against jdk21u-dev, which 
does:


Heap dump file created [1897840374 bytes in 1.071 secs]
Heap dump file created [1897840481 bytes in 1.070 secs]
Heap dump file created [1897840490 bytes in 1.069 secs]
Heap dump file created [1897840481 bytes in 1.073 secs]
Heap dump file created [1897840481 bytes in 1.134 secs]


I believe that is because the 2-phase heap dump makes excess work for a 
single-threaded heap dump. Note that the _parallel_ heap dump in current 
mainline is not even able to catch up with *what we already had* with 
_sequential_ heap dump.

I propose we revert this switch to parallel, fix the sequential heap dump 
performance, and then reconsider -- with benchmarks -- if we want to switch to 
parallel.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2089957067

Reply via email to