Re: RFR: 8320215: HeapDumper can use DumpWriter buffer during merge

2024-05-29 Thread Yi Yang
On Fri, 19 Apr 2024 00:10:12 GMT, Alex Menkov  wrote:

> The fix updates HeapMerger to use the writer buffer (no need to copy memory; 
> also the writer buffer is 1MB instead of 4KB).
> Additionally fixes a small issue in FileWriter (`ssize_t` instead of 
> `size_t` looks like a typo; the argument should be unsigned).
> 
> Testing: all HeapDump-related tests on Oracle supported platforms

Thanks for the explanation, looks good then~

-

Marked as reviewed by yyang (Committer).

PR Review: https://git.openjdk.org/jdk/pull/18850#pullrequestreview-2086936328


Re: RFR: 8320215: HeapDumper can use DumpWriter buffer during merge

2024-05-28 Thread Yi Yang
On Fri, 19 Apr 2024 00:10:12 GMT, Alex Menkov  wrote:

> The fix updates HeapMerger to use the writer buffer (no need to copy memory; 
> also the writer buffer is 1MB instead of 4KB).
> Additionally fixes a small issue in FileWriter (`ssize_t` instead of 
> `size_t` looks like a typo; the argument should be unsigned).
> 
> Testing: all HeapDump-related tests on Oracle supported platforms

I remember experimenting with different buffer sizes and figuring out that 4KB 
was the sweet spot. We could potentially switch to 1MB, but it would be better 
if we had some benchmark numbers to back that up.
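
For reference, this is the kind of micro-benchmark that could produce such 
numbers; a minimal standalone sketch, with file names and buffer sizes that 
are illustrative rather than taken from the PR:

#include <chrono>
#include <cstdio>
#include <vector>

// Time a file-to-file copy through a userspace buffer of the given size,
// mimicking the merge phase's read/write loop.
static double copy_with_buffer(const char* src, const char* dst, size_t buf_size) {
    std::vector<char> buf(buf_size);
    FILE* in = fopen(src, "rb");
    FILE* out = fopen(dst, "wb");
    if (in == nullptr || out == nullptr) {
        if (in != nullptr) fclose(in);
        if (out != nullptr) fclose(out);
        return -1.0;
    }
    auto start = std::chrono::steady_clock::now();
    size_t n;
    while ((n = fread(buf.data(), 1, buf_size, in)) > 0) {
        fwrite(buf.data(), 1, n, out);
    }
    auto end = std::chrono::steady_clock::now();
    fclose(in);
    fclose(out);
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    for (size_t kb : {4, 64, 256, 1024}) {
        double secs = copy_with_buffer("segment.hprof", "merged.hprof", kb * 1024);
        printf("%6zu KB buffer: %.3f s\n", kb, secs);
    }
    return 0;
}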

-

PR Comment: https://git.openjdk.org/jdk/pull/18850#issuecomment-2136422628


Re: RFR: 8330066: HeapDumpPath and HeapDumpGzipLevel VM options do not mention HeapDumpBeforeFullGC and HeapDumpAfterFullGC [v2]

2024-05-14 Thread Yi Yang
On Tue, 14 May 2024 01:53:20 GMT, Alex Menkov  wrote:

>> The fix updates descriptions of `HeapDumpPath`/`HeapDumpGzipLevel` and 
>> `HeapDumpBeforeFullGC`/`HeapDumpAfterFullGC`/`HeapDumpOnOutOfMemoryError` VM 
>> options
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   align

Marked as reviewed by yyang (Committer).

-

PR Review: https://git.openjdk.org/jdk/pull/19224#pullrequestreview-2054876579


Re: RFR: 8322043: HeapDumper should use parallel dump by default [v3]

2024-04-15 Thread Yi Yang
On Mon, 15 Apr 2024 23:18:54 GMT, Alex Menkov  wrote:

>> The fix makes VM heap dumping parallel by default.
>> `jcmd GC.heap_dump` and `jmap -dump` had parallel dumping by default; the 
>> fix affects `HotSpotDiagnosticMXBean.dumpHeap()`, 
>> `-XX:+HeapDumpBeforeFullGC`, `-XX:+HeapDumpAfterFullGC` and 
>> `-XX:+HeapDumpOnOutOfMemoryError`.
>> 
>> Testing:
>>   - manually tested different heap dump scenarios with `-Xlog:heapdump`;
>>   - tier1,tier2,hs-tier5-svc;
>>   - all reg.tests that use heap dump.
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   check free_memory for OOME

Looks good. Thanks for doing this!

-

Marked as reviewed by yyang (Committer).

PR Review: https://git.openjdk.org/jdk/pull/18748#pullrequestreview-2002450373


Re: RFR: JDK-8322042: HeapDumper should perform merge on the current thread instead of VMThread

2024-04-01 Thread Yi Yang
On Tue, 2 Apr 2024 00:40:37 GMT, Alex Menkov  wrote:

> The fix updated HeapDumper to always perform merge on the current thread.
> 
> Testing: tier1-5, all HeapDump-related tests
>   Covered heap dumping scenarios:
> - `jcmd GC.heap_dump` command;
> - `HotSpotDiagnosticMXBean.dumpHeap()`;
> - `HeapDumpBeforeFullGC`, `HeapDumpAfterFullGC` VM options;
> - `HeapDumpOnOutOfMemoryError` VM option.

- jcmd GC.heap_dump command: `AttachListenerThread`
- HotSpotDiagnosticMXBean.dumpHeap(): `JavaThread`
- HeapDumpBeforeFullGC, HeapDumpAfterFullGC VM options: `VMThread`
- HeapDumpOnOutOfMemoryError VM option: `VMThread`

Maybe we can always use the AttachListenerThread (via Handshake) or a new 
virtual thread?
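
For illustration, a hedged HotSpot-style sketch of the Handshake variant 
(untested pseudo-usage: HandshakeClosure and Handshake::execute are the 
existing handshake API, while MergeDumpFilesClosure and the call site are 
hypothetical):

#include "runtime/handshake.hpp"

// Hypothetical closure that would run the (non-STW) segment-file merge in the
// context of a chosen JavaThread instead of on the VMThread.
class MergeDumpFilesClosure : public HandshakeClosure {
  const char* _base_path;
 public:
  MergeDumpFilesClosure(const char* base_path)
    : HandshakeClosure("MergeDumpFiles"), _base_path(base_path) {}
  void do_thread(Thread* thread) {
    // the DumpMerger::do_merge() work would go here
  }
};

// Hypothetical call site, for some convenient JavaThread* target:
//   MergeDumpFilesClosure cl(base_path);
//   Handshake::execute(&cl, target);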

-

PR Review: https://git.openjdk.org/jdk/pull/18571#pullrequestreview-1972484174


Withdrawn: 8327864: Support segmented heap dump for HotSpotDiagnosticMXBean

2024-03-25 Thread Yi Yang
On Tue, 12 Mar 2024 07:59:12 GMT, Yi Yang  wrote:

> We've received feedback from users of cloud APM platforms who want new JDK 
> versions to allow the HotSpotDiagnosticMXBean.dumpHeap underpinnings to 
> reduce STW time using segmented heap dump. Supporting segmented heap dump 
> for the MXBean is a reasonable requirement and adds little additional 
> complexity, both in terms of maintainability and understanding.

This pull request has been closed without being integrated.

-

PR: https://git.openjdk.org/jdk/pull/18221


Re: RFR: JDK-8327769: jcmd GC.heap_dump without options should write to location given by -XX:HeapDumpPath, if set [v2]

2024-03-15 Thread Yi Yang
On Thu, 14 Mar 2024 08:49:45 GMT, Matthias Baesken  wrote:

> > Looks reasonable, this is a harmless change, but the name 
> > `dump_to_heapdump_path` looks too detailed (and somewhat strange) to me
> 
> Do you maybe have a good suggestion for a new name ?

Maybe `dump` and `dump_to`

-

PR Comment: https://git.openjdk.org/jdk/pull/18190#issuecomment-1999120772


Re: RFR: JDK-8327769: jcmd GC.heap_dump without options should write to location given by -XX:HeapDumpPath, if set [v2]

2024-03-13 Thread Yi Yang
On Tue, 12 Mar 2024 12:22:41 GMT, Matthias Baesken  wrote:

>> Currently the jcmd command GC.heap_dump only works with an additionally 
>> provided file name.
>> Syntax: GC.heap_dump [options] <filename>
>> 
>> In case the JVM has the -XX flag HeapDumpPath set, we should support an 
>> additional mode where the <filename> is optional.
>> In case the filename is NOT set, we take the HeapDumpPath (file or 
>> directory);
>> 
>> new syntax:
>> GC.heap_dump [options] <filename> .. has precedence over second option
>> GC.heap_dump [options] ... in case -XX:HeapDumpPath=p is set
>> 
>> This would be a simplification, e.g. for support cases where a filename or 
>> directory is set at JVM startup with -XX:HeapDumpPath=p and writing to the 
>> path is intended/recommended for usage also in the jcmd case.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   alloc_and_create_heapdump_pathname adjust comment about freeing the 
> returned pointer

Looks reasonable, this is a harmless change, but the name 
`dump_to_heapdump_path` looks too detailed (and somewhat strange) to me.

-

PR Review: https://git.openjdk.org/jdk/pull/18190#pullrequestreview-1935544801


Re: RFR: 8327864: Support segmented heap dump for HotSpotDiagnosticMXBean

2024-03-13 Thread Yi Yang
On Wed, 13 Mar 2024 20:01:22 GMT, Kevin Walls  wrote:

>> We've received feedback from users of cloud APM platforms who want new JDK 
>> versions to allow the HotSpotDiagnosticMXBean.dumpHeap underpinnings to 
>> reduce STW time using segmented heap dump. Supporting segmented heap dump 
>> for the MXBean is a reasonable requirement and adds little additional 
>> complexity, both in terms of maintainability and understanding.
>
> Hi!
> 
> Segmented is an awkward term as we use that word inside hprof too.  For the 
> previous change we used the phrase "two phase" as well as calling it 
> segmented, which I think helps.
> 
> Also this change means jmm_DumpHeap will always use 
> HeapDumper::default_num_of_dump_threads().
> 
> That's not a bad thing, but could the bug title be something like "Enable 
> two-phase heap dump for HotSpotDiagnosticMXBean (JMX)" ?  I think then it's 
> clearer what's happening just from the title.
> 
> 
> 
> +  if (current_thread->is_AttachListener_thread() || !_oome) {
> 
> I think "!_oome" here means "we are called from JMX" ?  We should clarify. 
> 8-)  Trying to suggest a way of describing this.
> 
> // Operating in attach listener or called by management library:
> // perform heapdump file merge operation in the current thread, preventing us
> // from occupying the VM Thread, which in turn affects the occurrence of
> // GC and other VM operations.
>   
> So the other case is only HeapDumpOnOutOfMemoryError processing, where we do 
> the merge in a VM operation?  Just checking if that's all agreed. 8-)

Hi @kevinjwalls, thanks for your review! I was reminded by Alex that there is 
already a [similar issue](https://bugs.openjdk.org/browse/JDK-8322043) ready 
to make two-phase heap dump the default choice, so all we need is to change 
the default parallel parameter (and potentially fix bugs).

https://github.com/openjdk/jdk/blob/21279b1f652c54693e317d55e7da6478b959fdfb/src/hotspot/share/services/heapDumper.hpp#L64

I will first coordinate with @alexmenkov to see whether we can improve based on 
the current patch, or close the current PR and wait for his new one.

P.S. the phrase "two phase" makes sense to me:)

-

PR Comment: https://git.openjdk.org/jdk/pull/18221#issuecomment-1996279522


RFR: 8327864: Support segmented heap dump for HotSpotDiagnosticMXBean

2024-03-12 Thread Yi Yang
We've received feedback from users of cloud APM platforms who want new JDK 
versions to allow the HotSpotDiagnosticMXBean.dumpHeap underpinnings to reduce 
STW time using segmented heap dump. Supporting segmented heap dump for the 
MXBean is a reasonable requirement and adds little additional complexity, both 
in terms of maintainability and understanding.

-

Commit messages:
 - 8327864: Support segmented heap dump for HotSpotDiagnosticMXBean

Changes: https://git.openjdk.org/jdk/pull/18221/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18221&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8327864
  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18221.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18221/head:pull/18221

PR: https://git.openjdk.org/jdk/pull/18221


Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information

2024-02-05 Thread Yi Yang
On Wed, 31 Jan 2024 14:22:44 GMT, Kevin Walls  wrote:

> Introduce the jcmd "VM.debug" to implement access to a useful set of the 
> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...".
> 
> Not recommended for live production use.  Calling these "debug" utilities, 
> and not including them in the jcmd help output, is to remind us they are not 
> general customer-facing tools.

Could it be extended to something like

jcmd VM.debug MyDebugCode.java

-

PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1926547862


Re: RFR: JDK-8321565: [REDO] Heap dump does not contain virtual Thread stack references

2023-12-11 Thread Yi Yang
On Fri, 8 Dec 2023 22:29:12 GMT, Alex Menkov  wrote:

> Original fix for JDK-8299426 (Heap dump does not contain virtual Thread stack 
> references, #16665) caused failures of a new test (added while #16665 was 
> under review), 
> test/hotspot/jtreg/compiler/c2/TestReduceAllocationAndHeapDump.java, in many 
> tiers, and was reverted.
> 
> Segmented heap dump assumes the "merge" stage is executed outside of a 
> safepoint (to not block the VM), but a heap dump may happen during a safepoint 
> (and the TestReduceAllocationAndHeapDump.java test provokes the case).
> The change contains original fix for JDK-8299426 ("[original 
> fix](https://github.com/openjdk/jdk/commit/bdbf768eafa86e0007aca4188e0567693afe9071)")
>  and removes asserts from HeapMerger ([allow heapdump in 
> safepoints](https://github.com/openjdk/jdk/commit/44670ca4bf55dd2a5f1f44686758844aed68937e)).
> 
> Run tier1-3 and heapdump-related tests.

Marked as reviewed by yyang (Committer).

-

PR Review: https://git.openjdk.org/jdk/pull/17040#pullrequestreview-1776478407


Re: RFR: 8299426: Heap dump does not contain virtual Thread stack references

2023-11-15 Thread Yi Yang
On Tue, 14 Nov 2023 21:54:06 GMT, Alex Menkov  wrote:

> The change implements dumping of unmounted virtual threads' data (stack 
> traces and stack references).
> Unmounted vthreads can be detected only by iterating over the heap, but hprof 
> stack trace records (HPROF_FRAME/HPROF_TRACE) should be written before 
> HPROF_HEAP_DUMP/HPROF_HEAP_DUMP_SEGMENT.
> HeapDumper supports segmented dump (parallel dump to separate files with 
> subsequent file merge outside of a safepoint); the fix switches HeapDumper to 
> always use segmented dump: the 1st segment contains only non-heap data, other 
> segments are used for dumping heap objects. For serial dumping, 
> single-threaded dumping is performed, but 2 segments are created anyway.
> When HeapObjectDumper detects unmounted virtual thread, it writes 
> HPROF_FRAME/HPROF_TRACE records to the 1st segment ("global writer"), and 
> writes thread object (HPROF_GC_ROOT_JAVA_FRAME) and stack references 
> (HPROF_GC_ROOT_JAVA_FRAME/HPROF_GC_ROOT_JNI_LOCAL) to the HeapObjectDumper 
> segment.
> As parallel dumpers may write HPROF_FRAME/HPROF_TRACE concurrently and 
> VMDumper needs to write non-heap data before heap object dumpers can write 
> virtual threads data, writing to global writer is protected with 
> DumperController::_global_writer_lock.
> 
> Testing: run tests which perform heap dump (in different scenarios):
>  - test/hotspot/jtreg/serviceability
>  - test/hotspot/jtreg/runtime/ErrorHandling
>  - test/hotspot/jtreg/gc/epsilon
>  - test/jdk/sun/tools/jhsdb

src/hotspot/share/services/heapDumper.cpp line 1896:

> 1894: 
> 1895:  public:
> 1896:   HeapObjectDumper(AbstractDumpWriter* writer, UnmountedVThreadDumper* 
> vthread_dumper)

Maybe we can pass the global writer to HeapObjectDumper and thus simplify the 
implementation? We could then remove `class VM_HeapDumper : public VM_GC_Operation, 
public WorkerTask, public UnmountedVThreadDumper {` and 
`UnmountedVThreadDumper`.
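
For illustration, a hedged sketch of that simplification (class names are from 
the PR, but the wiring is hypothetical, not the PR's actual code):

// HeapObjectDumper would receive the first-segment ("global") writer directly
// and emit HPROF_FRAME/HPROF_TRACE for unmounted vthreads itself, so the
// UnmountedVThreadDumper mixin on VM_HeapDumper could go away.
class HeapObjectDumper : public ObjectClosure {
 private:
  AbstractDumpWriter* _writer;         // per-segment writer
  AbstractDumpWriter* _global_writer;  // 1st segment, guarded by _global_writer_lock
 public:
  HeapObjectDumper(AbstractDumpWriter* writer, AbstractDumpWriter* global_writer)
    : _writer(writer), _global_writer(global_writer) {}
  // on an unmounted vthread: take the lock, then write via _global_writer
  void do_object(oop o);
};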

src/hotspot/share/services/heapDumper.cpp line 2024:

> 2022: char* DumpMerger::get_writer_path(const char* base_path, int seq) {
> 2023:   // calculate required buffer size
> 2024: size_t buf_size = strlen(base_path)

Format

src/hotspot/share/services/heapDumper.cpp line 2411:

> 2409: 
> 2410:   WorkerThreads* workers = ch->safepoint_workers();
> 2411:   update_parallel_dump_count(workers);

Maybe consolidate them together:

  update_parallel_dump_count(workers);


=>

prepare_parallel_dump(workers) {
  ...  // copied from update_parallel_dump_count
  _dumper_controller = new (std::nothrow) DumperController(_num_dumper_threads);
}

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16665#discussion_r1395079535
PR Review Comment: https://git.openjdk.org/jdk/pull/16665#discussion_r1395071539
PR Review Comment: https://git.openjdk.org/jdk/pull/16665#discussion_r1395084194


Re: RFR: JDK-8319053: Segment dump files remain after parallel heap dump on Windows [v2]

2023-11-02 Thread Yi Yang
On Thu, 2 Nov 2023 20:44:14 GMT, Alex Menkov  wrote:

>> Segment file is closed from DumpWriter dtor.
>> On Unix systems `remove` can delete an opened file, on Windows it fails with 
>> "access denied".
>> The fix destroys DumpWriter objects for segment files after all data are 
>> dumped
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   updated log text

Looks reasonable. Thanks.

-

Marked as reviewed by yyang (Committer).

PR Review: https://git.openjdk.org/jdk/pull/16462#pullrequestreview-1711780491


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase [v2]

2023-10-10 Thread Yi Yang
On Thu, 21 Sep 2023 02:12:06 GMT, Yi Yang  wrote:

> I had not noticed this PR sorry. I do not like the fact we have Linux-only 
> functionality being added with no intent to supply similar functionality on 
> other platforms. I also do not like the fact we had to ifdef the Linux code 
> into the shared code. It would have been better to place the code in 
> os::Linux and then have a single LINUX_ONLY() to make that call.

I made a simple attempt, but it is difficult to integrate this portion of code 
into os::Linux because it requires modifying the writer's written-bytes count 
and setting error messages. Logically, it belongs to DumpMerger. Abstracting it 
into a function similar to concatenate_file and placing it in os::Linux seems 
challenging.

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1756954378


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase [v2]

2023-09-20 Thread Yi Yang
On Thu, 21 Sep 2023 01:18:19 GMT, David Holmes  wrote:

> I had not noticed this PR sorry. I do not like the fact we have Linux-only 
> functionality being added with no intent to supply similar functionality on 
> other platforms. I also do not like the fact we had to ifdef the Linux code 
> into the shared code. It would have been better to place the code in 
> os::Linux and then have a single LINUX_ONLY() to make that call.

Okay, I'll prepare a follow-up PR to do this once 
https://github.com/openjdk/jdk/pull/15843 is merged.

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1728662913


Integrated: 8314021: HeapDump: Optimize segmented heap file merging phase

2023-09-18 Thread Yi Yang
On Fri, 11 Aug 2023 09:31:56 GMT, Yi Yang  wrote:

> This patch reduces phase 2 (merge) time by ~16% (24s -> 20s) when dumping a 
> 32g heap with 96 threads, and fixes a memory leak in the compressor.
> 
> You might ask why this is a Linux-only optimization: sendfile requires at 
> least a socket fd on other platforms ([aix 
> sendfile](https://www.ibm.com/docs/en/aix/7.1?topic=s-send-file-subroutine) 
> [macOS 
> sendfile](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendfile.2.html)
>  [win32 
> TransmitFile](https://learn.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile)),
>  while [only Linux](https://man7.org/linux/man-pages/man2/sendfile.2.html) 
> supports two regular file descriptors.
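
For reference, the core of a sendfile-based merge loop might look like this 
minimal Linux-only sketch (standalone and simplified, not the PR's code; the 
function name is illustrative):

#include <sys/sendfile.h>
#include <fcntl.h>
#include <unistd.h>

// Append one segment file to the merged dump entirely in the kernel,
// avoiding the userspace read/write buffer copy.
static bool append_segment(int out_fd, const char* seg_path) {
  int in_fd = open(seg_path, O_RDONLY);
  if (in_fd < 0) return false;
  off_t offset = 0;
  off_t len = lseek(in_fd, 0, SEEK_END);
  bool ok = (len >= 0);
  while (ok && offset < len) {
    // sendfile advances `offset` by the number of bytes transferred
    ssize_t n = sendfile(out_fd, in_fd, &offset, (size_t)(len - offset));
    ok = (n > 0);
  }
  close(in_fd);
  return ok;
}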

This pull request has now been integrated.

Changeset: 3760a044
Author: Yi Yang 
URL:   
https://git.openjdk.org/jdk/commit/3760a0448df7024f9b44fa2af11007de4dfcbbe2
Stats: 81 lines in 4 files changed: 73 ins; 5 del; 3 mod

8314021: HeapDump: Optimize segmented heap file merging phase

Reviewed-by: amenkov, kevinw

-

PR: https://git.openjdk.org/jdk/pull/15245


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase [v2]

2023-09-18 Thread Yi Yang
On Mon, 18 Sep 2023 11:09:36 GMT, Kevin Walls  wrote:

> Apologies for delay, had looked at this previously but not come back for 
> final review.

Never mind:) Thanks @alexmenkov and @kevinjwalls for the reviews.

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1724721509


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase [v2]

2023-09-17 Thread Yi Yang
On Thu, 7 Sep 2023 02:19:10 GMT, Yi Yang  wrote:

>> This patch reduces phase 2 (merge) time by ~16% (24s -> 20s) when dumping a 
>> 32g heap with 96 threads, and fixes a memory leak in the compressor.
>> 
>> You might ask why this is a Linux-only optimization: sendfile requires at 
>> least a socket fd on other platforms ([aix 
>> sendfile](https://www.ibm.com/docs/en/aix/7.1?topic=s-send-file-subroutine) 
>> [macOS 
>> sendfile](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendfile.2.html)
>>  [win32 
>> TransmitFile](https://learn.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile)),
>>  while [only Linux](https://man7.org/linux/man-pages/man2/sendfile.2.html) 
>> supports two regular file descriptors.
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   one merge_file for all

Can I have a second review? Maybe @plummercj @kevinjwalls

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1722681321


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase

2023-09-10 Thread Yi Yang
On Thu, 7 Sep 2023 00:56:40 GMT, Alex Menkov  wrote:

>> This patch reduces phase 2 (merge) time by ~16% (24s -> 20s) when dumping a 
>> 32g heap with 96 threads, and fixes a memory leak in the compressor.
>> 
>> You might ask why this is a Linux-only optimization: sendfile requires at 
>> least a socket fd on other platforms ([aix 
>> sendfile](https://www.ibm.com/docs/en/aix/7.1?topic=s-send-file-subroutine) 
>> [macOS 
>> sendfile](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendfile.2.html)
>>  [win32 
>> TransmitFile](https://learn.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile)),
>>  while [only Linux](https://man7.org/linux/man-pages/man2/sendfile.2.html) 
>> supports two regular file descriptors.
>
> The fix looks good to me in general, but I'm not sure about code organization.
> src/hotspot/share/runtime/os.hpp describes rules for os* files.
> It states:
> //   Platform-independent source files should not include these header files
> //   (although sadly there are some rare exceptions ...)
> And the change adds one more exception.
> I'd like to hear runtime guys opinion.

Hi @alexmenkov, can we move on, since there are no more comments from runtime 
folks? We can still propose a follow-up change if someone has a strong 
objection/concern about this "exception".

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1713063232


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase

2023-09-06 Thread Yi Yang
On Thu, 7 Sep 2023 00:56:40 GMT, Alex Menkov  wrote:

> The fix looks good to me in general, but I'm not sure about code 
> organization. src/hotspot/share/runtime/os.hpp describes rules for os* files. 
> It states: // Platform-independent source files should not include these 
> header files // (although sadly there are some rare exceptions ...) And the 
> change adds one more exception. I'd like to hear runtime guys opinion.

Thanks for the review. Yes, grepping `os_linux.hpp` found several exceptions; 
I'd like to see if runtime folks have alternatives, too.

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1709371785


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase [v2]

2023-09-06 Thread Yi Yang
> This patch reduces phase 2 (merge) time by ~16% (24s -> 20s) when dumping a 
> 32g heap with 96 threads, and fixes a memory leak in the compressor.
> 
> You might ask why this is a Linux-only optimization: sendfile requires at 
> least a socket fd on other platforms ([aix 
> sendfile](https://www.ibm.com/docs/en/aix/7.1?topic=s-send-file-subroutine) 
> [macOS 
> sendfile](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendfile.2.html)
>  [win32 
> TransmitFile](https://learn.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile)),
>  while [only Linux](https://man7.org/linux/man-pages/man2/sendfile.2.html) 
> supports two regular file descriptors.

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  one merge_file for all

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15245/files
  - new: https://git.openjdk.org/jdk/pull/15245/files/ba381aaf..0eb02c14

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15245&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15245&range=00-01

  Stats: 16 lines in 1 file changed: 4 ins; 9 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/15245.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15245/head:pull/15245

PR: https://git.openjdk.org/jdk/pull/15245


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase

2023-09-05 Thread Yi Yang
On Fri, 11 Aug 2023 09:31:56 GMT, Yi Yang  wrote:

> This patch reduces phase 2 (merge) time by ~16% (24s -> 20s) when dumping a 
> 32g heap with 96 threads, and fixes a memory leak in the compressor.
> 
> You might ask why this is a Linux-only optimization: sendfile requires at 
> least a socket fd on other platforms ([aix 
> sendfile](https://www.ibm.com/docs/en/aix/7.1?topic=s-send-file-subroutine) 
> [macOS 
> sendfile](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendfile.2.html)
>  [win32 
> TransmitFile](https://learn.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile)),
>  while [only Linux](https://man7.org/linux/man-pages/man2/sendfile.2.html) 
> supports two regular file descriptors.

Can I have a review for this? It fixes a memory leak and adds a harmless 
optimization.

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1707732875


Integrated: 8311775: [TEST] duplicate verifyHeapDump in several tests

2023-09-03 Thread Yi Yang
On Wed, 9 Aug 2023 07:02:34 GMT, Yi Yang  wrote:

> This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
> tests, this patch tries to consolidate them into one method.

This pull request has now been integrated.

Changeset: 75d4ac26
Author: Yi Yang 
URL:   
https://git.openjdk.org/jdk/commit/75d4ac2659fb8748777458ceeea3d2e7087be40c
Stats: 167 lines in 7 files changed: 24 ins; 118 del; 25 mod

8311775: [TEST] duplicate verifyHeapDump in several tests

Reviewed-by: kevinw, amenkov, cjplummer

-

PR: https://git.openjdk.org/jdk/pull/15202


Re: RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests [v5]

2023-08-31 Thread Yi Yang
> This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
> tests, this patch tries to consolidate them into one method.

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  Update HprofParser.java

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15202/files
  - new: https://git.openjdk.org/jdk/pull/15202/files/2408e194..a883e746

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15202.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15202/head:pull/15202

PR: https://git.openjdk.org/jdk/pull/15202


Re: RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests [v4]

2023-08-30 Thread Yi Yang
> This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
> tests, this patch tries to consolidate them into one method.

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  Update HeapDumpTest.java

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15202/files
  - new: https://git.openjdk.org/jdk/pull/15202/files/b6a528a2..2408e194

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15202.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15202/head:pull/15202

PR: https://git.openjdk.org/jdk/pull/15202


Re: RFR: 8314021: HeapDump: Optimize segmented heap file merging phase

2023-08-23 Thread Yi Yang
On Fri, 11 Aug 2023 09:31:56 GMT, Yi Yang  wrote:

> This patch reduces phase 2 (merge) time by ~16% (24s -> 20s) when dumping a 
> 32g heap with 96 threads, and fixes a memory leak in the compressor.
> 
> You might ask why this is a Linux-only optimization: sendfile requires at 
> least a socket fd on other platforms ([aix 
> sendfile](https://www.ibm.com/docs/en/aix/7.1?topic=s-send-file-subroutine) 
> [macOS 
> sendfile](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendfile.2.html)
>  [win32 
> TransmitFile](https://learn.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile)),
>  while [only Linux](https://man7.org/linux/man-pages/man2/sendfile.2.html) 
> supports two regular file descriptors.

Gentle ping :) Can I have a review for this?

-

PR Comment: https://git.openjdk.org/jdk/pull/15245#issuecomment-1689335697


Re: RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests [v2]

2023-08-10 Thread Yi Yang
On Thu, 10 Aug 2023 19:29:27 GMT, Alex Menkov  wrote:

>> Yi Yang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   fix TestHeapDumpForInvokDynamic
>
> test/hotspot/jtreg/serviceability/sa/TestHeapDumpForInvokeDynamic.java line 
> 40:
> 
>> 38: import jdk.test.lib.hprof.parser.HprofReader;
>> 39: import jdk.test.lib.hprof.parser.PositionDataInputStream;
>> 40: import jdk.test.lib.hprof.HprofParser;
> 
> PositionDataInputStream and HprofParser imports are not used anymore

Thanks for the review; all copyright year and unused import issues are fixed now

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15202#discussion_r1290838169


Re: RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests [v3]

2023-08-10 Thread Yi Yang
> This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
> tests, this patch tries to consolidate them into one method.

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  review from Alex

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15202/files
  - new: https://git.openjdk.org/jdk/pull/15202/files/ac581090..b6a528a2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=01-02

  Stats: 13 lines in 4 files changed: 0 ins; 9 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/15202.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15202/head:pull/15202

PR: https://git.openjdk.org/jdk/pull/15202


Re: RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests [v2]

2023-08-10 Thread Yi Yang
On Thu, 10 Aug 2023 09:41:29 GMT, Yi Yang  wrote:

>> This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
>> tests, this patch tries to consolidate them into one method.
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   fix TestHeapDumpForInvokDynamic

Thanks for the reviews!

--
@kevinjwalls

> One question on whether we need to change those "throws IOException" to 
> "throws Exception", which is really me thinking if the existing HProfParser 
> parse method really needs to throw Exception? Maybe it only needs 
> IOException? (snapshot.resolve doesn't throw anything) But looks good anyway 
> and good to share this code. 8-)

Yes, we must change it to `throws Exception`, otherwise it won't compile:

/jdk/test/lib/jdk/test/lib/hprof/HprofParser.java:99: error: unreported 
exception Exception; must be caught or declared to be thrown
    try (Snapshot snapshot = Reader.readFile(dump.getAbsolutePath(), callStack, debugLevel)) {
        ^
  exception thrown from implicit call to close() on resource variable 'snapshot'

I noticed another optional enhancement: essentially all calls to 
HprofParser.parse should now be replaced with HprofParser.parseAndVerify (or 
that should become the standard parsing logic), but this is unrelated to my 
current patch.

--
@plummercj 

> Can you clarify what testing you did?

All involved tests pass on Linux.

-

PR Comment: https://git.openjdk.org/jdk/pull/15202#issuecomment-1674139274


Re: RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests [v2]

2023-08-10 Thread Yi Yang
> This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
> tests, this patch tries to consolidate them into one method.

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  fix TestHeapDumpForInvokDynamic

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15202/files
  - new: https://git.openjdk.org/jdk/pull/15202/files/2dc6b816..ac581090

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15202.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15202/head:pull/15202

PR: https://git.openjdk.org/jdk/pull/15202


RFR: 8311775: [TEST] duplicate verifyHeapDump in several tests

2023-08-09 Thread Yi Yang
This is a follow-up patch of #13667. verifyHeapDump is duplicated in several 
tests, this patch tries to consolidate them into one method.

-

Commit messages:
 - 8311775: [TEST] duplicate verifyHeapDump in several tests

Changes: https://git.openjdk.org/jdk/pull/15202/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15202&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8311775
  Stats: 154 lines in 7 files changed: 25 ins; 110 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/15202.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15202/head:pull/15202

PR: https://git.openjdk.org/jdk/pull/15202


Integrated: JDK-8306441: Two phase segmented heap dump

2023-08-08 Thread Yi Yang
On Wed, 26 Apr 2023 09:37:46 GMT, Yi Yang  wrote:

> ### Motivation and proposal
> Hi, heap dump brings about pauses in the application's execution (STW); this 
> is a well-known pain. JDK-8252842 added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not see a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-phase 
> segmented heap dump:
> 
> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
> - Phase 2 (non-STW): Merge multiple heap files into one complete heap dump 
> file. This process can happen outside the safepoint.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | CompressionMode | STW | Total |
> | --- | --- | --- | --- | --- |
> | 8g | 1 T | N | 15.612 | 15.612 |
> | 8g | 32 T | N | 2.561725 | 14.498 |
> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
> | 8g | 96 T | N | 2.6790452 | 14.012 |
> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
> | 16g | 1 T | N | 26.278 | 26.278 |
> | 16g | 32 T | N | 5.231374 | 26.417 |
> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
> | 16g | 96 T | N | 6.2445556 | 27.141 |
> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
> | 32g | 1 T | N | 48.149 | 48.149 |
> | 32g | 32 T | N | 10.7734677 | 61.643 |
> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
> | 32g | 96 T | N | 13.1522042 | 61.432 |
> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
> | 64g | 1 T | N | 100.583 | 100.583 |
> | 64g | 32 T | N | 20.9233744 | 134.701 |
> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
> | 64g | 96 T | N | 26.7374116 | 126.08 |
> | 64g | 96 T | C1 | 16.8101551 | 17.938 |
> | 64g | 96 T | C2 | 80.1626621 | 169.003 |
> | 128g | 1 T | ...
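
For illustration, the two-phase file handling described above reduces to 
roughly this standalone sketch (the ".p%04d" segment suffix is assumed from 
the 7-byte ".p\d\d\d\d\0" reservation discussed later in this thread; names 
are illustrative):

#include <cstdio>

int main() {
    const char base_path[] = "heap.hprof";
    // Phase 1 (STW): each dumper thread writes its own segment file.
    for (int seq = 0; seq < 3; seq++) {
        char seg_path[sizeof(base_path) + 7];  // room for ".p%04d" + '\0'
        snprintf(seg_path, sizeof(seg_path), "%s.p%04d", base_path, seq);
        printf("phase 1 (STW): dumper %d -> %s\n", seq, seg_path);
    }
    // Phase 2 (non-STW): DumpMerger concatenates the segments into base_path
    // outside the safepoint.
    printf("phase 2 (non-STW): merge segments -> %s\n", base_path);
    return 0;
}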

This pull request has now been integrated.

Changeset: 31a307f2
Author: Yi Yang 
URL:   
https://git.openjdk.org/jdk/commit/31a307f2fbe7b99435f50e5404c2a95f07b9a77b
Stats: 1512 lines in 16 files changed: 453 ins; 934 del; 125 mod

8306441: Two phase segmented heap dump

Co-authored-by: Kevin Walls 
Reviewed-by: amenkov, kevinw

-

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v28]

2023-08-08 Thread Yi Yang
On Tue, 8 Aug 2023 13:09:58 GMT, Kevin Walls  wrote:

>> Yi Yang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   aware of serialgc and epsilongc
>
> Great, thanks for your patience with all of this! 8-)

Thank you all for the reviews and suggestions. And thanks @kevinjwalls for your 
patience! I'll create follow-up patches to implement the remaining features.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1670542757


Re: RFR: JDK-8306441: Two phase segmented heap dump [v28]

2023-08-08 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses in the application's execution (STW); this 
> is a well-known pain. JDK-8252842 added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not see a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-phase 
> segmented heap dump:
> 
> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
> - Phase 2 (non-STW): Merge multiple heap files into one complete heap dump 
> file. This process can happen outside the safepoint.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | CompressionMode | STW | Total |
> | --- | --- | --- | --- | --- |
> | 8g | 1 T | N | 15.612 | 15.612 |
> | 8g | 32 T | N | 2.561725 | 14.498 |
> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
> | 8g | 96 T | N | 2.6790452 | 14.012 |
> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
> | 16g | 1 T | N | 26.278 | 26.278 |
> | 16g | 32 T | N | 5.231374 | 26.417 |
> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
> | 16g | 96 T | N | 6.2445556 | 27.141 |
> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
> | 32g | 1 T | N | 48.149 | 48.149 |
> | 32g | 32 T | N | 10.7734677 | 61.643 |
> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
> | 32g | 96 T | N | 13.1522042 | 61.432 |
> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
> | 64g | 1 T | N | 100.583 | 100.583 |
> | 64g | 32 T | N | 20.9233744 | 134.701 |
> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
> | 64g | 96 T | N | 26.7374116 | 126.08 |
> | 64g | 96 T | C1 | 16.8101551 | 17.938 |
> | 64g | 96 T | C2 | 80.1626621 | 169.003 |
> | 128g | 1 T | ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  aware of serialgc and epsilongc

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/fbfa9a5d..a7ede39b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=27
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=26-27

  Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v27]

2023-08-08 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses in the application's execution (STW); this 
> is a well-known pain. JDK-8252842 added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not see a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-phase 
> segmented heap dump:
> 
> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
> - Phase 2 (non-STW): Merge multiple heap files into one complete heap dump 
> file. This process can happen outside the safepoint.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | CompressionMode | STW | Total |
> | --- | --- | --- | --- | --- |
> | 8g | 1 T | N | 15.612 | 15.612 |
> | 8g | 32 T | N | 2.561725 | 14.498 |
> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
> | 8g | 96 T | N | 2.6790452 | 14.012 |
> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
> | 16g | 1 T | N | 26.278 | 26.278 |
> | 16g | 32 T | N | 5.231374 | 26.417 |
> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
> | 16g | 96 T | N | 6.2445556 | 27.141 |
> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
> | 32g | 1 T | N | 48.149 | 48.149 |
> | 32g | 32 T | N | 10.7734677 | 61.643 |
> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
> | 32g | 96 T | N | 13.1522042 | 61.432 |
> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
> | 64g | 1 T | N | 100.583 | 100.583 |
> | 64g | 32 T | N | 20.9233744 | 134.701 |
> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
> | 64g | 96 T | N | 26.7374116 | 126.08 |
> | 64g | 96 T | C1 | 16.8101551 | 17.938 |
> | 64g | 96 T | C2 | 80.1626621 | 169.003 |
> | 128g | 1 T | ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  use OutputAnalyzer in HeapDumpParallelTest

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/8eea14d9..fbfa9a5d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=26
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=25-26

  Stats: 64 lines in 1 file changed: 22 ins; 23 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v26]

2023-08-08 Thread Yi Yang
On Tue, 8 Aug 2023 07:41:23 GMT, Kevin Walls  wrote:

> It's not too bad to change the test to launch a LingeredApp and test the 
> output, I made an attempt at that. I'll share a testcase suggestion here, see 
> what you think.

Thank you for sharing! It makes sense to me. This test will start JVMs 4 
times, which may take a long time, but until a more elegant solution is 
available, I think we can reach a consensus for the moment (and perhaps file 
another test improvement issue to optimize it in the future).

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1669230611


Re: RFR: JDK-8306441: Two phase segmented heap dump [v26]

2023-08-04 Thread Yi Yang
On Fri, 4 Aug 2023 16:31:28 GMT, Kevin Walls  wrote:

>> Yi Yang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   new can_parallel_dump
>
> test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpParallelTest.java line 59:
> 
>> 57: out.shouldContain("Heap dump file created");
>> 58: if (!expectSerial && Runtime.getRuntime().availableProcessors() 
>> > 1) {
>> 59: Asserts.assertTrue(app.getProcessStdout().contains("Dump 
>> heap objects in parallel"));
> 
> I think we just need these other asserts converted to shouldContain, and the 
> assertFalse changed to shouldNotContain.
> This might be a annoying, but if you get the test failing and don't get the 
> actual output, that's worse. 8-)
> Maybe we can aim to integrate on Monday.

getProcessStdout actually returns a String instead of an OutputAnalyzer; its 
name is somewhat deceptive.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1284662958


Re: RFR: JDK-8306441: Two phase segmented heap dump [v26]

2023-08-04 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses in the application's execution (STW); this 
> is a well-known pain. JDK-8252842 added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not see a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-phase 
> segmented heap dump:
> 
> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
> - Phase 2 (non-STW): Merge multiple heap files into one complete heap dump 
> file. This process can happen outside the safepoint.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | CompressionMode | STW | Total |
> | --- | --- | --- | --- | --- |
> | 8g | 1 T | N | 15.612 | 15.612 |
> | 8g | 32 T | N | 2.561725 | 14.498 |
> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
> | 8g | 96 T | N | 2.6790452 | 14.012 |
> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
> | 16g | 1 T | N | 26.278 | 26.278 |
> | 16g | 32 T | N | 5.231374 | 26.417 |
> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
> | 16g | 96 T | N | 6.2445556 | 27.141 |
> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
> | 32g | 1 T | N | 48.149 | 48.149 |
> | 32g | 32 T | N | 10.7734677 | 61.643 |
> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
> | 32g | 96 T | N | 13.1522042 | 61.432 |
> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
> | 64g | 1 T | N | 100.583 | 100.583 |
> | 64g | 32 T | N | 20.9233744 | 134.701 |
> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
> | 64g | 96 T | N | 26.7374116 | 126.08 |
> | 64g | 96 T | C1 | 16.8101551 | 17.938 |
> | 64g | 96 T | C2 | 80.1626621 | 169.003 |
> | 128g | 1 T | ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  new can_parallel_dump

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/9767d034..8eea14d9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=25
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=24-25

  Stats: 51 lines in 2 files changed: 26 ins; 19 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v25]

2023-08-03 Thread Yi Yang
On Thu, 3 Aug 2023 13:24:19 GMT, Kevin Walls  wrote:

> Also the variable num_requested_dump_thread is not needed? It's just a copy 
> of _num_dumper_threads which we don't change.

This is needed because _num_dumper_threads will change later; the requested 
number of dump threads may not equal the actual _num_dumper_threads. 
`is_parallel_dump()` checks _num_dumper_threads, so we cannot use the suggested 
code.

This involves a code-style preference; to avoid futile effort, do you think the 
code below is acceptable?

--- a/src/hotspot/share/services/heapDumper.cpp
+++ b/src/hotspot/share/services/heapDumper.cpp
@@ -1724,15 +1724,8 @@ class VM_HeapDumper : public VM_GC_Operation, public WorkerTask {
   }
   int dump_seq()           { return _dump_seq; }
   bool is_parallel_dump()  { return _num_dumper_threads > 1; }
-  bool can_parallel_dump() {
-    const char* base_path = writer()->get_file_path();
-    assert(base_path != nullptr, "sanity check");
-    if ((strlen(base_path) + 7/*.p\d\d\d\d\0*/) >= JVM_MAXPATHLEN) {
-      // no extra path room for separate heap dump files
-      return false;
-    }
-    return true;
-  }
+  bool can_parallel_dump(uint num_active_workers);
+
   VMOp_Type type() const { return VMOp_HeapDumper; }
   virtual bool doit_prologue();
   void doit();
@@ -1923,6 +1916,33 @@ bool VM_HeapDumper::doit_prologue() {
   return VM_GC_Operation::doit_prologue();
 }
 
+bool VM_HeapDumper::can_parallel_dump(uint num_active_workers) {
+  const char* base_path = writer()->get_file_path();
+  assert(base_path != nullptr, "sanity check");
+  bool has_extra_path_room = true;
+  if ((strlen(base_path) + 7/*.p\d\d\d\d\0*/) >= JVM_MAXPATHLEN) {
+    // no extra path room for separate heap dump files
+    has_extra_path_room = false;
+  }
+
+  bool can_parallel = true;
+  uint num_requested_dump_thread = _num_dumper_threads;
+  if (num_active_workers <= 1 ||        // serial gc?
+      num_requested_dump_thread <= 1 || // request serial dump?
+      !has_extra_path_room) {           // no room for segmented dump files?
+    _num_dumper_threads = 1;
+    can_parallel = false;
+  } else {
+    _num_dumper_threads = clamp(num_requested_dump_thread, 2U, num_active_workers);
+    can_parallel = true;
+  }
+
+  log_info(heapdump)("Requested dump threads %u, active dump threads %u, "
+                     "actual dump threads %u, parallelism %s",
+                     num_requested_dump_thread, num_active_workers,
+                     _num_dumper_threads, can_parallel ? "true" : "false");
+  return can_parallel;
+}
 
 // The VM operation that dumps the heap. The dump consists of the following
 // records:
@@ -1970,20 +1990,10 @@ void VM_HeapDumper::doit() {
 
   WorkerThreads* workers = ch->safepoint_workers();
   uint num_active_workers = workers != nullptr ? workers->active_workers() : 0;
-  uint num_requested_dump_thread = _num_dumper_threads;
-  bool can_parallel = can_parallel_dump();
-  log_info(heapdump)("Requested dump threads %u, active dump threads %u, parallelism %s",
-                     num_requested_dump_thread, num_active_workers, can_parallel ? "true" : "false");
 
-  if (num_active_workers <= 1 || // serial gc?
-      num_requested_dump_thread <= 1 ||  // request serial dump?
-      !can_parallel) { // can not dump in parallel?
-    // Use serial dump, set dumper threads and writer threads number to 1.
-    _num_dumper_threads = 1;
+  if (!can_parallel_dump(num_active_workers)) {
     work(VMDumperWorkerId);
   } else {
-    // Use parallel dump otherwise
-    _num_dumper_threads = clamp(num_requested_dump_thread, 2U, num_active_workers);
     uint heap_only_dumper_threads = _num_dumper_threads - 1 /* VMDumper thread */;
     _dumper_controller = new (std::nothrow) DumperController(heap_only_dumper_threads);
     ParallelObjectIterator poi(_num_dumper_threads);


> out.shouldContain(...): then we get the output included in the log if it 
> fails, so we should see what went wrong.

Acknowledge

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1664913578


Re: RFR: JDK-8306441: Two phase segmented heap dump [v23]

2023-08-02 Thread Yi Yang
On Wed, 2 Aug 2023 13:11:51 GMT, Kevin Walls  wrote:

> > 
> 
> Yes, dividing testing by feature so we have HeapDumpTest, 
> HeapDumpCompressedTest, HeapDumpParallelTest... that sounds good.
> 
> Currently the HeapDumpTest does more parallel testing than 
> IntegrityHeapDumpTest, so I didn't understand the need, but taking all 
> parallel testing into its own test is good, and yes HeapDump test will not 
> pass a -parallel option. 8-)

Done. Please take a look at the latest version :-)

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1663189934


Re: RFR: JDK-8306441: Two phase segmented heap dump [v21]

2023-08-02 Thread Yi Yang
On Wed, 2 Aug 2023 18:02:22 GMT, Chris Plummer  wrote:

>  so I'm still wondering how this use of SATestUtils.createProcessBuilder() 
> got in this new test in the first place

I simply grepped existing tests and found this snippet to create a Java 
process. There was no special intention; it is an obvious misuse :-)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1282564214


Re: RFR: JDK-8306441: Two phase segmented heap dump [v25]

2023-08-02 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses in the application's execution (STW); this 
> is a well-known pain. JDK-8252842 added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not see a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-phase 
> segmented heap dump:
> 
> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
> - Phase 2 (non-STW): Merge multiple heap files into one complete heap dump 
> file. This process can happen outside the safepoint.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | CompressionMode | STW | Total |
> | --- | --- | --- | --- | --- |
> | 8g | 1 T | N | 15.612 | 15.612 |
> | 8g | 32 T | N | 2.561725 | 14.498 |
> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
> | 8g | 96 T | N | 2.6790452 | 14.012 |
> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
> | 16g | 1 T | N | 26.278 | 26.278 |
> | 16g | 32 T | N | 5.231374 | 26.417 |
> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
> | 16g | 96 T | N | 6.2445556 | 27.141 |
> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
> | 32g | 1 T | N | 48.149 | 48.149 |
> | 32g | 32 T | N | 10.7734677 | 61.643 |
> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
> | 32g | 96 T | N | 13.1522042 | 61.432 |
> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
> | 64g | 1 T | N | 100.583 | 100.583 |
> | 64g | 32 T | N | 20.9233744 | 134.701 |
> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
> | 64g | 96 T | N | 26.7374116 | 126.08 |
> | 64g | 96 T | C1 | 16.8101551 | 17.938 |
> | 64g | 96 T | C2 | 80.1626621 | 169.003 |
> | 128g | 1 T | ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  whitespace

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/86350e40..9767d034

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=23-24

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v24]

2023-08-02 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses in the application's execution (STW); this 
> is a well-known pain. JDK-8252842 added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not see a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-phase 
> segmented heap dump:
> 
> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
> - Phase 2 (non-STW): Merge multiple heap files into one complete heap dump 
> file. This process can happen outside the safepoint.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | CompressionMode | STW | Total |
> | ---| --- | --- | --- |  |
> | 8g | 1 T | N | 15.612 | 15.612 |
> | 8g | 32 T | N | 2.561725 | 14.498 |
> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
> | 8g | 96 T | N | 2.6790452 | 14.012 |
> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
> | 16g | 1 T | N | 26.278 | 26.278 |
> | 16g | 32 T | N | 5.231374 | 26.417 |
> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
> | 16g | 96 T | N | 6.2445556 | 27.141 |
> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
> | 32g | 1 T | N | 48.149 | 48.149 |
> | 32g | 32 T | N | 10.7734677 | 61.643 |
> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
> | 32g | 96 T | N | 13.1522042 | 61.432 |
> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
> | 64g | 1 T | N | 100.583 | 100.583 |
> | 64g | 32 T | N | 20.9233744 | 134.701 |
> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
> | 64g | 96 T | N | 26.7374116 | 126.08 |
> | 64g | 96 T | C1 | 16.8101551 | 17.938 |
> | 64g | 96 T | C2 | 80.1626621 | 169.003 |
> | 128g | 1 T | ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  HeapDumpParallelTest

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/f3f3d1c8..86350e40

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=23
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=22-23

  Stats: 301 lines in 5 files changed: 144 ins; 143 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v23]

2023-08-02 Thread Yi Yang
On Wed, 2 Aug 2023 11:33:17 GMT, Yi Yang  wrote:

> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   fix typo Reqeusted to Request

> Do we really need IntegrityHeapDumpTest.java ? I don't see that we do.
> A problem is that it always now passes the -parallel=X option, but we should 
> still have tests that do NOT pass this option, to ensure it's not required.

The rationale for adding such a test is that we use HeapDumpCompressedTest to
test a relatively big feature, i.e. heap dump compression, and parallel heap
dump follows that pattern. Maybe I should revert the changes in HeapDumpTest
and move all parallel-related tests into IntegrityHeapDumpTest (perhaps renamed
to HeapDumpParallelTest)? If we did so, HeapDumpTest would cover the cases that
don't pass this option, as mentioned before.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1662148187


Re: RFR: JDK-8306441: Two phase segmented heap dump [v23]

2023-08-02 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  fix typo Reqeusted to Request

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/fb72f045..f3f3d1c8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=22
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=21-22

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v21]

2023-08-02 Thread Yi Yang
On Tue, 1 Aug 2023 17:48:30 GMT, Chris Plummer  wrote:

>> test/hotspot/jtreg/serviceability/HeapDump/IntegrityHeapDumpTest.java line 
>> 84:
>> 
>>> 82: .addToolArg(heapDumpFile.getAbsolutePath());
>>> 83: 
>>> 84: ProcessBuilder processBuilder = new ProcessBuilder(launcher.getCommand());
>> 
>> Can you explain this change? This will result in the test not being run with 
>> sudo in cases where sudo is necessary. Before we get here we already checked 
>> if we need sudo (not running as root) and verified that passwordless sudo is 
>> supported. Otherwise the test is skipped and would never get here. So if 
>> sudo is required, the test will fail to attach to the debuggee. Note you 
>> might not see this issue if you have DevSecurityMode enabled. See 
>> [JDK-8313357](https://bugs.openjdk.org/browse/JDK-8313357)
>
> I see now that this is a new test, and not an SA test, so probably should 
> never have been using SATestUtils in the first place (I'm curious as to how 
> that ended up happening). You should also remove the import of SATestUtils.

> Can you explain this change? This will result in the test not being run with 
> sudo in cases where sudo is necessary. Before we get here we already checked 
> if we need sudo (not running as root) and verified that passwordless sudo is 
> supported. Otherwise the test is skipped and would never get here. So if sudo 
> is required, the test will fail to attach to the debuggee. Note you might not 
> see this issue if you have DevSecurityMode enabled. See 
> [JDK-8313357](https://bugs.openjdk.org/browse/JDK-8313357)

I followed what DuplicateArrayClassesTest.java does. The reason seems to be
that `SATestUtils.createProcessBuilder` does not perform any verification of
whether passwordless sudo is supported; we need to call
SATestUtils.skipIfCannotAttach explicitly.


public class SATestUtils {
    public static ProcessBuilder createProcessBuilder(JDKToolLauncher launcher) {
        List<String> cmdStringList = Arrays.asList(launcher.getCommand());
        if (needsPrivileges()) {
            cmdStringList = addPrivileges(cmdStringList);
        }
        return new ProcessBuilder(cmdStringList);
    }

    public static boolean needsPrivileges() {
        return Platform.isOSX() && !Platform.isRoot();
    }

    private static List<String> addPrivileges(List<String> cmdStringList) {
        if (!Platform.isOSX()) {
            throw new RuntimeException("Can only add privileges on OSX.");
        }

        System.out.println("Adding 'sudo -E -n' to the command.");
        List<String> outStringList = new ArrayList<>();
        outStringList.add("sudo");
        outStringList.add("-E"); // Preserve existing environment variables.
        outStringList.add("-n"); // Non-interactive; don't prompt for password.
                                 // Must be cached or not required.
        outStringList.addAll(cmdStringList);
        return outStringList;
    }


All occurrences of skipIfCannotAttach are currently under
`test/hotspot/jtreg/serviceability/sa` or `test/hotspot/jtreg/serviceability/jhsdb`.
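
For reference, a minimal sketch of calling that check explicitly (assuming the
usual jtreg `@library /test/lib` setup; the rest of the test is elided):

    import jdk.test.lib.SA.SATestUtils;

    public class AttachGuardSketch {
        public static void main(String[] args) throws Exception {
            // Throws a skipped-test exception when attach is not possible
            // (e.g. macOS without passwordless sudo), so jtreg skips the
            // test instead of failing it.
            SATestUtils.skipIfCannotAttach();
            // ... launch the debuggee and attach here ...
        }
    }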

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1281477905


Re: RFR: JDK-8306441: Two phase segmented heap dump [v22]

2023-08-02 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  review from Kevin

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/408e86df..fb72f045

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=21
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=20-21

  Stats: 38 lines in 5 files changed: 24 ins; 2 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v21]

2023-08-02 Thread Yi Yang
On Tue, 1 Aug 2023 14:41:53 GMT, Kevin Walls  wrote:

> I just saw a run with no -parallel option which got num_active_workers = 2, 
> so it did a parallel dump, so the default is being set as intended, but the 
> availability of worker threads is not always predictable!

Yes, and I've now added new log outputs, request_dump_thread and
num_active_worker, to get a better look at the details of the dump.

> Can we add, like in I think another test: run testng/othervm -Xlog:heapdump 
> HeapDumpTest

Done. In order not to interfere with existing tests, I added checks to the 
IntegrityHeapDumpTest.

> Can we get the signed value in HeapDumpDCmd::execute, print a message and 
> return like we do when compression level is out of range?

Good catch! Done.
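
For illustration, the shape of that check (a Java sketch only — the real
validation lives in HotSpot's C++ `HeapDumpDCmd::execute`, and the message
text here is made up):

    public class ParallelOptionSketch {
        // Reject a negative -parallel value up front instead of letting it be
        // reinterpreted as a huge unsigned count, mirroring the -gz range check.
        static boolean validateParallel(long parallel, java.io.PrintStream out) {
            if (parallel < 0) {
                out.println("Invalid number of parallel dump threads.");
                return false; // print a message and return, as for compression level
            }
            return true;
        }
    }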

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1661605782


Re: RFR: JDK-8306441: Two phase segmented heap dump [v20]

2023-07-31 Thread Yi Yang
On Wed, 19 Jul 2023 12:42:27 GMT, Yi Yang  wrote:

> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   test failure on Windows

Hi Kevin, 

1. CSR

> Actually are we changing the default behaviour: from "parallel if you can", 
> to "serial unless you specify".

We are not changing the default behavior. The default is still parallel (unless
SerialGC is used or parallel dumping cannot work); the VM automatically
calculates the appropriate parallelism unless the user specifies it.

Currently, this patch only adds the -parallel option to jcmd; AFAIK, this does
not require a CSR. Only modifying jmap would require one.

Given the compatibility issues previously mentioned in 
https://github.com/openjdk/jdk/pull/2261#discussion_r565295921 and subsequent 
bugs such as [JDK-8219721](https://bugs.openjdk.java.net/browse/JDK-8219721) 
[JDK-8219896](https://bugs.openjdk.java.net/browse/JDK-8219896), I'm hesitant 
to add a -parallel option to jmap, as it could lead to similar compatibility 
issues and bugs. Therefore, the best approach may be to keep it as it is:

- For jmap, the VM calculates the number of parallel threads (and falls back to
serial when it is unable to parallelize).
- For jcmd, the VM provides fine-grained control, i.e. -parallel=1..X chooses
between a serial and a parallel dump (see the invocation sketch below).

Do you think this is acceptable?
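
For a concrete picture of the jcmd side, a small launcher sketch (the pid,
path, and thread count are placeholders; the `-parallel` option shape follows
this patch):

    import java.io.IOException;

    public class JcmdDumpSketch {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Request a parallel heap dump from a running JVM; the VM treats
            // -parallel as a hint and may cap the actual worker count.
            Process p = new ProcessBuilder(
                    "jcmd", "12345", "GC.heap_dump", "-parallel=4", "/tmp/heap.hprof")
                    .inheritIO()
                    .start();
            System.exit(p.waitFor());
        }
    }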

2. Test

> Running the changes, I got a test failure on macos debug (x64 and aarch64):
> test/hotspot/jtreg/serviceability/HeapDump/IntegrityHeapDumpTest.java

Fixed.

The reason is that `SATestUtils.createProcessBuilder` in IntegrityHeapDump.java
may require [user input for a password](https://github.com/y1yang0/jdk/actions/runs/5599168245/job/1511990#step:9:978)
when starting a process. I have modified the test to be consistent with
DuplicateArrayClassesTest.java to avert this problem.

3. Release Note

>This enhancement should probably have a release note, and we could briefly 
>explain what parallel means.
> I can help with the wording if needed, if you'd like to start something

Thank you in advance! I've created the related release-note issue, and I'll
draft an initial version for it; please feel free to rephrase it.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1659605530


Re: RFR: JDK-8306441: Two phase segmented heap dump [v21]

2023-07-31 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  test failure on mac

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/469be625..408e86df

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=19-20

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v20]

2023-07-28 Thread Yi Yang
On Fri, 28 Jul 2023 17:18:05 GMT, Kevin Walls  wrote:

> I think we might be best off leaving the long dump as you have it in the 
> AttachListener for now. 8-)

Considering a very large heap, the VM used to pause all threads and spend 1
hour executing a heap dump, during which time jcmd/jstack and even the entire
system were unresponsive. Now the VM spends only 1 minute pausing all threads
to write the heap dump, and then pauses only the attach listener/VM thread
while spending 1 hour merging files. For the attach listener, its total time
remains unchanged.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1656540922


Re: RFR: JDK-8306441: Two phase segmented heap dump [v16]

2023-07-28 Thread Yi Yang
On Tue, 11 Jul 2023 20:49:35 GMT, Chris Plummer  wrote:

>> Yi Yang has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - whitespace
>>  - add test
>
> BTW, `serviceability/dcmd/gc/HeapDumpAllTest.java` is failing due to build 
> failures when compiling the test.

PING: @plummercj @kevinjwalls any further review comments? Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1655244031


Re: RFR: JDK-8306441: Two phase segmented heap dump [v20]

2023-07-19 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  test failure on Windows

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/6da71527..469be625

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=18-19

  Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v19]

2023-07-18 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  review feedback from Chris

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/13ea0992..6da71527

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=17-18

  Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v18]

2023-07-17 Thread Yi Yang

Yi Yang has updated the pull request incrementally with two additional commits 
since the last revision:

 - fix SA thread name
 - sa AttachListenerThread counterpart

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/6deeb159..13ea0992

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=16-17

  Stats: 49 lines in 4 files changed: 45 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v17]

2023-07-16 Thread Yi Yang
On Mon, 17 Jul 2023 04:11:54 GMT, David Holmes  wrote:

> > Because I need the overloaded check is_AttachListener_thread(), which lets
> > us avoid using the VM thread to execute the dump file merge whenever
> > possible.
> 
> I'm unclear what the set of candidate threads is for executing the code that 
> does the `is_AttachListener_thread()` test. Can you not just use 
> `!thread->is_VM_thread()` to keep it out of the VMThread?

Many types of threads can execute HeapDumper, e.g. a JMX heap dump, GC with
HeapDumpOnOutOfMemoryError, etc. To ensure safety, I want to make it explicit
that file merging is executed on the current thread only when it is the attach
listener thread; in all other situations it is uniformly done by the VM thread.
In addition, a potential optimization: if a flag such as
HeapDumpOnOutOfMemoryError triggers the heap dump, the triggering thread does
not need to wait on site for the merge to complete (because it is not in
interactive mode). It only needs to generate the separate files during the STW
phase and then have the AttachListenerThread complete the file merge via a
handshake. This requires us to have an AttachListenerThread class.

For future extensibility, code readability, and safety, I added an
AttachListenerThread class.
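
A rough Java analogue of that dispatch, for illustration only (the real check
is HotSpot's C++ `is_AttachListener_thread()`; the method names below are made
up, and the thread-name test stands in for the real type check):

    public class MergeDispatchSketch {
        static boolean isAttachListenerThread(Thread t) {
            // HotSpot's attach listener thread is named "Attach Listener".
            return "Attach Listener".equals(t.getName());
        }

        static void runMerge(Runnable mergeOnCurrentThread, Runnable mergeOnVmThread) {
            if (isAttachListenerThread(Thread.currentThread())) {
                mergeOnCurrentThread.run(); // interactive jcmd/jmap request
            } else {
                mergeOnVmThread.run();      // conservative path: VM thread merges
            }
        }
    }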

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1637424431


Re: RFR: JDK-8306441: Two phase segmented heap dump [v17]

2023-07-16 Thread Yi Yang
On Wed, 12 Jul 2023 07:58:47 GMT, Yi Yang  wrote:

> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   fix test compilation failure

Thanks for the reviews!

---
@kevinjwalls
> if we specify a large number for -parallel=, is it limited to the number of
> safepoint_workers?

No, it determines how many active workers are used.

> Should we feedback to the user how many threads were really used?

Yes, I think it makes sense and I will output it in the upcoming changes.

> Should the user really have to always say how many threads? Do they just want 
> to say "use all the worker threads"?

The main purpose of adding the parallel option is that the number of parallel
threads has a real impact on dump time, and using all worker threads may not
always produce the best results. Please see the table above, where 96 threads
perform worse than 32 threads in many test groups. Additionally, -parallel
also controls whether a parallel or a serial dump is used.

It is also worth noting that even if the user specifies -parallel, the VM will
try to use that number of threads to perform the parallel dump, but this is
not guaranteed: the VM also caps the maximum number of active workers based on
the size and utilization of the heap.
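
Roughly, the requested value is treated as a hint and clamped (a sketch with
made-up names; the real heuristic also considers heap size and utilization):

    public class WorkerClampSketch {
        static int chooseActiveWorkers(int requested, int maxGangWorkers) {
            if (requested <= 1 || maxGangWorkers <= 1) {
                return 1; // serial dump
            }
            return Math.min(requested, maxGangWorkers); // parallel, but capped
        }
    }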


> Does this work with Serial GC? Maybe it doesn't; maybe that's unfortunate
> (Serial may be appropriate for some large heaps, and you still want a faster
> heap dump?)

Unfortunately, it does not currently work with Serial GC: because Serial GC has
no worker gang concept, the VM always uses a serial dump there. We have hardly
seen any Serial GC cases in production environments, so I think support for
Serial GC can be added in a follow-up PR.

---
@dholmes-ora 
> I'm unclear why we need an explicit AttachListenerThread type to support this.

Because I need the overloaded check is_AttachListener_thread(), which lets us
avoid using the VM thread to execute the dump file merge whenever possible.

---
@tstuefe 
> I 

Re: RFR: JDK-8306441: Two phase segmented heap dump [v17]

2023-07-12 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  fix test compilation failure

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/4d372dab..6deeb159

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=15-16

  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v16]

2023-07-09 Thread Yi Yang

Yi Yang has updated the pull request incrementally with two additional commits 
since the last revision:

 - whitespace
 - add test

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/7cbb4a10..4d372dab

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=14-15

  Stats: 149 lines in 3 files changed: 133 ins; 0 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two phase segmented heap dump [v14]

2023-07-09 Thread Yi Yang
On Wed, 5 Jul 2023 21:10:56 GMT, Alex Menkov  wrote:

>> I think commit "use HandshakeClosure instead of VMOperation" is a wrong way 
>> to go.
>> It restricts use of parallel dumping only to the attach case.
>> I have pending changes in heap dumper to support virtual threads and I'm
>> going to switch heap dumper to always use DumpMerger.
>
>> Hi @alexmenkov,
>> 
>> > It restricts use of parallel dumping only to attach case.
>> 
>> I'm not sure what you mean. Using handshake won't limit it to only attach 
>> cases; it can also be used in other cases like HeapDumpBeforeFullGC. The 
>> only difference is that previously, VMThread merged the dump files, and now 
>> it's the Attach listener that merges them. For the file merging process, my 
>> candidate options are as follows:
>> 
>> 1. Execute with VMThread outside the safepoint, which will block 
>> VMThread from executing other vmoperations.
>> 
>> 2. Execute with Attach listener thread outside the safepoint, which will 
>> block Attach listener from processing requests like jcmd/jstack.
>> 
>> 3. Create a new temporary thread to execute the file merging outside the 
>> safepoint, which will have some resource consumption.
>> 
>> 
>> I don't have a strong motivation to use 2, and 3 may be a good solution with 
>> the availability of virtual threads. If you have any concerns, we can 
>> consider using the most conservative option 1 to simplify the review 
>> process, and then optimize the file merging process in a follow-up patch.
> 
> I mean that you can't be sure that you can use the attach listener thread.
> My concerns about the attach listener thread are:
> - the AttachListener can be disabled entirely, or can fail to initialize;
> - the attach listener thread may not yet be available when we need to perform
> a heap dump;
> - we need to ensure the attach listener thread can't be blocked (for example,
> waiting for the next command).
> 
> I think it makes sense to go with option 1 now and add optimizations as
> follow-up changes (they will be much smaller and easier to review).

Thanks @alexmenkov for the reviews! I added a corresponding jtreg test for it.
I also found that verifyHeapDump is duplicated in several tests, so I filed
https://bugs.openjdk.org/browse/JDK-8311775 as a follow-up test improvement.
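
For context, the kind of check such a shared helper performs — a minimal
sketch, not the actual verifyHeapDump, relying on the fact that HPROF files
begin with a "JAVA PROFILE 1.0.x" header string:

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class VerifyDumpSketch {
        static void verifyHeapDump(File dump) throws IOException {
            try (DataInputStream in = new DataInputStream(new FileInputStream(dump))) {
                byte[] header = new byte[18]; // length of "JAVA PROFILE 1.0.2"
                in.readFully(header);
                String s = new String(header, StandardCharsets.US_ASCII);
                if (!s.startsWith("JAVA PROFILE")) {
                    throw new IOException("Not an HPROF file: " + dump);
                }
            }
        }
    }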

@plummercj @kevinjwalls Can I have a second review when you have time? Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1628299494


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v15]

2023-07-05 Thread Yi Yang

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  use VM_HeapDumpMerge back according to review feedback from Alex

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/2012b5ca..7cbb4a10

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=13-14

  Stats: 53 lines in 4 files changed: 23 ins; 14 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v14]

2023-07-02 Thread Yi Yang
On Fri, 30 Jun 2023 22:23:19 GMT, Alex Menkov  wrote:

>> Yi Yang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   memory leak
>
> I think commit "use HandshakeClosure instead of VMOperation" is a wrong way 
> to go.
> It restricts use of parallel dumping only to the attach case.
> I have pending changes in heap dumper to support virtual threads and I'm
> going to switch heap dumper to always use DumpMerger.

Hi @alexmenkov,

> It restricts use of parallel dumping only to attach case.

I'm not sure what you mean. Using a handshake won't limit it to attach cases
only; it can also be used in other cases like HeapDumpBeforeFullGC. The only
difference is that previously the VMThread merged the dump files, and now it's
the attach listener that merges them. For the file merging process, my
candidate options are as follows:

1. Execute with the VMThread outside the safepoint, which blocks the VMThread
from executing other VM operations.
2. Execute with the attach listener thread outside the safepoint, which blocks
the attach listener from processing requests like jcmd/jstack.
3. Create a new temporary thread to execute the file merging outside the
safepoint, which has some resource cost.

I don't have a strong motivation to use option 2, and option 3 may become a
good solution once virtual threads are available. If you have any concerns, we
can consider using the most conservative option 1 to simplify the review
process, and then optimize the file merging process in a follow-up patch.
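
Whichever thread ends up running it, the merge itself is mechanically simple;
a rough sketch of the idea in Java (illustration only — the actual DumpMerger
is HotSpot C++, and the paths here are placeholders):

    import java.io.BufferedOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    public class SegmentMergeSketch {
        // Phase 2: append each per-worker segment file to the final dump in
        // order, deleting segments as we go; this can run outside the safepoint.
        static void merge(Path finalDump, List<Path> segments) throws IOException {
            try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(finalDump))) {
                for (Path seg : segments) {
                    Files.copy(seg, out);
                    Files.delete(seg);
                }
            }
        }
    }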

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1617115036


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v5]

2023-06-28 Thread Yi Yang
On Tue, 16 May 2023 18:41:26 GMT, Chris Plummer  wrote:

>> Hi, can I have a review for this patch?
>
> @y1yang0 Sorry no one has been able to review this so far. The serviceability 
> team is very busy for the next few weeks finishing up JDK 21 changes before 
> RDP1. It's unlikely we'll find time for the review before them.
> 
> I did take a very quick look at the changes just to understand the scope. One 
> thing I noticed that makes this PR hard to review is the code refactoring and 
> relocating that you did. At first it looks like a lot of old code was deleted 
> and a lot of new code added, but in fact most of the new code is just 
> relocated old code. It makes it very hard to tell if there have been any 
> changes to this code. Is there anything you can do to lessen the amount of 
> apparent new code that is actually just moved code?

Hi @plummercj @kevinjwalls @alexmenkov, do you have any plans to review this 
patch recently? I have addressed some bugs and it passed manual tests and all 
jtreg tests.

Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1610946603


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v14]

2023-06-28 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/86d2717d-c621-446f-98c2-cce761ec8db5)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/c912c516-83b6-4343-a2e8-5d5bdb99cb87)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | C...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  memory leak

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/ecab3116..2012b5ca

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=12-13

  Stats: 52 lines in 1 file changed: 22 ins; 13 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v13]

2023-06-19 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/86d2717d-c621-446f-98c2-cce761ec8db5)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/c912c516-83b6-4343-a2e8-5d5bdb99cb87)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | C...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  rename fetch_and_add to fetch_then_and

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/02561629..ecab3116

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=11-12

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v12]

2023-06-19 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/86d2717d-c621-446f-98c2-cce761ec8db5)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/c912c516-83b6-4343-a2e8-5d5bdb99cb87)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | C...

Yi Yang has updated the pull request with a new target base due to a merge or a 
rebase. The pull request now contains 14 commits:

 - Merge branch 'master' of github.com:openjdk/jdk into merge_dump_public
 - use HandshakeClosure instead of VMOperation
 - perform merging by attach listener thread where possible
 - whitespace
 - review from Alex
 - undo refactor -- done
 - undo refactor
 - rename back to heapdumpCompression
 - execute VM_HeapDumper directly
 - remove useless scope
 - ... and 4 more: https://git.openjdk.org/jdk/compare/d2a858e1...02561629

-

Changes: https://git.openjdk.org/jdk/pull/13667/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=11
  Stats: 1269 lines in 10 files changed: 216 ins; 932 del; 121 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v11]

2023-06-19 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/86d2717d-c621-446f-98c2-cce761ec8db5)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/c912c516-83b6-4343-a2e8-5d5bdb99cb87)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | C...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  perform merging by attach listener thread where possible

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/2280159a..e1328f06

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=09-10

  Stats: 61 lines in 5 files changed: 41 ins; 2 del; 18 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v10]

2023-06-19 Thread Yi Yang
On Sat, 17 Jun 2023 01:20:21 GMT, Alex Menkov  wrote:

>> Yi Yang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   whitespace
>
> Could you please add info about what testing have been done

Hi @alexmenkov,

> Could you please add info about what testing have been done

Including serviceability tests and manual checks for both slowdebug and release 
builds. I'll do more tests and sync the results into the main content of this PR.

I pushed a new commit to optimize the merge stage. It now performs the heap 
dump file merge in the attach listener thread when possible, so we no longer 
occupy the VMThread, whose unavailability would otherwise delay GC and other 
VM operations.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1596609826


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v10]

2023-06-18 Thread Yi Yang
On Sat, 17 Jun 2023 01:22:54 GMT, Alex Menkov  wrote:

> This change adds new option. most likely this requires CSR

Hi @alexmenkov, it seems that adding new parameters to jcmd does not require a 
CSR; only changes like those to jmap do.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1233444531


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v10]

2023-06-13 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/86d2717d-c621-446f-98c2-cce761ec8db5)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/c912c516-83b6-4343-a2e8-5d5bdb99cb87)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | C...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  whitespace

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/04f54d9a..2280159a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=08-09

  Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v9]

2023-06-13 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. The changes in the overall design are as follows:
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/86d2717d-c621-446f-98c2-cce761ec8db5)
> Fig1. Before
> 
> ![image](https://github.com/openjdk/jdk/assets/5010047/c912c516-83b6-4343-a2e8-5d5bdb99cb87)
> Fig2. After this patch
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | C...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  review from Alex

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/c8e80e9e..04f54d9a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=07-08

  Stats: 152 lines in 1 file changed: 78 ins; 70 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v5]

2023-06-11 Thread Yi Yang
On Tue, 16 May 2023 18:41:26 GMT, Chris Plummer  wrote:

>> Hi, can I have a review for this patch?
>
> @y1yang0 Sorry no one has been able to review this so far. The serviceability 
> team is very busy for the next few weeks finishing up JDK 21 changes before 
> RDP1. It's unlikely we'll find time for the review before them.
> 
> I did take a very quick look at the changes just to understand the scope. One 
> thing I noticed that makes this PR hard to review is the code refactoring and 
> relocating that you did. At first it looks like a lot of old code was deleted 
> and a lot of new code added, but in fact most of the new code is just 
> relocated old code. It makes it very hard to tell if there have been any 
> changes to this code. Is there anything you can do to lessen the amount of 
> apparent new code that is actually just moved code?

Hi @plummercj @kevinjwalls @alexmenkov, sorry to disturb you. RDP1 has ended. 
Do you have any plans to review this patch recently? Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1586671507


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v8]

2023-06-05 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks.
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | Compress1 |
> | 32g | 96 thread | 38.9900931 secs | 80.574 secs | Compress2 |
> | 64g | 1 thread |  100.583 secs | 100.583 secs | N |
> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | N |
> | 64g | 32 thread | 18.5023784 secs | 19.358 secs | Compress1 |
> | 64g | 32 thread | 86.4748377 ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  undo refactor -- done

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/1432cdaa..c8e80e9e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=06-07

  Stats: 53 lines in 4 files changed: 19 ins; 23 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Two-stage Segmented Heap Dump [v7]

2023-06-05 Thread Yi Yang
> ### Motivation and proposal
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks.
> 
> ### Performance evaluation
> | memory | numOfThread | STW | Total  | Compression |
> | --- | - | -- |  |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs | N |
> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs | N |
> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
> | 16g | 96 thread | 19.2965783 secs |  39.007 secs | Compress2 |
> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | Compress1 |
> | 32g | 96 thread | 38.9900931 secs | 80.574 secs | Compress2 |
> | 64g | 1 thread |  100.583 secs | 100.583 secs | N |
> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | N |
> | 64g | 32 thread | 18.5023784 secs | 19.358 secs | Compress1 |
> | 64g | 32 thread | 86.4748377 ...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  undo refactor

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/7485e6a7..1432cdaa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=05-06

  Stats: 1332 lines in 4 files changed: 651 ins; 654 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Enhance parallel heap dump [v6]

2023-06-05 Thread Yi Yang
On Mon, 5 Jun 2023 23:44:04 GMT, Alex Menkov  wrote:

> A lot of code movement is caused by moving AbstractDumpWriter class from 
> heapDumper.cpp to heapDumpCompression.hpp/cpp
> I'm not happy with huge heapDumper.cpp file, but this refactoring does not 
> look good to me.
> Currently all logic about hprof file structure is in heapDumper.cpp, it's not 
> visible outside.
> With your fix the logic is spread in heapDumper.hpp/.cpp and 
> heapDumpCompression.hpp/cpp

I will undo all the refactoring. Let us first focus on the proposal itself; 
the refactoring can be discussed later.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-150928


Re: RFR: JDK-8306441: Enhance parallel heap dump [v6]

2023-05-21 Thread Yi Yang
On Thu, 18 May 2023 11:02:21 GMT, Yi Yang  wrote:

> I'm interested in if it is faster in STW time with the parallel write to use 
> compression or not? (Does not compressing take time, or does compressing mean 
> we write fewer bytes, so ends up being faster.)

I updated the benchmark data with compression turned on. In general, this 
patch significantly reduces application pause time by 73~80%. When compression 
is enabled, the STW/Total time heavily depends on the sparseness of the 
application heap. If the heap is full of compressible data (e.g. all objects 
are empty byte arrays), Total ≈ STW and the merge process is incredibly fast. 
If the heap data is not suitable for compression (e.g. all objects are full of 
random data), the STW reduction is not appealing and the total dump time also 
increases.
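
For illustration only, here is a hedged sketch of writing one dump segment 
through zlib's gzip layer (my own example; it is not the HotSpot compression 
path, and the exact levels behind Compress1/Compress2 are an assumption):

```cpp
#include <zlib.h>
#include <cstdio>

// Writes one segment through gzip. Highly compressible data yields tiny
// segment files, which is why the later merge can be so fast.
bool write_segment_gz(const char* path, const char* buf, unsigned len,
                      int level /* assumed: 1 for speed, 9 for ratio */) {
  char mode[8];
  std::snprintf(mode, sizeof(mode), "wb%d", level);   // e.g. "wb1"
  gzFile out = gzopen(path, mode);
  if (out == nullptr) return false;
  bool ok = gzwrite(out, buf, len) == static_cast<int>(len);
  return gzclose(out) == Z_OK && ok;
}
```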

P.S. After further consideration, I think the title "Enhance heap dump" is 
also not quite accurate. This patch fundamentally changes the behavior of heap 
dump: before, it generated a single complete dump directly; now it generates 
multiple dump files and then combines them into one dump. I am currently 
considering a more appropriate title.

Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1556473626


Re: RFR: JDK-8306441: Enhance parallel heap dump [v6]

2023-05-18 Thread Yi Yang
On Thu, 18 May 2023 09:35:36 GMT, Kevin Walls  wrote:

> I'm interested in if it is faster in STW time with the parallel write to use 
> compression or not? (Does not compressing take time, or does compressing mean 
> we write fewer bytes, so ends up being faster.)

Hi @kevinjwalls, so far all benchmark data is based on parallel dumps without 
compression. I will add more benchmark data for parallel dumps with compression 
later.
 
> The idea of forking a child to do the write seems appealing, although it is a 
> non-Windows solution. Isn't the point of a fork that the parent can keep 
> running without any pause at all? The child gets a copy of all memory, the 
> fork happened at a safepoint so the Java heap is in a good state, and it can 
> take its time to do the write, then exit without affecting the parent. The 
> child probably must not make any Java calls or interact with other existing 
> JVM threads.
> (fork not vfork. The child is making no writes to the Java heap so should not 
> cause the entire Java heap to be duplicated in memory. That overhead might be 
> large though particularly if the app is making changes...)

I think truly pauseless dumping is impossible; the long STW is amortized into 
minor STWs: once a write happens in the parent process, the child process 
observes it via userfaultfd and the corresponding pages are prioritized for 
dumping in the safepoint region.
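
For concreteness, a rough POSIX sketch of the fork-based direction under 
discussion (hypothetical code, not a HotSpot implementation; the userfaultfd 
page tracking described above is omitted):

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>

// 'heap_base'/'heap_size' are assumed to describe the Java heap mapping.
// fork() gives the child a copy-on-write snapshot taken at a safepoint;
// the parent resumes immediately while the child serializes the snapshot.
pid_t dump_in_child(const void* heap_base, size_t heap_size, const char* path) {
  pid_t pid = fork();
  if (pid == 0) {                  // child: write the snapshot, then exit
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd >= 0) {
      const char* p = static_cast<const char*>(heap_base);
      size_t left = heap_size;
      while (left > 0) {           // a real dumper would emit HPROF records
        ssize_t n = write(fd, p, left);
        if (n <= 0) break;
        p += n;
        left -= static_cast<size_t>(n);
      }
      close(fd);
    }
    _exit(0);                      // never run arbitrary JVM code in the child
  }
  return pid;                      // parent keeps running; pages mutate via COW
}
```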

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1552891620


Re: RFR: JDK-8306441: Segmented heap dump [v6]

2023-05-17 Thread Yi Yang
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. It significantly reduces 73~80% application pause time. 
> 
> | memory | numOfThread | STW | Total  |
> | --- | - | -- |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | 
> | 16g | 1 thread | 26.278 secs | 26.278 secs |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs |
> | 32g | 1 thread | 48.149 secs | 48.149 secs |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | 
> | 32g | 96 thread | 13.1522042 secs |  61.432 secs |
> | 64g | 1 thread |  100.583 secs | 100.583 secs |
> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | 
> | 64g | 96 thread | 26.7374116 secs | 126.080 secs | 
> | 128g | 1 thread | 233.843 secs | 233.843 secs |
> | 128g | 32 thread | 72.9945768 secs | 207.060 secs |
> | 128g | 96 thread | 67.6815929 secs | 336.345 secs |
> 
>> **Total** means the total heap dump including both two phases
>> **STW** means the first phase only.
>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, 
>> **Total** = **STW**
> 
> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
> 
> In actual testing, two-stage solution can lead to an increase in the overall 
> time for heapdump(See table above). However, considering the reduction of STW 
> time, I think it is an acceptable trade-off. Furthermore, there is still room 
> for optimization in the second merge stage(e.g. 
> sendfile/splice/copy_file_range instead of read+write combination). Since 
> number of parallel dump thread has a cons...

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  rename back to heapdumpCompression

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/4de781e9..7485e6a7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=04-05

  Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Segmented heap dump [v5]

2023-05-17 Thread Yi Yang
On Mon, 15 May 2023 02:16:43 GMT, Yi Yang  wrote:

>> Yi Yang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   execute VM_HeapDumper directly
>
> Hi, can I have a review for this patch?

> @y1yang0 Sorry no one has been able to review this so far. The serviceability 
> team is very busy for the next few weeks finishing up JDK 21 changes before 
> RDP1. It's unlikely we'll find time for the review before them.
> 
> I did take a very quick look at the changes just to understand the scope. One 
> thing I noticed that makes this PR hard to review is the code refactoring and 
> relocating that you did. At first it looks like a lot of old code was deleted 
> and a lot of new code added, but in fact most of the new code is just 
> relocated old code. It makes it very hard to tell if there have been any 
> changes to this code. Is there anything you can do to lessen the amount of 
> apparent new code that is actually just moved code?

Hi @plummercj, thanks for the heads-up. I saw RDP1 ends on 06/08; I will ping 
everyone again in mid-June and hope to get reviews when you have time.

The refactoring is necessary because:
- All code related to ParDumpWriter is unused after this patch.
- In the `heapDumperCompression` file, most of the code is used for writing the 
heap dump rather than for compression (after applying this patch), so I renamed 
it to `heapDumperWriter`. However, considering that this may make the review 
difficult, I will revert this change and keep the original file name for now; I 
will rename it again at a proper point in the review process.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1551031432


Re: RFR: JDK-8306441: Segmented heap dump [v5]

2023-05-14 Thread Yi Yang
On Wed, 10 May 2023 08:29:43 GMT, Yi Yang  wrote:

>> Hi, heap dump brings about pauses for application's execution(STW), this is 
>> a well-known pain. JDK-8252842 have added parallel support to heapdump in an 
>> attempt to alleviate this issue. However, all concurrent threads 
>> competitively write heap data to the same file, and more memory is required 
>> to maintain the concurrent buffer queue. In experiments, we did not feel a 
>> significant performance improvement from that.
>> 
>> The minor-pause solution, which is presented in this PR, is a two-stage 
>> segmented heap dump:
>> 
>> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
>> files.
>> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
>> file.
>> 
>> Now concurrent worker threads are not required to maintain a buffer queue, 
>> which would result in more memory overhead, nor do they need to compete for 
>> locks. It significantly reduces 73~80% application pause time. 
>> 
>> | memory | numOfThread | STW | Total  |
>> | --- | - | -- |  |
>> | 8g | 1 thread | 15.612 secs | 15.612 secs |
>> | 8g | 32 thread |  2.5617250 secs | 14.498 secs |
>> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | 
>> | 16g | 1 thread | 26.278 secs | 26.278 secs |
>> | 16g | 32 thread |  5.2313740 secs | 26.417 secs |
>> | 16g | 96 thread | 6.2445556 secs | 27.141 secs |
>> | 32g | 1 thread | 48.149 secs | 48.149 secs |
>> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | 
>> | 32g | 96 thread | 13.1522042 secs |  61.432 secs |
>> | 64g | 1 thread |  100.583 secs | 100.583 secs |
>> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | 
>> | 64g | 96 thread | 26.7374116 secs | 126.080 secs | 
>> | 128g | 1 thread | 233.843 secs | 233.843 secs |
>> | 128g | 32 thread | 72.9945768 secs | 207.060 secs |
>> | 128g | 96 thread | 67.6815929 secs | 336.345 secs |
>> 
>>> **Total** means the total heap dump including both two phases
>>> **STW** means the first phase only.
>>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, 
>>> **Total** = **STW**
>> 
>> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
>> 
>> In actual testing, two-stage solution can lead to an increase in the overall 
>> time for heapdump(See table above). However, considering the reduction of 
>> STW time, I think it is an acceptable trade-off. Furthermore, there is still 
>> room for optimization in the second merge stage(e.g. 
>> sendfile/splice/copy_file_range instead of read+write combination). Since 
>> number of...
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   execute VM_HeapDumper directly

Hi, can I have a review for this patch?

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1547101136


Re: RFR: JDK-8306441: Segmented heap dump [v5]

2023-05-10 Thread Yi Yang
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. It significantly reduces 73~80% application pause time. 
> 
> | memory | numOfThread | STW | Total  |
> | --- | - | -- |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | 
> | 16g | 1 thread | 26.278 secs | 26.278 secs |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs |
> | 32g | 1 thread | 48.149 secs | 48.149 secs |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | 
> | 32g | 96 thread | 13.1522042 secs |  61.432 secs |
> | 64g | 1 thread |  100.583 secs | 100.583 secs |
> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | 
> | 64g | 96 thread | 26.7374116 secs | 126.080 secs | 
> | 128g | 1 thread | 233.843 secs | 233.843 secs |
> | 128g | 32 thread | 72.9945768 secs | 207.060 secs |
> | 128g | 96 thread | 67.6815929 secs | 336.345 secs |
> 
>> **Total** means the total heap dump including both two phases
>> **STW** means the first phase only.
>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, 
>> **Total** = **STW**
> 
> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
> 
> In actual testing, two-stage solution can lead to an increase in the overall 
> time for heapdump(See table above). However, considering the reduction of STW 
> time, I think it is an acceptable trade-off. Furthermore, there is still room 
> for optimization in the second merge stage(e.g. 
> sendfile/splice/copy_file_range instead of read+write combination). Since 
> number of parallel dump thread has a considerable impact on total dump time, 
> I added a parameter that allows users to specify the number of parallel dump 
> thread they wish to run.
> 
> # Open discussion
> 
> - Pauseless heap dump solution?
> An alternative pauseless solution is to fork a child process, set the parent 
> process heap to read-only, and dump the heap in child process. Once writing 
> happens in parent process, child process observes them by userfaultfd and 
> corresponding pages are prioritized for dumping. I'm also looking forward to 
> hearing comments and discussions about this solution.
> 
> - Client parser support for segmented heap dump
> This patch provides a possibility that whether heap dump needs to be complete 
> or not, can the VM directly generate segmented heapdump, and let the client 
> parser complete the merge process? Looking forward to hearing comments from 
> the Eclipse MAT community
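
On the merge-stage optimization mentioned above (sendfile/splice/copy_file_range 
instead of a read+write combination), a hedged Linux-only sketch of appending 
one segment with copy_file_range might look like this (an assumed helper, not 
part of the patch):

```cpp
#define _GNU_SOURCE 1   // for copy_file_range (Linux, glibc >= 2.27)
#include <fcntl.h>
#include <unistd.h>

// Appends one segment file to the merged dump entirely inside the kernel;
// a real merger would fall back to read()+write() when this call fails.
bool append_segment(int out_fd, const char* segment_path) {
  int in_fd = open(segment_path, O_RDONLY);
  if (in_fd < 0) return false;
  off_t size = lseek(in_fd, 0, SEEK_END);   // segment length
  lseek(in_fd, 0, SEEK_SET);
  bool ok = (size >= 0);
  while (ok && size > 0) {
    ssize_t n = copy_file_range(in_fd, nullptr, out_fd, nullptr,
                                static_cast<size_t>(size), 0);
    if (n <= 0) ok = false;
    else size -= n;
  }
  close(in_fd);
  return ok;
}
```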

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  execute VM_HeapDumper directly

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/9ee6fe4b..4de781e9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=03-04

  Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Segmented heap dump [v4]

2023-05-07 Thread Yi Yang
On Thu, 4 May 2023 08:40:10 GMT, Yi Yang  wrote:

>> Hi, heap dump brings about pauses for application's execution(STW), this is 
>> a well-known pain. JDK-8252842 have added parallel support to heapdump in an 
>> attempt to alleviate this issue. However, all concurrent threads 
>> competitively write heap data to the same file, and more memory is required 
>> to maintain the concurrent buffer queue. In experiments, we did not feel a 
>> significant performance improvement from that.
>> 
>> The minor-pause solution, which is presented in this PR, is a two-stage 
>> segmented heap dump:
>> 
>> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
>> files.
>> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
>> file.
>> 
>> Now concurrent worker threads are not required to maintain a buffer queue, 
>> which would result in more memory overhead, nor do they need to compete for 
>> locks. It significantly reduces 73~80% application pause time. 
>> 
>> | memory | numOfThread | STW | Total  |
>> | --- | - | -- |  |
>> | 8g | 1 thread | 15.612 secs | 15.612 secs |
>> | 8g | 32 thread |  2.5617250 secs | 14.498 secs |
>> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | 
>> | 16g | 1 thread | 26.278 secs | 26.278 secs |
>> | 16g | 32 thread |  5.2313740 secs | 26.417 secs |
>> | 16g | 96 thread | 6.2445556 secs | 27.141 secs |
>> | 32g | 1 thread | 48.149 secs | 48.149 secs |
>> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | 
>> | 32g | 96 thread | 13.1522042 secs |  61.432 secs |
>> | 64g | 1 thread |  100.583 secs | 100.583 secs |
>> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | 
>> | 64g | 96 thread | 26.7374116 secs | 126.080 secs | 
>> | 128g | 1 thread | 233.843 secs | 233.843 secs |
>> | 128g | 32 thread | 72.9945768 secs | 207.060 secs |
>> | 128g | 96 thread | 67.6815929 secs | 336.345 secs |
>> 
>>> **Total** means the total heap dump including both two phases
>>> **STW** means the first phase only.
>>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, 
>>> **Total** = **STW**
>> 
>> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
>> 
>> In actual testing, two-stage solution can lead to an increase in the overall 
>> time for heapdump(See table above). However, considering the reduction of 
>> STW time, I think it is an acceptable trade-off. Furthermore, there is still 
>> room for optimization in the second merge stage(e.g. 
>> sendfile/splice/copy_file_range instead of read+write combination). Since 
>> number of parallel dump thread has a considerable impact on total dump time, 
>> I added a parameter that allows users to specify the number of parallel dump 
>> thread they wish to run.
>> 
>> # Open discussion
>> 
>> - Pauseless heap dump solution?
>> An alternative pauseless solution is to fork a child process, set the parent 
>> process heap to read-only, and dump the heap in child process. Once writing 
>> happens in parent process, child process observes them by userfaultfd and 
>> corresponding pages are prioritized for dumping. I'm also looking forward to 
>> hearing comments and discussions about this solution.
>> 
>> - Client parser support for segmented heap dump
>> This patch provides a possibility that whether heap dump needs to be 
>> complete or not, can the VM directly generate segmented heapdump, and let 
>> the client parser complete the merge process? Looking forward to hearing 
>> comments from the Eclipse MAT community
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   remove useless scope

Hi, can I have a review for this? It significantly reduces heap dump STW time. 
Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1537626780


Re: RFR: JDK-8306441: Segmented heap dump [v4]

2023-05-04 Thread Yi Yang
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. It significantly reduces 73~80% application pause time. 
> 
> | memory | numOfThread | STW | Total  |
> | --- | - | -- |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | 
> | 16g | 1 thread | 26.278 secs | 26.278 secs |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs |
> | 32g | 1 thread | 48.149 secs | 48.149 secs |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | 
> | 32g | 96 thread | 13.1522042 secs |  61.432 secs |
> | 64g | 1 thread |  100.583 secs | 100.583 secs |
> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | 
> | 64g | 96 thread | 26.7374116 secs | 126.080 secs | 
> | 128g | 1 thread | 233.843 secs | 233.843 secs |
> | 128g | 32 thread | 72.9945768 secs | 207.060 secs |
> | 128g | 96 thread | 67.6815929 secs | 336.345 secs |
> 
>> **Total** means the total heap dump including both two phases
>> **STW** means the first phase only.
>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, 
>> **Total** = **STW**
> 
> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
> 
> In actual testing, two-stage solution can lead to an increase in the overall 
> time for heapdump(See table above). However, considering the reduction of STW 
> time, I think it is an acceptable trade-off. Furthermore, there is still room 
> for optimization in the second merge stage(e.g. 
> sendfile/splice/copy_file_range instead of read+write combination). Since 
> number of parallel dump thread has a considerable impact on total dump time, 
> I added a parameter that allows users to specify the number of parallel dump 
> thread they wish to run.
> 
> # Open discussion
> 
> - Pauseless heap dump solution?
> An alternative pauseless solution is to fork a child process, set the parent 
> process heap to read-only, and dump the heap in child process. Once writing 
> happens in parent process, child process observes them by userfaultfd and 
> corresponding pages are prioritized for dumping. I'm also looking forward to 
> hearing comments and discussions about this solution.
> 
> - Client parser support for segmented heap dump
> This patch provides a possibility that whether heap dump needs to be complete 
> or not, can the VM directly generate segmented heapdump, and let the client 
> parser complete the merge process? Looking forward to hearing comments from 
> the Eclipse MAT community

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  remove useless scope

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/9e563ca7..9ee6fe4b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=02-03

  Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Segmented heap dump [v3]

2023-04-28 Thread Yi Yang
> Hi, heap dump brings about pauses for application's execution(STW), this is a 
> well-known pain. JDK-8252842 have added parallel support to heapdump in an 
> attempt to alleviate this issue. However, all concurrent threads 
> competitively write heap data to the same file, and more memory is required 
> to maintain the concurrent buffer queue. In experiments, we did not feel a 
> significant performance improvement from that.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage 
> segmented heap dump:
> 
> 1. Stage One(STW): Concurrent threads directly write data to multiple heap 
> files.
> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump 
> file.
> 
> Now concurrent worker threads are not required to maintain a buffer queue, 
> which would result in more memory overhead, nor do they need to compete for 
> locks. It significantly reduces 73~80% application pause time. 
> 
> | memory | numOfThread | STW | Total  |
> | --- | - | -- |  |
> | 8g | 1 thread | 15.612 secs | 15.612 secs |
> | 8g | 32 thread |  2.5617250 secs | 14.498 secs |
> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | 
> | 16g | 1 thread | 26.278 secs | 26.278 secs |
> | 16g | 32 thread |  5.2313740 secs | 26.417 secs |
> | 16g | 96 thread | 6.2445556 secs | 27.141 secs |
> | 32g | 1 thread | 48.149 secs | 48.149 secs |
> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | 
> | 32g | 96 thread | 13.1522042 secs |  61.432 secs |
> | 64g | 1 thread |  100.583 secs | 100.583 secs |
> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | 
> | 64g | 96 thread | 26.7374116 secs | 126.080 secs | 
> | 128g | 1 thread | 233.843 secs | 233.843 secs |
> | 128g | 32 thread | 72.9945768 secs | 207.060 secs |
> | 128g | 96 thread | 67.6815929 secs | 336.345 secs |
> 
>> **Total** means the total heap dump including both two phases
>> **STW** means the first phase only.
>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, 
>> **Total** = **STW**
> 
> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
> 
> In actual testing, two-stage solution can lead to an increase in the overall 
> time for heapdump(See table above). However, considering the reduction of STW 
> time, I think it is an acceptable trade-off. Furthermore, there is still room 
> for optimization in the second merge stage(e.g. 
> sendfile/splice/copy_file_range instead of read+write combination). Since 
> number of parallel dump thread has a considerable impact on total dump time, 
> I added a parameter that allows users to specify the number of parallel dump 
> thread they wish to run.
> 
> # Open discussion
> 
> - Pauseless heap dump solution?
> An alternative pauseless solution is to fork a child process, set the parent 
> process heap to read-only, and dump the heap in child process. Once writing 
> happens in parent process, child process observes them by userfaultfd and 
> corresponding pages are prioritized for dumping. I'm also looking forward to 
> hearing comments and discussions about this solution.
> 
> - Client parser support for segmented heap dump
> This patch provides a possibility that whether heap dump needs to be complete 
> or not, can the VM directly generate segmented heapdump, and let the client 
> parser complete the merge process? Looking forward to hearing comments from 
> the Eclipse MAT community

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  refactor VM_HeapDumpMerge

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/00b49e4e..9e563ca7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=01-02

  Stats: 83 lines in 1 file changed: 43 ins; 35 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: JDK-8306441: Segmented heap dump [v2]

2023-04-26 Thread Yi Yang

Yi Yang has updated the pull request incrementally with two additional commits 
since the last revision:

 - max_path check
 - fix test

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/620d94dc..00b49e4e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=00-01

  Stats: 21 lines in 1 file changed: 12 ins; 5 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


RFR: JDK-8306441: Segmented heap dump

2023-04-26 Thread Yi Yang
Hi, heap dumping pauses the application's execution (stop-the-world, STW),
which is a well-known pain point. JDK-8252842 added parallel support to the
heap dumper in an attempt to alleviate this issue. However, all concurrent
threads competitively write heap data to the same file, and extra memory is
required to maintain the concurrent buffer queue. In our experiments, we did
not observe a significant performance improvement from that.

The minor-pause solution presented in this PR is a two-stage segmented heap
dump:

1. Stage One (STW): concurrent threads directly write data to multiple heap
files.
2. Stage Two (non-STW): merge the multiple heap files into one complete heap
dump file.

With this scheme, concurrent worker threads no longer need to maintain a
buffer queue, which would incur extra memory overhead, nor do they need to
compete for locks. It reduces application pause time by 73~80%.

| Memory | Threads | STW | Total |
| --- | --- | --- | --- |
| 8g | 1 thread | 15.612 secs | 15.612 secs |
| 8g | 32 threads | 2.5617250 secs | 14.498 secs |
| 8g | 96 threads | 2.6790452 secs | 14.012 secs |
| 16g | 1 thread | 26.278 secs | 26.278 secs |
| 16g | 32 threads | 5.2313740 secs | 26.417 secs |
| 16g | 96 threads | 6.2445556 secs | 27.141 secs |
| 32g | 1 thread | 48.149 secs | 48.149 secs |
| 32g | 32 threads | 10.7734677 secs | 61.643 secs |
| 32g | 96 threads | 13.1522042 secs | 61.432 secs |
| 64g | 1 thread | 100.583 secs | 100.583 secs |
| 64g | 32 threads | 20.9233744 secs | 134.701 secs |
| 64g | 96 threads | 26.7374116 secs | 126.080 secs |
| 128g | 1 thread | 233.843 secs | 233.843 secs |
| 128g | 32 threads | 72.9945768 secs | 207.060 secs |
| 128g | 96 threads | 67.6815929 secs | 336.345 secs |

> **Total** means the whole heap dump, covering both phases;
> **STW** means the first phase only.
> For a parallel dump, **Total** = **STW** + **Merge**; for a serial dump,
> **Total** = **STW**.

![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)

In actual testing, the two-stage solution can increase the overall heap dump
time (see the table above). However, considering the reduction in STW time, I
think it is an acceptable trade-off. Furthermore, there is still room for
optimization in the second merge stage (e.g. sendfile/splice/copy_file_range
instead of a read+write combination); see the sketch below. Since the number
of parallel dump threads has a considerable impact on the total dump time, I
added a parameter that allows users to specify the number of parallel dump
threads they wish to run.
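
To make the scheme concrete, here is a minimal Java sketch of the two-stage
idea (all names, sizes, and the dummy payload are hypothetical; the real
implementation lives in the HotSpot heap dumper, not in Java). Stage one
writes per-thread segment files in parallel; stage two concatenates them with
`FileChannel.transferTo`, which can use zero-copy kernel paths such as
sendfile on Linux instead of a user-space read+write loop:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class SegmentedDumpSketch {

    // Stage one (in the real dump this happens inside the STW pause):
    // each worker writes its own segment file, so no shared lock or
    // concurrent buffer queue is needed.
    static List<Path> writeSegments(Path dir, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Path>> futures = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            final int id = i;
            futures.add(pool.submit(() -> {
                Path seg = dir.resolve("segment-" + id + ".bin");
                Files.write(seg, new byte[1024 * 1024]);  // dummy payload
                return seg;
            }));
        }
        pool.shutdown();
        List<Path> segments = new ArrayList<>();
        for (Future<Path> f : futures) segments.add(f.get());
        return segments;
    }

    // Stage two (after the pause): merge segments into the final file.
    // transferTo lets the kernel copy file data directly (e.g. via
    // sendfile), avoiding a round trip through a user-space buffer.
    static void merge(List<Path> segments, Path out) throws IOException {
        try (FileChannel dst = FileChannel.open(out,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (Path seg : segments) {
                try (FileChannel src = FileChannel.open(seg, StandardOpenOption.READ)) {
                    long pos = 0, size = src.size();
                    while (pos < size) {  // transferTo may copy less than asked
                        pos += src.transferTo(pos, size - pos, dst);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("segdump");
        merge(writeSegments(dir, 4), dir.resolve("merged.hprof"));
    }
}
```

HotSpot itself would call the OS primitives directly; `transferTo` is just
the closest portable analogy available from the Java side.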

# Open discussion

- Pauseless heap dump solution?
An alternative, pauseless solution is to fork a child process, set the parent
process heap to read-only, and dump the heap in the child process. Once a
write happens in the parent process, the child process observes it via
userfaultfd, and the corresponding pages are prioritized for dumping. I'm
also looking forward to hearing comments and discussions about this solution.

- Client parser support for segmented heap dump
This patch raises the question of whether a heap dump file needs to be
complete at all: could the VM directly generate a segmented heap dump and let
the client parser perform the merge? Looking forward to hearing comments from
the Eclipse MAT community.

-

Commit messages:
 - JDK-8306441: Segmented heap dump

Changes: https://git.openjdk.org/jdk/pull/13667/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8306441
  Stats: 2838 lines in 11 files changed: 1006 ins; 1770 del; 62 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


Re: RFR: 8303151: DCmd framework cleanups [v3]

2023-03-05 Thread Yi Yang
On Sun, 5 Mar 2023 05:46:37 GMT, David Holmes  wrote:

>> Whilst working on the DCmd code I noticed two items that could be cleaned up:
>> 
>> 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead 
>> of inside it.
>> 
>> 2. The "extension" mechanism to define external DCmds (as added by 
>> [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for 
>> `UnlockCommercialFeatures`) is no longer needed.
>> 
>> Testing: tiers 1-3
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Relocate register_dcmds to DCmd class and get rid of DCmdRegistrant class

Thanks for doing this. Looks good to me.

-

Marked as reviewed by yyang (Committer).

PR: https://git.openjdk.org/jdk/pull/12847


Integrated: 8299518: HotSpotVirtualMachine shared code across different platforms

2023-03-05 Thread Yi Yang
On Tue, 3 Jan 2023 09:34:55 GMT, Yi Yang  wrote:

> harmless refactor to share code across different platforms of 
> VirtualMachineImpl:
> 1. Shared code to process command response after requesting a command 
> execution
> 2. Read functionality in SocketInputStream can be reused

This pull request has now been integrated.

Changeset: 10d6a8e6
Author: Yi Yang
URL: https://git.openjdk.org/jdk/commit/10d6a8e66a911d876239e44afbd76f7faf660cc3
Stats: 396 lines in 5 files changed: 94 ins; 240 del; 62 mod

8299518: HotSpotVirtualMachine shared code across different platforms

Reviewed-by: cjplummer, dholmes

-

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8303151: DCmd framework cleanups

2023-03-02 Thread Yi Yang
On Fri, 3 Mar 2023 04:59:44 GMT, David Holmes  wrote:

> Whilst working on the DCmd code I noticed two items that could be cleaned up:
> 
> 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead 
> of inside it.
> 
> 2. The "extension" mechanism to define external DCmds (as added by 
> [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for 
> `UnlockCommercialFeatures`) is no longer needed.
> 
> Testing: tiers 1-3
> 
> Thanks

Is it possible to remove DCmdRegistrant too? Maybe `void 
DCmdFactory::register_dcmds` or `void register_dcmds`

-

PR: https://git.openjdk.org/jdk/pull/12847


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v8]

2023-03-02 Thread Yi Yang
> harmless refactor to share code across different platforms of 
> VirtualMachineImpl:
> 1. Shared code to process command response after requesting a command 
> execution
> 2. Read functionality in SocketInputStream can be reused

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  have/has

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/11823/files
  - new: https://git.openjdk.org/jdk/pull/11823/files/6033fcc4..5dabc62e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=06-07

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/11823.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11823/head:pull/11823

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v2]

2023-03-02 Thread Yi Yang
On Thu, 5 Jan 2023 03:01:22 GMT, Serguei Spitsyn  wrote:

>> Yi Yang has refreshed the contents of this pull request, and previous 
>> commits have been removed. The incremental views will show differences 
>> compared to the previous content of the PR. The pull request contains two 
>> new commits since the last revision:
>> 
>>  - separate renaming
>>  - 8299518: HotSpotVirtualMachine shared code across different platforms
>
> I like the approach in general.
> Also, I agree with David on his comments, especially on the renaming.
> The abstract methods `readImpl()` and `closeImpl()` would be better named
> `read()` and `close()`.

@sspitsyn @dholmes-ora @turbanoff May I ask for your help reviewing this
patch? Thanks.

-

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v7]

2023-02-27 Thread Yi Yang
> harmless refactor to share code across different platforms of 
> VirtualMachineImpl:
> 1. Shared code to process command response after requesting a command 
> execution
> 2. Read functionality in SocketInputStream can be reused

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  fd set -1

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/11823/files
  - new: https://git.openjdk.org/jdk/pull/11823/files/6077ae5e..6033fcc4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=05-06

  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/11823.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11823/head:pull/11823

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v6]

2023-02-26 Thread Yi Yang
On Tue, 10 Jan 2023 10:49:52 GMT, Yi Yang  wrote:

>> harmless refactor to share code across different platforms of 
>> VirtualMachineImpl:
>> 1. Shared code to process command response after requesting a command 
>> execution
>> 2. Read functionality in SocketInputStream can be reused
>
> Yi Yang has updated the pull request incrementally with three additional 
> commits since the last revision:
> 
>  - format
>  - Merge branch 'jdk_virtualmachienimpl' of github.com:y1yang0/jdk into 
> jdk_virtualmachienimpl
>  - -1 to vmid

Could I get a review? Thanks.

-

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v6]

2023-02-14 Thread Yi Yang
On Wed, 11 Jan 2023 19:13:20 GMT, Chris Plummer  wrote:

>> Yi Yang has updated the pull request incrementally with three additional 
>> commits since the last revision:
>> 
>>  - format
>>  - Merge branch 'jdk_virtualmachienimpl' of github.com:y1yang0/jdk into 
>> jdk_virtualmachienimpl
>>  - -1 to vmid
>
> src/jdk.attach/share/classes/sun/tools/attach/HotSpotVirtualMachine.java line 
> 455:
> 
>> 453: if (fd != -1) {
>> 454: close(fd);
>> 455: }
> 
> There used to be logic to set `fd` (previously called `s`) to -1 during the 
> close operation. Is that no longer needed?

Hi @plummercj, sorry for the late reply.

I don't see why we need it: this method is protected by `synchronized`, and
`fd` is only set in the constructor.

-

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFC: regarding metaspace(metadata?) dump

2023-01-11 Thread Yi Yang
Hi Ioi,
> I think there are overlaps between your proposal and existing tools. For 
> example, there are jcmd options such as VM.class_hierarchy and VM.classes, 
> etc.
> The Serviceability Agent can also be used to analyze the contents of the 
> class metadata.
Of course, we can continue to add jcmd commands such as `jcmd
VM.method_counter` and `jcmd VM.aggregate_by_class_package` to help with
diagnosis, but a once-and-for-all solution is to implement a rich,
well-formed metadata dump as this proposal describes. Third-party parsers and
platforms could then analyze the well-formed dump file and provide many
grouping/filtering options (grouping_by_package, filter_linked,
filter_force_inline; essentially, VM.class_hierarchy is an aggregation of
VM.classes).
Let me describe a real use case to illustrate the benefits of a well-formed
metaspace dump: on our internal DevOps platform, I observed that the
metaspace utilization of my application had been high, and several full GCs
occurred during this period. So I trigger a well-formed metaspace dump
through the DevOps platform; the dump file is then automatically uploaded to
another internal Java troubleshooting platform, which analyzes it further and
displays it with many grouping and filtering options, and so on.
> I'd be interested in seeing your implementation and compare it with the 
> existing tools.
I'm starting to do this; it may take several months to implement since it
looks more like a JEP-level feature. I want to hear some general discussion
before coding, i.e.: is it acceptable to use a JSON format? Should it be a
metadata dump, or keep the current metaspace scope? Do you think a
basic+extended output for the internal structures is acceptable?
> This may be quite difficult, because the metadata contains rewritten Java 
> bytecodes. The rewriting format may be dependent on the JDK version. Also, 
> the class linkage (the resolution of constant pool information) will be 
> vastly from one JDK version to another. So using writing a third party tool 
> that can work with multiple JDK versions will be quite hard.
Thanks for your input! Maybe display the rewritten bytecodes? Anyway, I'll
take a close look at this, and I'll prepare a POC along with a dump parser
and a simple diagnostic web UI once ready.
> Also, defining a "portable" format for the dump will be difficult, since we 
> don't know how the internal data structure will evolve in the future.
Yes. Since we don't know how the internal data structures will change in the
future, I propose reaching a consensus that we should at least be able to
reconstruct the Java (rewritten?) source code as much as possible. For
example, the dumped JSON object for an InstanceKlass contains two parts: the
first part carries the information necessary to reconstruct the source code
as far as possible, and the second part is extended information, like this:
{
  "name": ...,
  "super": ...,
  "flags": ...,
  "method": [...],
  "interface": [...],
  "fields": [...],
  "annotation": [...],
  "bytecode": [...],
  "constantpool": [...],
  // extended part
  "init_state": ...,
  "init_thread": ...
}
The first part is essentially stable (at most, new fields are added), while
the extended part is subject to change; a visualization client checks which
fields are present in the JSON objects and displays them accordingly.
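
To illustrate the basic+extended idea from the client side, here is a
hypothetical Java sketch (field names follow the schematic above; the Jackson
library is an assumed choice of JSON parser, not part of the proposal, and
the single-klass input file is made up). The basic part is read
unconditionally, while extended fields are probed before use, so older and
newer dumps both parse:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.file.Files;
import java.nio.file.Path;

public class KlassDumpReader {
    public static void main(String[] args) throws Exception {
        // args[0]: path to a hypothetical JSON dump of one InstanceKlass
        JsonNode klass = new ObjectMapper()
                .readTree(Files.readString(Path.of(args[0])));
        // Basic part: assumed stable, so read unconditionally.
        System.out.println("class:   " + klass.get("name").asText());
        System.out.println("super:   " + klass.get("super").asText());
        System.out.println("methods: " + klass.get("method").size());
        // Extended part: subject to change, so probe before use.
        if (klass.hasNonNull("init_state")) {
            System.out.println("init_state: " + klass.get("init_state").asText());
        }
    }
}
```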
--
From:Ioi Lam 
Send Time:2023 Jan. 12 (Thu.) 08:15
To:hotspot-runtime-dev ; 
serviceability-...@openjdk.java.net 
Subject:Re: RFC: regarding metaspace(metadata?) dump
 CC-ing serviceability.
 Hi Yi,
 In general, I think it's good to have tools for understanding the internal 
layout of the class metadata.
 I think there are overlaps between your proposal and existing tools. For 
example, there are jcmd options such as VM.class_hierarchy and VM.classes, etc.
 The Serviceability Agent can also be used to analyze the contents of the class 
metadata.
 Did you look at the existing tools and see how they match up with your 
requirements?
 I'd be interested in seeing your implementation and compare it with the 
existing tools.
On 1/11/2023 4:56 AM, Yi Yang wrote:
Hi,
Internally, we often receive feedback from users asking for help with
metaspace-related issues, for example:
1. Users are eager to know which GroovyClassLoader loads which classes, why
those classes are not unloaded, and why they lead to a Metaspace OOME.
2. They want to know the class structure of dynamically generated classes in
scenarios such as deserialization.
3. Finding memory leaks caused by duplicated classes.
...
Internally, we implemented a metaspace dump that generates human-readable
text; it looks something like this:
[Basic Information]
Dump Reason : JCMD
MaxMetaspaceSize : 18446744073709547520 B
CompressedClassSpaceSize : 1073741824 B
Class Space Used : 309992 B
Class Space Capacity : 395264 B
...
[Class Loader Data]
ClassLoaderData : loader

Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v6]

2023-01-10 Thread Yi Yang
> harmless refactor to share code across different platforms of 
> VirtualMachineImpl:
> 1. Shared code to process command response after requesting a command 
> execution
> 2. Read functionality in SocketInputStream can be reused

Yi Yang has updated the pull request incrementally with three additional 
commits since the last revision:

 - format
 - Merge branch 'jdk_virtualmachienimpl' of github.com:y1yang0/jdk into 
jdk_virtualmachienimpl
 - -1 to vmid

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/11823/files
  - new: https://git.openjdk.org/jdk/pull/11823/files/9d6aa99f..6077ae5e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=04-05

  Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/11823.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11823/head:pull/11823

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v5]

2023-01-09 Thread Yi Yang
> harmless refactor to share code across different platforms of 
> VirtualMachineImpl:
> 1. Shared code to process command response after requesting a command 
> execution
> 2. Read functionality in SocketInputStream can be reused

Yi Yang has updated the pull request incrementally with one additional commit 
since the last revision:

  Update 
src/jdk.attach/share/classes/sun/tools/attach/HotSpotVirtualMachine.java
  
  Co-authored-by: Andrey Turbanov 

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/11823/files
  - new: https://git.openjdk.org/jdk/pull/11823/files/e9d8c36f..9d6aa99f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/11823.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11823/head:pull/11823

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v4]

2023-01-08 Thread Yi Yang
On Thu, 5 Jan 2023 04:28:46 GMT, David Holmes  wrote:

> I was thinking more about describing what the parameters are - in particular 
> it is unclear why the actual IOE can be replaced by a passed in one. If we 
> wanted a custom message then I would at least expect to chain the actual IOE 
> to the replacement.

Done. The `ioe` parameter means: if we got an IOException during the previous
command execution, delay throwing it until the command's completion status
has been read. Since all the other parameters are straightforward and this is
a private method, I don't write standard `@param` descriptions for them.
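
For reference, a minimal sketch of the delayed-rethrow pattern being
described (the class name, method name, and signature here are hypothetical,
not the actual `HotSpotVirtualMachine` code):

```java
import java.io.IOException;
import java.io.InputStream;

class AttachResponse {
    // Sketch: an IOException caught while sending the command is held on to
    // until the completion status has been read, and is then rethrown so the
    // caller sees the original failure rather than a secondary one.
    static int readCompletionStatus(InputStream in, IOException pendingIoe)
            throws IOException {
        int status = in.read();   // completion status from the target VM
        if (pendingIoe != null) {
            throw pendingIoe;     // delayed rethrow, after status is known
        }
        return status;
    }
}
```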

-

PR: https://git.openjdk.org/jdk/pull/11823


Re: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v4]

2023-01-08 Thread Yi Yang
> harmless refactor to share code across different platforms of 
> VirtualMachineImpl:
> 1. Shared code to process command response after requesting a command 
> execution
> 2. Read functionality in SocketInputStream can be reused

Yi Yang has updated the pull request incrementally with two additional commits 
since the last revision:

 - describe parameter
 - Revert "separate renaming"
   
   This reverts commit cb70ac62fdfa605ed4f9e415c4ab73bfc8a3.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/11823/files
  - new: https://git.openjdk.org/jdk/pull/11823/files/0580ab2d..e9d8c36f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=02-03

  Stats: 69 lines in 4 files changed: 2 ins; 0 del; 67 mod
  Patch: https://git.openjdk.org/jdk/pull/11823.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11823/head:pull/11823

PR: https://git.openjdk.org/jdk/pull/11823

