Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

With some rebalancing/adjustments to our test landscape the issues are gone.
Unfortunately there was not much interest in the resource-related discussion on jtreg-dev
https://mail.openjdk.org/pipermail/jtreg-dev/2024-February/001926.html

Closing for now because the issues are currently not seen any more on our side.

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1996885622
PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1996886625
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

Hi jaikiran, the exclude and match files sound promising; this could be helpful to achieve what we need/want.

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1954373348
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

> What do you think about marking jtreg tests with higher memory requirements
> with a jtreg key like highmemusage ? This way we do not need to put these
> tests into the exclusiveAccess.dirs group, but get a way (only if needed) to
> execute those with high memory usage separately e.g. with lower concurrency.

`jtreg --help Tests` shows this (among other things):

    Test Selection Options
        These options can be used to refine the set of tests to be executed.
        ...
        -exclude: | -Xexclude:
            Provide a file specifying tests that should not be run
        ...
        -match:
            Provide a file specifying tests that can be run (inverse of -exclude)

Maybe you could experiment with these options to exclude the `java/lang/StringBuilder` test directory from your high concurrency run and then only run those in a low concurrency run?

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1952574423
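As a rough sketch of how a file for the `-exclude:` / `-match:` options above could be produced: the helper below lists the `.java` files under `java/lang/StringBuilder` relative to a test root and writes one path per line. The class name `ExcludeListSketch`, the output file name `stringbuilder-tests.txt` and the one-path-per-line layout are assumptions for illustration, not something stated in this thread; the exact file format jtreg expects should be checked against the jtreg documentation. Not every `.java` file in a test directory is necessarily a test, so the list may need pruning.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of jtreg or the JDK build.
final class ExcludeListSketch {
    static final String SUB_DIR = "java/lang/StringBuilder";

    /** Returns the test-root-relative paths ('/' separated) of all .java files under subDir. */
    static List<String> listTests(Path testRoot, String subDir) throws IOException {
        Path dir = testRoot.resolve(subDir);
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    // Normalise Windows separators so the entries look like jtreg test names.
                    .map(p -> testRoot.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: run from the top of a JDK source tree.
        Path testRoot = Path.of(args.length > 0 ? args[0] : "test/jdk");
        Path out = Path.of(args.length > 1 ? args[1] : "stringbuilder-tests.txt");
        if (!Files.isDirectory(testRoot.resolve(SUB_DIR))) {
            System.err.println("No such directory: " + testRoot.resolve(SUB_DIR));
            return;
        }
        Files.write(out, listTests(testRoot, SUB_DIR));
        System.out.println("Wrote " + out);
    }
}
```

The same file could then be passed to `-exclude:` in the high concurrency run and to `-match:` in a second, low concurrency run, which is the two-pass idea suggested above.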
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

Thank you for those additional details.

> It happens on various machines, two for example
> Windows Server 2022 Standard 16 cores 32G RAM
> Windows Server 2019 Standard 16 cores 32G RAM
>
> On both machines we run :tier1 -avm with -conc:15 (concurrency jtreg flag) .

That then looks like (an extremely high) concurrency of 15 that has been explicitly set when launching those tests. By default, the concurrency gets set to `num_cores/2` (so should have been 8 in your case) https://github.com/openjdk/jdk/blob/master/make/RunTests.gmk#L152.

I had a quick look at our internal CI, a lot of our Windows systems use 12 core and 24 GB setups (I haven't looked at all of them). The tests on those systems end up using a concurrency of 6 (which is default computed in that RunTests.gmk and matches the `num_cores/2` arithmetic).

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1952559923
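The arithmetic in this message can be written out as a small sketch. It models only the `num_cores/2` rule quoted above; any further adjustments that RunTests.gmk may apply are not modelled, and the class and method names are made up for illustration. The memory-per-job figure is simply total RAM divided by the number of concurrent jobs, which is a crude way to compare the two settings.

```java
// Hypothetical illustration of the concurrency arithmetic discussed in the thread.
final class ConcurrencySketch {
    /** Default concurrency under the num_cores/2 rule (at least 1). */
    static int defaultConcurrency(int cores) {
        return Math.max(1, cores / 2);
    }

    /** Average RAM in GB available per concurrent test job. */
    static double gbPerJob(int ramGb, int jobs) {
        return (double) ramGb / jobs;
    }

    public static void main(String[] args) {
        // Machines from the thread: 16 cores / 32G run with -conc:15, 12 cores / 24 GB with the default.
        System.out.println("16 cores default: " + defaultConcurrency(16));
        System.out.println("12 cores default: " + defaultConcurrency(12));
        System.out.println("32G at -conc:15: " + gbPerJob(32, 15) + " GB per job");
        System.out.println("32G at default:  " + gbPerJob(32, defaultConcurrency(16)) + " GB per job");
    }
}
```

With 15 jobs on a 32G machine each job has a little over 2 GB on average, compared with 4 GB at the default of 8, which fits the observation that the failures show up at higher concurrency.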
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

It happens on various machines, two for example
Windows Server 2022 Standard 16 cores 32G RAM
Windows Server 2019 Standard 16 cores 32G RAM

On both machines we run :tier1 -avm with -conc:15 (concurrency jtreg flag).

> The other unanswered question is - why is this happening now?

I filed the issue this year but there are a couple of occurrences also from last year.
I also find similar older failures from 2022 of java/lang/StringBuilder/HugeCapacity.java because of resource shortages (but those did not generate an hs_err file for some reason, just some text output).
So the issue has been there for months already (maybe years?).

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1952490909
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

The other unanswered question is - why is this happening now? I did:

    git log test/jdk/java/lang/StringBuilder/

which shows:

    commit df22fb322e6c4c9931a770bd0abf4c43b83c4e4a
    Author: Jim Laskey
    Date: Thu Jan 4 12:46:31 2024 +

        8322512: StringBuffer.repeat does not work correctly after toString() was called

        Reviewed-by: rriggs, jpai

    commit 9b9b5a7a5c624f3512567f5d9b2e9eec231cabb3
    Author: Jim Laskey
    Date: Mon Apr 3 15:29:21 2023 +

        8302323: Add repeat methods to StringBuilder/StringBuffer

        Reviewed-by: tvaleev, redestad

So there's been only 1 commit in that test directory since April 2023. That commit happened on Jan 4th 2024, but at first glance, that change itself doesn't look like something that can cause this issue. The JBS issue you filed is on Jan 30th 2024. Have you noticed such failures with these `test/jdk/java/lang/StringBuilder/` last year?

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1952476504
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

Hello Matthias,

> What do you think about marking jtreg tests with higher memory requirements
> with a jtreg key like highmemusage ?

I still don't have any concrete suggestions - it isn't fully clear to me what we should do here. Part of the reason is that details like the exact command that's being used to run these tests, the "-concurrency" that's either getting computed or explicitly set, the exact Windows OS version and Windows system configurations like the total memory available, the number of CPUs etc. are all unknown right now. Having those details would, I think, be good for understanding what approach to take here. Those details will also help understand why this isn't observed in our internal CI runs.

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1952463101
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Wed, 31 Jan 2024 08:13:25 GMT, Matthias Baesken wrote: > Can we maybe see if we can fix these tests without exclusive-accessing them? > I find it surprising that `java/lang/StringBuilder` tests are problematic, > but `java/lang/StringBuffer` tests are not. Which tests fail? What do you think about marking jtreg tests with higher memory requirements with a jtreg key like highmemusage ? This way we do not need to put these tests into the _exclusiveAccess.dirs_ group, but get a way (only if needed) to execute those with high memory usage separately e.g. with lower concurrency. - PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1951934706
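To make the proposal above concrete, here is a minimal sketch of what a test carrying such a key might look like. `@test`, `@key`, `@summary` and `@run` are existing jtreg tags, but the key name `highmemusage` is only the one proposed in this thread and does not exist today; as far as I know it would also have to be declared in the `keys` entry of the test suite's TEST.ROOT before jtreg accepts it. The class `HighMemUsageExample` is invented for illustration and is far smaller than a test like HugeCapacity.

```java
/*
 * @test
 * @key highmemusage
 * @summary Hypothetical example of a test tagged with the proposed key
 * @run main/othervm HighMemUsageExample
 */
public class HighMemUsageExample {
    /** Appends n characters to a StringBuilder and returns the resulting length. */
    static int fill(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.length();
    }

    public static void main(String[] args) {
        int n = 1 << 20; // small stand-in for a genuinely memory-hungry allocation
        if (fill(n) != n) {
            throw new RuntimeException("unexpected length");
        }
    }
}
```

A run could then select or deselect such tests by keyword and use a lower concurrency only for the tagged ones, instead of serialising a whole directory through exclusiveAccess.dirs.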
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

I started a discussion on jtreg-dev
https://mail.openjdk.org/pipermail/jtreg-dev/2024-February/001926.html
but there has not been much response so far.
Adding a jtreg test key for tests with higher memory requirements (like HugeCapacity) would probably help to solve these resource issues.

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1943742596
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Mon, 12 Feb 2024 10:47:56 GMT, Jaikiran Pai wrote: > What seems to be happening is that the system where this run appears to be > launching too many tests concurrently. Sure, that's why I want to limit the concurrency *for certain tests/ test groups* . Limiting it for the whole tier1 would slow down tests that are absolutely fine with the concurrency we set. - PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1940741219
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

Hello Matthias, looking at the crash log you pasted, it's clear that the test itself isn't a culprit here. Specifically, the failure appears to be when a JVM launch is being attempted for the `test/jdk/java/lang/StringBuilder/Insert.java` test (which looking at the code doesn't use too much memory once launched).

What seems to be happening is that the system where this run appears to be launching too many tests concurrently. The exact command used to launch these tests on that setup would be helpful in understanding the configurations.

The JDK build by default "computes" the `TEST_JOBS` value which controls this concurrency (the number of jtreg concurrent tests to run) and that's done here https://github.com/openjdk/jdk/blob/master/make/RunTests.gmk#L151 and as noted in testing.md, it is configurable (and has a per system default) https://github.com/openjdk/jdk/blob/master/doc/testing.md#jobs-1. This configuration ultimately translates to the `-concurrency` option of jtreg which is explained in section `3.8 How do I specify whether to run tests concurrently?` and `3.25 My system is unusable while I run tests. How do I fix that?` of the jtreg FAQ https://openjdk.org/jtreg/faq.html.

Based on the available details so far, it appears that you might have to reduce the value for this concurrency option, through the right build/test option.

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1938437285
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

Hello, any further comments on this?
Or should we carry the discussion to jtreg on how to work with resource (in this case memory) issues in case of concurrent runs?
Can we **_configure_** jtreg to execute some tests with higher memory requirements with less concurrency?

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1933560174
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

This is what the thread stack looks like in hs_err, for example for java\lang\StringBuilder\Insert\hs_err_pid910208.log.

On Sun Jan 07 20:32:56 CET 2024 we had such an hs_err file with this thread stack:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 536870912 bytes. Error detail: G1 virtual space
# Possible reasons:
# The system is out of physical RAM or swap space
# This process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# JVM is running with Unscaled Compressed Oops mode in which the Java heap is
# placed in the first 4GB address space. The Java Heap base address is the
# maximum limit for the native heap growth. Please use -XX:HeapBaseMinAddress
# to set the Java Heap base and to place the Java Heap above 4GB virtual address.
# This output file may be truncated or incomplete.
#
# Out of Memory Error (c:\openjdk-jdk-dev-windows_x86_64-dbg\jdk\src\hotspot\os\windows\os_windows.cpp:3627), pid=910208, tid=910648
#
# JRE version: (23.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-adhoc.GLOBALsapmachine.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# CreateCoredumpOnCrash turned off, no core file dumped
#
...

--- T H R E A D ---

Current thread (0x02178c5a44c0): JavaThread "Unknown thread" [_thread_in_vm, id=910648, stack(0x00eeec50,0x00eeec60) (1024K)]

Stack: [0x00eeec50,0x00eeec60]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [jvm.dll+0xc96581] os::win32::platform_print_native_stack+0x101 (os_windows_x86.cpp:236)
V [jvm.dll+0xfe7b31] VMError::report+0x1491 (vmError.cpp:1005)
V [jvm.dll+0xfea055] VMError::report_and_die+0x645 (vmError.cpp:1834)
V [jvm.dll+0xfea7cf] VMError::report_and_die+0x5f (vmError.cpp:1604)
V [jvm.dll+0x559d4f] report_vm_out_of_memory+0x5f (debug.cpp:225)
V [jvm.dll+0xc91c5d] os::pd_commit_memory_or_exit+0xad (os_windows.cpp:3635)
V [jvm.dll+0xc82a2e] os::commit_memory_or_exit+0x6e (os.cpp:2051)
V [jvm.dll+0x6de800] G1PageBasedVirtualSpace::commit+0x100 (g1PageBasedVirtualSpace.cpp:192)
V [jvm.dll+0x6f0aff] G1RegionsLargerThanCommitSizeMapper::commit_regions+0x7f (g1RegionToSpaceMapper.cpp:100)
V [jvm.dll+0x7806da] HeapRegionManager::expand+0x8a (heapRegionManager.cpp:164)
V [jvm.dll+0x780be6] HeapRegionManager::expand_by+0xf6 (heapRegionManager.cpp:361)
V [jvm.dll+0x6812e4] G1CollectedHeap::expand+0xf4 (g1CollectedHeap.cpp:1014)
V [jvm.dll+0x682dc6] G1CollectedHeap::initialize+0x596 (g1CollectedHeap.cpp:1389)
V [jvm.dll+0xf823e0] universe_init+0x140 (universe.cpp:794)
V [jvm.dll+0x79c8c1] init_globals+0x31 (init.cpp:126)
V [jvm.dll+0xf5c20e] Threads::create_vm+0x2ae (threads.cpp:552)
V [jvm.dll+0x8c17b2] JNI_CreateJavaVM_inner+0x82 (jni.cpp:3576)
V [jvm.dll+0x8c5d9f] JNI_CreateJavaVM+0x1f (jni.cpp:3667)
C [jli.dll+0x539f] JavaMain+0x113 (java.c:491)
C [ucrtbase.dll+0x2268a] (no source info available)
C [KERNEL32.DLL+0x17ac4] (no source info available)
C [ntdll.dll+0x5a4e1] (no source info available)

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1927319309
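For anyone scanning several such hs_err files, a small sketch like the following pulls the failed allocation size out of the "failed to map N bytes" line and converts it to MiB. The class name `HsErrSketch` and the regular expression are assumptions made for illustration, based only on the wording of the log line quoted in this message; other hs_err files may phrase the failure differently.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the allocation size from an hs_err header line.
final class HsErrSketch {
    private static final Pattern FAILED_MAP = Pattern.compile("failed to map (\\d+) bytes");

    /** Returns the number of bytes the JVM failed to map, if the line reports one. */
    static OptionalLong failedMapBytes(String line) {
        Matcher m = FAILED_MAP.matcher(line);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String line = "# Native memory allocation (mmap) failed to map 536870912 bytes. "
                + "Error detail: G1 virtual space";
        failedMapBytes(line).ifPresent(b ->
                System.out.println("Failed to commit " + toMiB(b) + " MiB"));
    }
}
```

In this log the request is 536870912 bytes, i.e. 512 MiB, and the stack shows it being made during G1 heap initialization in `JNI_CreateJavaVM`, before any test code runs.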
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote:

> On some Windows machines we see sometimes OOM errors because of high resource
> (memory/swap) consumption. This is especially seen when the jtreg runs have
> higher concurrency. A solution is to put the java/lang/StringBuilder tests in
> the exclusiveAccess.dirs group so that they are not executed concurrently,
> which helps to mitigate the resource shortages.
> Of course this has the downside that on very large machines the concurrent
> execution is not done any more.

Hello Matthias, would you be able to include a stacktrace from one such failure? The tests you mention as failing:

java/lang/StringBuilder/StringBufferRepeat.java
java/lang/StringBuilder/CompactStringBuilderSerialization.java
java/lang/StringBuilder/Insert.java

are all "othervm" tests, so I'm curious what kind of OOM is being reported.

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1925764255
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Wed, 31 Jan 2024 00:48:35 GMT, Joe Darcy wrote:

> Can we maybe see if we can fix these tests without exclusive-accessing them?
> I find it surprising that `java/lang/StringBuilder` tests are problematic,
> but `java/lang/StringBuffer` tests are not. Which tests fail?

It is a bit arbitrary which tests fail.

One day:
java/lang/StringBuilder/StringBufferRepeat.java
java/lang/StringBuilder/CompactStringBuilderSerialization.java
java/lang/StringBuilder/Insert.java

Another day:
java/lang/StringBuilder/HugeCapacity.java

The next day it might differ a bit.

Maybe it would be sufficient to execute only the HugeCapacity test in a non-concurrent way, because this one seems to be especially resource hungry, but I am not aware how this would work in jtreg (I can only set whole directories).
Currently we run with this patch and the issues are gone.

Is there a way to balance resource usage in jtreg runs?

PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1918595685
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 17:21:07 GMT, Aleksey Shipilev wrote: > Can we maybe see if we can fix these tests without exclusive-accessing them? > I find it surprising that `java/lang/StringBuilder` tests are problematic, > but `java/lang/StringBuffer` tests are not. Which tests fail? I agree it would be strongly preferable to allow these tests to run without exclusive access. - PR Comment: https://git.openjdk.org/jdk/pull/17625#issuecomment-1918163496
Re: RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On Tue, 30 Jan 2024 09:08:28 GMT, Matthias Baesken wrote: > On some Windows machines we see sometimes OOM errors because of high resource > (memory/swap) consumption. This is especially seen when the jtreg runs have > higher concurrency. A solution is to put the java/lang/StringBuilder tests in > the exclusiveAccess.dirs group so that they are not executed concurrently, > which helps to mitigate the resource shortages. > Of course this has the downside that on very large machines the concurrent > execution is not done any more. Can we maybe see if we can fix these tests without exclusive-accessing them? I find it surprising that `java/lang/StringBuilder` tests are problematic, but `java/lang/StringBuffer` tests are not. Which tests fail? - PR Review: https://git.openjdk.org/jdk/pull/17625#pullrequestreview-1851921699
RFR: JDK-8324930: java/lang/StringBuilder problem with concurrent jtreg runs
On some Windows machines we see sometimes OOM errors because of high resource (memory/swap) consumption. This is especially seen when the jtreg runs have higher concurrency. A solution is to put the java/lang/StringBuilder tests in the exclusiveAccess.dirs group so that they are not executed concurrently, which helps to mitigate the resource shortages.
Of course this has the downside that on very large machines the concurrent execution is not done any more.

Commit messages:
 - JDK-8324930

Changes: https://git.openjdk.org/jdk/pull/17625/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17625&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8324930
Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/17625.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17625/head:pull/17625

PR: https://git.openjdk.org/jdk/pull/17625