On Wed, 4 Feb 2026 07:19:03 GMT, Thomas Stuefe <[email protected]> wrote:

> Still Draft, pls ignore for now. Patch is not done yet.
> 
> This patch enables hs-err file generation for native out-of-stack cases. It 
> is an optional analysis feature one can use when JVMs mysteriously vanish; 
> vanishing JVMs are typically caused by either native stack overflows or OOM kills.
> 
> This was motivated by the analysis difficulties of bugs like 
> https://bugs.openjdk.org/browse/JDK-8371630. There are many more examples.
> 
> ### Motivation
> 
> Today, when the native stack overflows, the JVM dies immediately without an 
> hs-err file. This is because C++-compiled code does not perform stack banging: 
> if the stack is too small, we walk right into whatever caps the stack. That 
> might be our own yellow/red guard pages, native guard pages placed by libc or 
> the kernel, or possibly an unmapped area beyond the end of the stack.
> 
> Since we don't have a stack left to run the signal handler on, we cannot 
> produce the hs-err file. If one is very lucky, libc writes a short "Stack 
> overflow" message to stderr, but usually it does not: if it is a JavaThread 
> and we run into our own yellow/red pages, it counts as a simple segmentation 
> fault from the OS's point of view, since the fault address lies inside what 
> the OS considers a valid pthread stack. So, typically, you just see 
> "Segmentation fault" on stderr.
> 
> ***Why do we need this patch? Don't we bang enough space for native code we 
> call?***
> 
> We bang when entering a native function from Java. The maximum stack size we 
> assume at that time might not be enough; moreover, the native code may be 
> buggy or just too deeply or infinitely recursive. 
> 
> ***We could just increase `ShadowPages`, right?***
> 
> Sure, but the point is we have no hs-err file, so we don't even know it was a 
> stack overflow. One would have to start debugging, which is work-intensive 
> and may not even be possible in a customer scenario. And for buggy recursive 
> code, any `ShadowPages` value might be too small. The code would need to be 
> fixed.
> 
> ### Implementation
> 
> The patch uses alternative signal stacks. That is a simple, robust solution 
> with few moving parts. It works out of the box for all cases: 
> - Stack overflows inside native JNI code from Java 
> - Stack overflows inside Hotspot-internal JavaThread children (e.g. 
> CompilerThread, AttachListenerThread, etc.)
> - Stack overflows in non-Java threads (e.g. VMThread, ConcurrentGCThread)
> - Stack overflows in outside threads that are attached to the JVM, e.g. 
> third-party JVMTI threads
> 
> The drawback of this simplicity is that it is not suitable for always-on 
> production use. That is du...
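The alternative-signal-stack mechanism described in the PR text can be sketched in plain POSIX C. This is a minimal, Linux-oriented sketch, not the patch's code. It maps an alternate stack with `mmap()`, registers it with `sigaltstack()`, installs a `SIGSEGV`/`SIGBUS` handler with `SA_ONSTACK`, and then deliberately overflows the stack in a forked child. The handler checks that it really runs on the alternate stack. All names (`ALT_STACK_SIZE`, `segv_handler`, `install_alt_stack_and_handler`, `overflow_stack`, `run_overflow_in_child`), the 64 KB stack size, the 1 MB stack limit, and the exit codes are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <alloca.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative size (assumption); must be at least MINSIGSTKSZ. */
#define ALT_STACK_SIZE (64 * 1024)

static char *alt_stack_base;  /* start of the mmap'ed alternate stack */

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void segv_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
    char marker;  /* its address shows which stack the handler runs on */
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack_base;
    int on_alt_stack = here >= lo && here < lo + ALT_STACK_SIZE;
    static const char msg[] = "stack overflow caught on alternate signal stack\n";
    if (on_alt_stack) {
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)ignored;
    }
    _exit(on_alt_stack ? 42 : 43);
}

static int install_alt_stack_and_handler(void) {
    void *p = mmap(NULL, ALT_STACK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    alt_stack_base = p;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = p;
    ss.ss_size = ALT_STACK_SIZE;
    if (sigaltstack(&ss, NULL) != 0) return -1;  /* affects calling thread only */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);  /* some platforms report SIGBUS */
}

/* Grows the stack without recursion until the stack limit is hit. */
static void overflow_stack(void) {
    for (;;) {
        volatile char *chunk = alloca(1024);  /* below page size: every page is touched */
        chunk[0] = 1;
    }
}

/* Returns the child's exit status, or -1 if it did not exit normally. */
static int run_overflow_in_child(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
            (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 1024 * 1024)) {
            rl.rlim_cur = 1024 * 1024;  /* keep the overflow quick on Linux */
            setrlimit(RLIMIT_STACK, &rl);
        }
        if (install_alt_stack_and_handler() != 0) _exit(1);
        overflow_stack();
        _exit(2);  /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On Linux, calling `run_overflow_in_child()` from a `main()` is expected to return 42. Note that `sigaltstack()` only configures the calling thread. A real implementation therefore has to set up an alternate stack for every thread it wants to cover, whereas this sketch does so only for the single thread of the forked child.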

src/hotspot/os/posix/threadAltSigStack_posix.cpp line 121:

> 119:   if (success) {
> 120:     step ++;
> 121:     DEBUG_ONLY(memset(p, 0, stacksize));

This line can be removed: `mmap()` appears to be called with `MAP_ANONYMOUS` 
on all POSIX platforms, and anonymous mappings are initialized to zero.
https://man7.org/linux/man-pages/man2/mmap.2.html
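The zero-fill behavior can be checked with a tiny standalone program. This is a minimal sketch assuming the Linux semantics documented in mmap(2). `anon_mapping_is_zero_filled` is a hypothetical helper, not HotSpot code.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Returns 1 if a fresh anonymous mapping of `len` bytes reads as all zeros,
   0 if any byte is non-zero, and -1 if mmap() or munmap() fails. */
static int anon_mapping_is_zero_filled(size_t len) {
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    int zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zero = 0;
            break;
        }
    }
    if (munmap(p, len) != 0) return -1;
    return zero;
}
```

A check like this only confirms the behavior on the platform where it runs. The actual guarantee comes from the documented semantics of `MAP_ANONYMOUS`.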

src/hotspot/share/runtime/globals.hpp line 2013:

> 2011:           "Enable the use of alternative signal stacks.")               \
> 2012:                                                                         \
> 2013:   product(intx, AltSigStackSize, 128, DIAGNOSTIC,                       \

Would `size_t` be a better type than `intx` for this flag?

src/hotspot/share/utilities/vmError.cpp line 2186:

> 2184: } // end: crash_with_segfault
> 2185: PRAGMA_DIAG_POP
> 2186: 

The code would be simpler if we used `alloca()` here.

test/hotspot/jtreg/runtime/ErrorHandling/libNativeStackOverflow.c line 36:

> 34: #elif defined (__xlC__)
> 35:   #pragma option_override(function_name, "opt(level, 0)")
> 36: #endif

The warnings for the recursive call would not need to be disabled if the test 
used `alloca()`.
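Both `alloca()` suggestions rest on the same idea. A single non-recursive loop can grow the stack, because memory obtained from `alloca()` is released only when the calling function returns. This avoids an infinitely recursive function, which presumably is why the test needs overrides: such a function triggers compiler warnings (for example `-Winfinite-recursion` in GCC and Clang), and the optimizer may transform it. The following is a minimal sketch under that assumption. `consume_stack_forever` and `overflow_kill_signal` are hypothetical names. The 1 KB chunk size is an arbitrary choice kept below the page size so that every page is touched on the way down. The 1 MB stack limit only keeps the demo fast on Linux.

```c
#define _GNU_SOURCE
#include <alloca.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-recursive stack consumer: each alloca() block stays live until the
   function returns, which it never does, so the stack keeps growing. */
static void consume_stack_forever(void) {
    for (;;) {
        /* 1 KB is smaller than any common page size, so no page is skipped. */
        volatile char *chunk = alloca(1024);
        chunk[0] = 1;
    }
}

/* Runs consume_stack_forever() in a child with no signal handler installed.
   Returns the signal that killed the child, 0 if it was not killed by a
   signal, or -1 if fork() or waitpid() failed. */
static int overflow_kill_signal(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);  /* avoid leaving a core file */
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
            (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 1024 * 1024)) {
            rl.rlim_cur = 1024 * 1024;
            setrlimit(RLIMIT_STACK, &rl);
        }
        consume_stack_forever();
        _exit(0);  /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Without any handler, the child simply dies with `SIGSEGV` (or `SIGBUS` on some platforms). This matches the "vanishing JVM" symptom described in the PR text.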

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2887048917
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2887062317
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2887097249
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2887113569
