Hi Takashi,
On Jun 23 20:51, Takashi Yano wrote:
> After the commit f305ca916ad2, some stress-ng tests fail in arm64
> windows. There seems to be two causes for this issue. One is that
> calling SuspendThread(GetCurrentThread()) may suspend myself in
> the kernel. Branching to sigdelayed in the kernel code does not
> work as expected as the original _cygtls::interrup_now() intended.
> The other cause is, single step exception sometimes does not trigger
> exception::handle() for some reason. Therefore, register vectored
> exception handler (VEH) and use it for single step exception instead.
>
> Addresses: https://cygwin.com/pipermail/cygwin/2025-June/258332.html
> Fixes: f305ca916ad2 ("Cygwin: signal: Prevent unexpected crash on frequent
> SIGSEGV")
> Reported-by: Jeremy Drake <[email protected]>
> Reviewed-by:
> Signed-off-by: Takashi Yano <[email protected]>
> ---
> winsup/cygwin/exceptions.cc | 46 +++++++++++++++------------
> winsup/cygwin/local_includes/cygtls.h | 1 +
> 2 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/winsup/cygwin/exceptions.cc b/winsup/cygwin/exceptions.cc
> index a4699b172..e5193551b 100644
> --- a/winsup/cygwin/exceptions.cc
> +++ b/winsup/cygwin/exceptions.cc
> @@ -653,13 +653,6 @@ exception::handle (EXCEPTION_RECORD *e, exception_list
> *frame, CONTEXT *in,
> static int NO_COPY debugging = 0;
> _cygtls& me = _my_tls;
>
> - if (me.suspend_on_exception)
> - {
> - SuspendThread (GetCurrentThread ());
> - if (e->ExceptionCode == (DWORD) STATUS_SINGLE_STEP)
> - return ExceptionContinueExecution;
> - }
> -
> if (debugging && ++debugging < 500000)
> {
> SetThreadPriority (hMainThread, THREAD_PRIORITY_NORMAL);
> @@ -923,6 +916,22 @@ sig_handle_tty_stop (int sig, siginfo_t *, void *)
> }
> } /* end extern "C" */
>
> +static HANDLE h_veh = NULL;
> +static LONG CALLBACK
> +veh (EXCEPTION_POINTERS *ep)
A better name would be nice. Something like singlestep_handler or something
more appropriate.
> +{
> + if (_my_tls.suspend_on_exception)
> + {
> + _my_tls.in_exception_handler = true;
> + while (_my_tls.suspend_on_exception) ; /* Don't call yield() to privent
^^^^^^^
prevent
> + the thread form being suspended
> + in the kernel. */
> + if (ep->ExceptionRecord->ExceptionCode == (DWORD) STATUS_SINGLE_STEP)
> + return EXCEPTION_CONTINUE_EXECUTION;
> + }
> + return EXCEPTION_CONTINUE_SEARCH;
> +}
> +
> bool
> _cygtls::interrupt_now (CONTEXT *cx, siginfo_t& si, void *handler,
> struct sigaction& siga)
> @@ -943,25 +952,22 @@ _cygtls::interrupt_now (CONTEXT *cx, siginfo_t& si,
> void *handler,
> by setting the trap flag (TF) before calling ResumeThread(). This
> will trigger either STATUS_SINGLE_STEP or the exception caused by
> the instruction that Rip originally pointed to. By suspending the
> - targeted thread within exception::handle(), Rip no longer points
> - to the problematic instruction, allowing safe handling of the
> - interrupt. As a result, Rip can be adjusted appropriately, and the
> - thread can resume execution without unexpected crashes. */
> + targeted thread within the vectored exception handler veh(), Rip no
> + longer points to the problematic instruction, allowing safe handling
> + of the interrupt. As a result, Rip can be adjusted appropriately,
> + and the thread can resume execution without unexpected crashes. */
> if (!inside_kernel (cx, true))
> {
> + if (h_veh == NULL)
> + h_veh = AddVectoredExceptionHandler (1, veh);
h_veh is static, but not NO_COPY. I'm pretty sure this might crash
a subsequently forked child.
IMO it would make more sense to make h_veh a local var and call
RemoveVectoredExceptionHandler after calling SuspendThread.
> cx->EFlags |= 0x100; /* Set TF (setup single step execution) */
> SetThreadContext (*this, cx);
> suspend_on_exception = true;
> + in_exception_handler = false;
> ResumeThread (*this);
> - ULONG cnt = 0;
> - NTSTATUS status;
> - do
> - {
> - yield ();
> - status = NtQueryInformationThread (*this, ThreadSuspendCount,
> - &cnt, sizeof (cnt), NULL);
> - }
> - while (NT_SUCCESS (status) && cnt == 0);
> + while (!in_exception_handler)
> + yield ();
We're yielding a lot, which may waste CPU time. Rather than yielding,
we should consider to use WaitOnAddress/WakeByAddress* more often.
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress
But there's another problem I don't get. The VEH apparently
runs in the context of the single stepping thread (you're using
_my_tls in the VEH). It sets in_exception_handler to true and then
goes into a busy loop before returning the exception flag.
But that means the following SuspendThread...
> + SuspendThread (*this);
...will suspend the thread while in the VEH...
> GetThreadContext (*this, cx);
> suspend_on_exception = false;
...because suspend_on_exception is true up to here.
How is that supposed to work?
Thanks,
Corinna