On 23/12/2025 18:41, Takashi Yano wrote:
The test case winsup/testsuites/winsup.api/pthread/cancel2 fails
on Windows 11 and Windows Server 2025, while it works on Windows 10
and Windows Server 2022. PTHREAD_CANCEL_ASYNCHRONOUS is implemented

Awesome piece of detective work tracking this down! Well done!

using [GS]etThreadContext() on the target thread and forcibly
overrides instruction pointer to pthread::static_cancel_self().
Previously, the stack pointer was not trimmed to 16-byte alignment,
even though this is required by 64-bit Windows ABI. This appears to
have been overlooked when cygwin first added x86_64 support.

This patch fixes this issue by aligning the stack pointer as well as
the instruction pointer in the PTHREAD_CANCEL_ASYNCHRONOUS handling.

To restate the problem a bit more generally:

* PTHREAD_CANCEL_ASYNCHRONOUS is implemented by forcing the target thread's IP to pthread::static_cancel_self().

* static_cancel_self() may (will) call Windows API functions during thread shutdown. A misaligned stack will lead to unexpected exceptions.

* At the start of the function prologue the stack is expected to be at an offset of 8 from a 16-byte boundary (IP % 16 == 8), as the call instruction has just pushed the return IP onto the stack.

* Therefore, we must also adjust the stack pointer like that to ensure that stack alignment is correct at the end of the function prologue.

Addresses: https://cygwin.com/pipermail/cygwin/2025-December/259117.html
Fixes: 61522196c715 ("* Merge in cygwin-64bit-branch.")

Given all that, it's kind of surprising that this ever worked at all!

Reported-by: Takashi Yano <[email protected]>
Reviewed-by:
Signed-off-by: Takashi Yano <[email protected]>
---
  winsup/cygwin/thread.cc | 19 +++++++++++++++++++
  1 file changed, 19 insertions(+)

diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
index 86a00e76e..2270a248b 100644
--- a/winsup/cygwin/thread.cc
+++ b/winsup/cygwin/thread.cc
@@ -630,6 +630,25 @@ pthread::cancel ()
        threadlist_t *tl_entry = cygheap->find_tls (cygtls);
        if (!cygtls->inside_kernel (&context))
        {
+#if defined(__x86_64__)
+         /* Need to trim alignment of stack pointer.
+            
https://learn.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170
+            states,
+              "The stack will always be maintained 16-byte aligned,
+               except within the prolog (for example, after the return
+               address is pushed),",
+            that is, we need 16n + 8 byte alignment here. */

Maybe this should say something like "we don't fully emulate a call instruction, by pushing the current IP onto the stack. But we need to fake the SP adjustment that does, in order for SP to be correctly aligned at the end of the function prologue"?

Returning from pthread::static_cancel_self() mustn't happen because we haven't fully synthesized the call instruction by storing the return address on the stack (and maybe other things). This is memorialized by the no_return function attribute.

+         context._CX_stackPtr &= 0xfffffffffffffff8UL;
+         if ((context._CX_stackPtr & 8) == 0)
+           context._CX_stackPtr -= 8;
+#elif defined(__aarch64__)
+         /* 16 bytes alignment required. Trim stack pointer just in case.
+         
https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170
+         */
+         context._CX_stackPtr &= 0xfffffffffffffff0UL;

I kind of like (~0x0FUL) as it saves counting if that's the right number of 'F's :)

+#else
+#error unimplemented for this target
+#endif
          context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self;
          SetThreadContext (win32_obj_id, &context);
        }


Some future work thoughts:

* It seems like the force_align_arg_pointer function attribute maybe instructs the compiler to do the appropriate re-alignment, but opinion seems mixed as to if it works correctly.

* I don't think there are any other, similar uses of SetThreadContext, but if there are, maybe they need similar treatment.

* As a thought experiment, I'm not sure what the potential for double frees or similar bad things happening, if the target thread is in the middle of exiting already. I think thread class has some guards against that kind of thing?

Reply via email to