Hi Jon, Thanks for reviewing.
On Wed, 24 Dec 2025 15:29:23 +0000 Jon Turney <[email protected]> wrote: > On 23/12/2025 18:41, Takashi Yano wrote: > > The test case winsup/testsuites/winsup.api/pthread/cancel2 fails > > on Windows 11 and Windows Server 2025, while it works on Windows 10 > > and Windows Server 2022. PTHREAD_CANCEL_ASYNCHRONOUS is implemented > > Awesome piece of detective work tracking this down! Well done! > > > using [GS]etThreadContext() on the target thread and forcibly > > overrides instruction pointer to pthread::static_cancel_self(). > > Previously, the stack pointer was not trimmed to 16-byte alignment, > > even though this is required by 64-bit Windows ABI. This appears to > > have been overlooked when cygwin first added x86_64 support. > > > > This patch fixes this issue by aligning the stack pointer as well as > > the instruction pointer in the PTHREAD_CANCEL_ASYNCHRONOUS handling. > > To restate the problem a bit more generally: > > * PTHREAD_CANCEL_ASYNCHRONOUS is implemented by forcing the target > thread's IP to pthread::static_cancel_self(). > > * static_cancel_self() may (will) call Windows API functions during > thread shutdown. A misaligned stack will lead to unexpected exceptions. > > * At the start of the function prologue the stack is expected to be at > an offset of 8 from a 16-byte boundary (IP % 16 == 8), as the call > instruction has just pushed the return IP onto the stack. > > * Therefore, we must also adjust the stack pointer like that to ensure > that stack alignment is correct at the end of the function prologue. > > > Addresses: https://cygwin.com/pipermail/cygwin/2025-December/259117.html > > Fixes: 61522196c715 ("* Merge in cygwin-64bit-branch.") > > Given all that, it's kind of surprising that this ever worked at all! > > > Reported-by: Takashi Yano <[email protected]> > > Reviewed-by: > > Signed-off-by: Takashi Yano <[email protected]> > > --- > > winsup/cygwin/thread.cc | 19 +++++++++++++++++++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc > > index 86a00e76e..2270a248b 100644 > > --- a/winsup/cygwin/thread.cc > > +++ b/winsup/cygwin/thread.cc > > @@ -630,6 +630,25 @@ pthread::cancel () > > threadlist_t *tl_entry = cygheap->find_tls (cygtls); > > if (!cygtls->inside_kernel (&context)) > > { > > +#if defined(__x86_64__) > > + /* Need to trim alignment of stack pointer. > > + > > https://learn.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170 > > + states, > > + "The stack will always be maintained 16-byte aligned, > > + except within the prolog (for example, after the return > > + address is pushed),", > > + that is, we need 16n + 8 byte alignment here. */ > > Maybe this should say something like "we don't fully emulate a call > instruction, by pushing the current IP onto the stack. But we need to > fake the SP adjustment that does, in order for SP to be correctly > aligned at the end of the function prologue"? > > Returning from pthread::static_cancel_self() mustn't happen because we > haven't fully synthesized the call instruction by storing the return > address on the stack (and maybe other things). This is memorialized by > the no_return function attribute. > > > + context._CX_stackPtr &= 0xfffffffffffffff8UL; > > + if ((context._CX_stackPtr & 8) == 0) > > + context._CX_stackPtr -= 8; > > +#elif defined(__aarch64__) > > + /* 16 bytes alignment required. Trim stack pointer just in case. > > + > > https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170 > > + */ > > + context._CX_stackPtr &= 0xfffffffffffffff0UL; > > I kind of like (~0x0FUL) as it saves counting if that's the right number > of 'F's :) > > > +#else > > +#error unimplemented for this target > > +#endif > > context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self; > > SetThreadContext (win32_obj_id, &context); > > } I pushed the patch after the changes you suggested. Thanks! > Some future work thoughts: > > * It seems like the force_align_arg_pointer function attribute maybe > instructs the compiler to do the appropriate re-alignment, but opinion > seems mixed as to if it works correctly. > > * I don't think there are any other, similar uses of SetThreadContext, > but if there are, maybe they need similar treatment. SetThreadContext() is used in also signal handling (callign sigdelayed). However, the stack is maintained in the assembly code of sigdelayed. > * As a thought experiment, I'm not sure what the potential for double > frees or similar bad things happening, if the target thread is in the > middle of exiting already. I think thread class has some guards against > that kind of thing? Maybe. -- Takashi Yano <[email protected]>
