Hi Jon,

Thanks for reviewing.

On Wed, 24 Dec 2025 15:29:23 +0000
Jon Turney <[email protected]> wrote:

> On 23/12/2025 18:41, Takashi Yano wrote:
> > The test case winsup/testsuites/winsup.api/pthread/cancel2 fails
> > on Windows 11 and Windows Server 2025, while it works on Windows 10
> > and Windows Server 2022. PTHREAD_CANCEL_ASYNCHRONOUS is implemented
> 
> Awesome piece of detective work tracking this down! Well done!
> 
> > using [GS]etThreadContext() on the target thread and forcibly
> > overrides instruction pointer to pthread::static_cancel_self().
> > Previously, the stack pointer was not trimmed to 16-byte alignment,
> > even though this is required by 64-bit Windows ABI. This appears to
> > have been overlooked when cygwin first added x86_64 support.
> > 
> > This patch fixes this issue by aligning the stack pointer as well as
> > the instruction pointer in the PTHREAD_CANCEL_ASYNCHRONOUS handling.
> 
> To restate the problem a bit more generally:
> 
> * PTHREAD_CANCEL_ASYNCHRONOUS is implemented by forcing the target 
> thread's IP to pthread::static_cancel_self().
> 
> * static_cancel_self() may (will) call Windows API functions during 
> thread shutdown. A misaligned stack will lead to unexpected exceptions.
> 
> * At the start of the function prologue the stack is expected to be at 
> an offset of 8 from a 16-byte boundary (IP % 16 == 8), as the call 
> instruction has just pushed the return IP onto the stack.
> 
> * Therefore, we must also adjust the stack pointer like that to ensure 
> that stack alignment is correct at the end of the function prologue.
> 
> > Addresses: https://cygwin.com/pipermail/cygwin/2025-December/259117.html
> > Fixes: 61522196c715 ("* Merge in cygwin-64bit-branch.")
> 
> Given all that, it's kind of surprising that this ever worked at all!
> 
> > Reported-by: Takashi Yano <[email protected]>
> > Reviewed-by:
> > Signed-off-by: Takashi Yano <[email protected]>
> > ---
> >   winsup/cygwin/thread.cc | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> > 
> > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> > index 86a00e76e..2270a248b 100644
> > --- a/winsup/cygwin/thread.cc
> > +++ b/winsup/cygwin/thread.cc
> > @@ -630,6 +630,25 @@ pthread::cancel ()
> >         threadlist_t *tl_entry = cygheap->find_tls (cygtls);
> >         if (!cygtls->inside_kernel (&context))
> >     {
> > +#if defined(__x86_64__)
> > +     /* Need to trim alignment of stack pointer.
> > +        
> > https://learn.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170
> > +        states,
> > +          "The stack will always be maintained 16-byte aligned,
> > +           except within the prolog (for example, after the return
> > +           address is pushed),",
> > +        that is, we need 16n + 8 byte alignment here. */
> 
> Maybe this should say something like "we don't fully emulate a call 
> instruction, by pushing the current IP onto the stack. But we need to 
> fake the SP adjustment that does, in order for SP to be correctly 
> aligned at the end of the function prologue"?
> 
> Returning from pthread::static_cancel_self() mustn't happen because we 
> haven't fully synthesized the call instruction by storing the return 
> address on the stack (and maybe other things). This is memorialized by 
> the no_return function attribute.
> 
> > +     context._CX_stackPtr &= 0xfffffffffffffff8UL;
> > +     if ((context._CX_stackPtr & 8) == 0)
> > +       context._CX_stackPtr -= 8;
> > +#elif defined(__aarch64__)
> > +     /* 16 bytes alignment required. Trim stack pointer just in case.
> > +     
> > https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170
> > +     */
> > +     context._CX_stackPtr &= 0xfffffffffffffff0UL;
> 
> I kind of like (~0x0FUL) as it saves counting if that's the right number 
> of 'F's :)
> 
> > +#else
> > +#error unimplemented for this target
> > +#endif
> >       context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self;
> >       SetThreadContext (win32_obj_id, &context);
> >     }

I pushed the patch after the changes you suggested. Thanks!

> Some future work thoughts:
> 
> * It seems like the force_align_arg_pointer function attribute maybe 
> instructs the compiler to do the appropriate re-alignment, but opinion 
> seems mixed as to if it works correctly.
> 
> * I don't think there are any other, similar uses of SetThreadContext, 
> but if there are, maybe they need similar treatment.

SetThreadContext() is used in also signal handling (callign sigdelayed).
However, the stack is maintained in the assembly code of sigdelayed.

> * As a thought experiment, I'm not sure what the potential for double 
> frees or similar bad things happening, if the target thread is in the 
> middle of exiting already. I think thread class has some guards against 
> that kind of thing?

Maybe.

-- 
Takashi Yano <[email protected]>

Reply via email to