On Mon, 22 Dec 2025 16:46:53 +0100
Corinna Vinschen wrote:
> On Dec 22 23:37, Takashi Yano via Cygwin wrote:
> > Alignment issue?
> >
> > This might be the right thing.
> >
> > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> > index 86a00e76e..ec1e3c98c 100644
> > --- a/winsup/cygwin/thread.cc
> > +++ b/winsup/cygwin/thread.cc
> > @@ -630,6 +630,8 @@ pthread::cancel ()
> > threadlist_t *tl_entry = cygheap->find_tls (cygtls);
> > if (!cygtls->inside_kernel (&context))
> > {
> > + if ((context._CX_stackPtr & 8) == 0)
> > + context._CX_stackPtr -= 8;
>
> Does that really help? Checking for 8 byte alignment is usually done
> with (X & 7) != 0, because this won't catch 16 byte aligned stacks.
This code does not aim for 8-byte alignment, but for 16n + 8. It assumes
that context._CX_stackPtr & 7 is always 0, and I wonder whether that
assumption is true. What if user code pushes a 16-bit register such as AX?
It might be necessary to mask off the lowest 3 bits first:
diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
index 86a00e76e..628aef16f 100644
--- a/winsup/cygwin/thread.cc
+++ b/winsup/cygwin/thread.cc
@@ -630,6 +630,9 @@ pthread::cancel ()
threadlist_t *tl_entry = cygheap->find_tls (cygtls);
if (!cygtls->inside_kernel (&context))
{
+ context._CX_stackPtr &= 0xfffffffffffffff8UL;
+ if ((context._CX_stackPtr & 8) == 0)
+ context._CX_stackPtr -= 8;
context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self;
SetThreadContext (win32_obj_id, &context);
}
> But afaic the stack is always 8 byte aligned anyway. However, there are
> some scenarios where 16 byte alignment is required, as for context
> itself when calling RtlCaptureContext. Maybe that's the problem here?
I think so. The x86_64 ABI on Windows requires 16-byte stack alignment.
https://learn.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170
says:
  The stack will always be maintained 16-byte aligned, except
  within the prolog (for example, after the return address is pushed), ...
Therefore, the stack here must be aligned to 16n + 8 on entry, because a
normal 'call' instruction pushes RIP (8 bytes) onto the 16-byte aligned
stack, while the code
  context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self;
does not do that.
> But the context Stackptr is the stackpointer of the current function the
> target thread is running in. The instruction pointer is set to
> pthread::static_cancel_self(), which doesn't get any arguments and doesn't
> use any content from the stack.
Yeah, that was my question.
> It might be a good idea to make sure the stack is always 16 byte
> aligned, but I don't see why pthread::static_cancel_self() ->
> pthread::cancel_self() -> pthread::exit() would require other than 8
> byte alignment.
pthread::exit() calls _cygtls::remove(), which calls CloseHandle().
It appears that, at some point, CloseHandle() stopped working unless
the stack is 16n + 8 byte aligned on entry.
> Apparently something in pthread::exit() crashes? But where? Does
> adding debug_printf's help to figure that out?
It crashes in CloseHandle(). debug_printf() also crashes.
#0 0x00007ffa5bea998b in ntdll!SbSelectProcedure ()
    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#1 0x00007ffa594a1ee5 in KERNELBASE!CloseHandle ()
    from /cygdrive/c/WINDOWS/System32/KERNELBASE.dll
#2 0x00007ff9e68858ef in _cygtls::remove (this=0x7ffdfce00, wait=4294967295)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/cygtls.cc:121
#3 0x00007ff9e6885e88 in _cygtls::remove (this=<optimized out>,
    wait=<optimized out>)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/cygtls.cc:153
#4 0x00007ff9e68e3803 in pthread::exit (this=0xa00003750,
    value_ptr=0xffffffffffffffff)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/thread.cc:583
#5 0x00007ff9e68e38d4 in pthread::cancel_self (this=0x4)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/thread.cc:1061
#6 0x00007ff9e68e3939 in pthread::static_cancel_self ()
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/thread.cc:986
#7 0x0000000000000000 in ?? ()
and crashes at:
Dump of assembler code for function ntdll!SbSelectProcedure:
   0x00007ffa5bea9820 <+0>:   mov %rbx,0x8(%rsp)
   0x00007ffa5bea9825 <+5>:   mov %rsi,0x10(%rsp)
   0x00007ffa5bea982a <+10>:  mov %rdi,0x20(%rsp)
   0x00007ffa5bea982f <+15>:  push %rbp
   0x00007ffa5bea9830 <+16>:  push %r12
   0x00007ffa5bea9832 <+18>:  push %r13
   0x00007ffa5bea9834 <+20>:  push %r14
   0x00007ffa5bea9836 <+22>:  push %r15
   0x00007ffa5bea9838 <+24>:  lea -0x1b0(%rsp),%rbp
   0x00007ffa5bea9840 <+32>:  sub $0x2b0,%rsp
   ....
=> 0x00007ffa5bea998b <+363>: movaps %xmm0,0x170(%rbp)
   0x00007ffa5bea9992 <+370>: movaps %xmm0,0x180(%rbp)
   0x00007ffa5bea9999 <+377>: movaps %xmm0,0x190(%rbp)
Since movaps requires a 16-byte aligned memory operand, this means that
RBP is not 16-byte aligned. RBP = RSP - 8*5 (rbp, r12, r13, r14, r15)
- 0x1b0, where RSP is the value at the beginning of SbSelectProcedure().
If RSP were 16n + 8 there, RBP would be 16-byte aligned; since it is not,
RSP was not aligned to 16n + 8 on entry.
--
Takashi Yano <[email protected]>