Hi Jeremy,

Please find this revised version of the previous patch. 
The main issue being addressed is a segfault caused by accessing `wrapper_arg` 
on the stack after it had been deallocated by VirtualFree. 
This resulted in invalid memory access during thread startup on AArch64. In 
this version, the thread `func` and `arg` are loaded before the stack is freed, 
stored in callee-saved registers, and restored before calling the thread func.

Commit message has been updated accordingly and wrapped at 72 characters with 
trailer. Thanks for the feedback!

In-Lined patch: 

>From 53215b09e6a19c9493fa5fa58d91a82f6d51e1b2 Mon Sep 17 00:00:00 2001
From: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com>
Date: Tue, 1 Jul 2025 18:17:24 +0000
Subject: [PATCH] Cygwin: Aarch64: optimize pthread_wrapper register usage

This patch resolves issues related to unsafe access to deallocated
stack memory in the pthread wrapper for AArch64.

Key changes:
- Removed use of x19 by directly loading the thread function and
  argument using LDP from [WRAPPER_ARG], freeing one register.
- Stored thread function and argument in x20 and x21 before
  VirtualFree to preserve them across calls.
- Used x1 as a temporary register to load the stack base,
  subtract CYGTLS, and update SP.
- Moved the thread argument back into x0 after VirtualFree and
  before calling the thread function.

Earlier, `wrapper_arg` lived on the stack, which was freed via
`VirtualFree`, risking segfaults on later access. Now, the thread
`func` and `arg` are loaded before the stack is freed, stored in
callee-saved registers, and restored to `x0` before calling the
thread function.

Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper")
Signed-off-by: Thirumalai Nagalingam 
<thirumalai.nagalin...@multicorewareinc.com>
---
 winsup/cygwin/create_posix_thread.cc | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/winsup/cygwin/create_posix_thread.cc 
b/winsup/cygwin/create_posix_thread.cc
index 592aaf1a5..17bb607f7 100644
--- a/winsup/cygwin/create_posix_thread.cc
+++ b/winsup/cygwin/create_posix_thread.cc
@@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg)
   /* Sets up a new thread stack, frees the original OS stack,
    * and calls the thread function with its arg using AArch64 ABI. */
   __asm__ __volatile__ ("\n\
-          mov     x19, %[WRAPPER_ARG]  // x19 = &wrapper_arg              \n\
-          ldp     x0, x10, [x19, #16]  // x0 = stackaddr, x10 = stackbase \n\
-          sub     sp, x10, %[CYGTLS]   // sp = stackbase - (CYGTLS)       \n\
-          mov     fp, xzr              // clear frame pointer (x29)       \n\
-          mov     x1, xzr              // x1 = 0 (dwSize)                 \n\
-          mov     x2, #0x8000          // x2 = MEM_RELEASE                \n\
-          bl      VirtualFree          // free original stack             \n\
-          ldp     x19, x0, [x19]       // x19 = func, x0 = arg            \n\
-          blr     x19                  // call thread function            \n"
+          ldp     x20, x21, [%[WRAPPER_ARG]]    // x20 = thread func, x21 = 
thread arg \n\
+          ldp     x0, x1, [%[WRAPPER_ARG], #16] // x0 = stackaddr, x1 = 
stackbase      \n\
+          sub     sp, x1, %[CYGTLS]             // sp = stackbase - (CYGTLS)   
        \n\
+          mov     fp, xzr                       // clear frame pointer (x29)   
        \n\
+                                                // x0 already has stackaddr    
        \n\
+          mov     x1, xzr                       // x1 = 0 (dwSize)             
        \n\
+          mov     x2, #0x8000                   // x2 = MEM_RELEASE            
        \n\
+          bl      VirtualFree                   // free original stack         
        \n\
+          mov     x0, x21                       // Move arg into x0            
        \n\
+          blr     x20                           // call thread function        
        \n"
           : : [WRAPPER_ARG] "r" (&wrapper_arg),
               [CYGTLS] "r" (__CYGTLS_PADSIZE__)
-          : "x0", "x1", "x2", "x10", "x19", "x29", "memory");
+          : "x0", "x1", "x2", "x20", "x21", "x29", "memory");
 #else
 #error unimplemented for this target
 #endif
-- 
2.49.0

Best regards,
Thirumalai Nagalingam

-----Original Message-----
From: Jeremy Drake <cyg...@jdrake.com> 
Sent: 01 July 2025 22:53
To: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com>
Cc: cygwin-patches@cygwin.com
Subject: Re: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 and 
streamlining register usage

On Tue, 1 Jul 2025, Thirumalai Nagalingam wrote:

> Hi all,
>
> This patch fixes existing issues in my earlier commit 
> [https://github.com/cygwin/cygwin/commit/f4ba145056dbe99adf4dbe632bec0
> 35e006539f8] and optimizes the AArch64 thread startup sequence by 
> eliminating the use of register x19 and streamlining register usage. 
> The key modifications are detailed in the patch's commit description. 
> These changes improve register efficiency while ensuring correct 
> thread argument in register `x0` after virtual free call, preventing 
> from any segmentation faults.
> The patch has been tested in our internal AArch64 environment where 
> pthread related test cases are now passing as expected.
>
> Inlined Patch:
>
> From e197e39452e542d18812f41ac2a3af2fa172b273 Mon Sep 17 00:00:00 2001
> From: Thirumalai Nagalingam 
> <thirumalai.nagalin...@multicorewareinc.com>
> Date: Tue, 1 Jul 2025 14:46:52 +0530
> Subject: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 
> and  streamlining register usage
>
> - Removed use of x19 by directly loading the thread func and arg using 
> LDP from [WRAPPER_ARG], freeing up one additional register
> - Loaded thread function and argument into x20 and x21 before 
> VirtualFree to preserve their values across the virtual free call
> - Used x1 as a temporary register to load stack base, subtract CYGTLS, 
> and update SP
> - Moved thread argument back into x0 after VirtualFree and before 
> calling the thread function
>

So the problem was that the registers used before were ones not required to be 
preserved across function calls in the ABI?  Or was it that wrapper_arg was on 
the now-freed stack so could not be accessed after the VirtualFree?  Pleas 
include that in your commit message.  Also, please wrap your commit message at 
72 characters, prefix the subject/first line
"Cygwin: ", and include the trailer

Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper")
Signed-off-by: Thirumalai Nagalingam 
<thirumalai.nagalin...@multicorewareinc.com>

Attachment: 0001-Cygwin-Aarch64-optimize-pthread_wrapper-register-usa.patch
Description: 0001-Cygwin-Aarch64-optimize-pthread_wrapper-register-usa.patch

Reply via email to