Hi Jeremy, Please find this revised version of the previous patch. The main issue being addressed is a segfault caused by accessing `wrapper_arg` on the stack after it had been deallocated by VirtualFree. This resulted in invalid memory access during thread startup on AArch64. In this version, the thread `func` and `arg` are loaded before the stack is freed, stored in callee-saved registers, and restored before calling the thread func.
Commit message has been updated accordingly and wrapped at 72 characters with trailer. Thanks for the feedback! In-Lined patch: >From 53215b09e6a19c9493fa5fa58d91a82f6d51e1b2 Mon Sep 17 00:00:00 2001 From: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com> Date: Tue, 1 Jul 2025 18:17:24 +0000 Subject: [PATCH] Cygwin: Aarch64: optimize pthread_wrapper register usage This patch resolves issues related to unsafe access to deallocated stack memory in the pthread wrapper for AArch64. Key changes: - Removed use of x19 by directly loading the thread function and argument using LDP from [WRAPPER_ARG], freeing one register. - Stored thread function and argument in x20 and x21 before VirtualFree to preserve them across calls. - Used x1 as a temporary register to load the stack base, subtract CYGTLS, and update SP. - Moved the thread argument back into x0 after VirtualFree and before calling the thread function. Earlier, `wrapper_arg` lived on the stack, which was freed via `VirtualFree`, risking segfaults on later access. Now, the thread `func` and `arg` are loaded before the stack is freed, stored in callee-saved registers, and restored to `x0` before calling the thread function. Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper") Signed-off-by: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com> --- winsup/cygwin/create_posix_thread.cc | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/winsup/cygwin/create_posix_thread.cc b/winsup/cygwin/create_posix_thread.cc index 592aaf1a5..17bb607f7 100644 --- a/winsup/cygwin/create_posix_thread.cc +++ b/winsup/cygwin/create_posix_thread.cc @@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg) /* Sets up a new thread stack, frees the original OS stack, * and calls the thread function with its arg using AArch64 ABI. */ __asm__ __volatile__ ("\n\ - mov x19, %[WRAPPER_ARG] // x19 = &wrapper_arg \n\ - ldp x0, x10, [x19, #16] // x0 = stackaddr, x10 = stackbase \n\ - sub sp, x10, %[CYGTLS] // sp = stackbase - (CYGTLS) \n\ - mov fp, xzr // clear frame pointer (x29) \n\ - mov x1, xzr // x1 = 0 (dwSize) \n\ - mov x2, #0x8000 // x2 = MEM_RELEASE \n\ - bl VirtualFree // free original stack \n\ - ldp x19, x0, [x19] // x19 = func, x0 = arg \n\ - blr x19 // call thread function \n" + ldp x20, x21, [%[WRAPPER_ARG]] // x20 = thread func, x21 = thread arg \n\ + ldp x0, x1, [%[WRAPPER_ARG], #16] // x0 = stackaddr, x1 = stackbase \n\ + sub sp, x1, %[CYGTLS] // sp = stackbase - (CYGTLS) \n\ + mov fp, xzr // clear frame pointer (x29) \n\ + // x0 already has stackaddr \n\ + mov x1, xzr // x1 = 0 (dwSize) \n\ + mov x2, #0x8000 // x2 = MEM_RELEASE \n\ + bl VirtualFree // free original stack \n\ + mov x0, x21 // Move arg into x0 \n\ + blr x20 // call thread function \n" : : [WRAPPER_ARG] "r" (&wrapper_arg), [CYGTLS] "r" (__CYGTLS_PADSIZE__) - : "x0", "x1", "x2", "x10", "x19", "x29", "memory"); + : "x0", "x1", "x2", "x20", "x21", "x29", "memory"); #else #error unimplemented for this target #endif -- 2.49.0 Best regards, Thirumalai Nagalingam -----Original Message----- From: Jeremy Drake <cyg...@jdrake.com> Sent: 01 July 2025 22:53 To: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com> Cc: cygwin-patches@cygwin.com Subject: Re: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 and streamlining register usage On Tue, 1 Jul 2025, Thirumalai Nagalingam wrote: > Hi all, > > This patch fixes existing issues in my earlier commit > [https://github.com/cygwin/cygwin/commit/f4ba145056dbe99adf4dbe632bec0 > 35e006539f8] and optimizes the AArch64 thread startup sequence by > eliminating the use of register x19 and streamlining register usage. > The key modifications are detailed in the patch's commit description. > These changes improve register efficiency while ensuring correct > thread argument in register `x0` after virtual free call, preventing > from any segmentation faults. > The patch has been tested in our internal AArch64 environment where > pthread related test cases are now passing as expected. > > Inlined Patch: > > From e197e39452e542d18812f41ac2a3af2fa172b273 Mon Sep 17 00:00:00 2001 > From: Thirumalai Nagalingam > <thirumalai.nagalin...@multicorewareinc.com> > Date: Tue, 1 Jul 2025 14:46:52 +0530 > Subject: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 > and streamlining register usage > > - Removed use of x19 by directly loading the thread func and arg using > LDP from [WRAPPER_ARG], freeing up one additional register > - Loaded thread function and argument into x20 and x21 before > VirtualFree to preserve their values across the virtual free call > - Used x1 as a temporary register to load stack base, subtract CYGTLS, > and update SP > - Moved thread argument back into x0 after VirtualFree and before > calling the thread function > So the problem was that the registers used before were ones not required to be preserved across function calls in the ABI? Or was it that wrapper_arg was on the now-freed stack so could not be accessed after the VirtualFree? Pleas include that in your commit message. Also, please wrap your commit message at 72 characters, prefix the subject/first line "Cygwin: ", and include the trailer Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper") Signed-off-by: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com>
0001-Cygwin-Aarch64-optimize-pthread_wrapper-register-usa.patch
Description: 0001-Cygwin-Aarch64-optimize-pthread_wrapper-register-usa.patch