llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-flang-driver

Author: Sairudra More (Saieiei)

<details>
<summary>Changes</summary>

Flang currently lowers internal procedures passed as actual arguments using LLVM's `llvm.init.trampoline` / `llvm.adjust.trampoline` intrinsics, which require an executable stack. On modern Linux toolchains and security-hardened kernels that enforce W^X (Write XOR Execute), this causes link-time failures (`ld.lld: error: ... requires an executable stack`) or runtime `SEGV` from NX violations.

This patch introduces a runtime trampoline pool that allocates trampolines from a dedicated `mmap`'d region instead of the stack. The pool toggles page permissions between writable (for patching) and executable (for dispatch), so the stack stays non-executable throughout. On macOS, `MAP_JIT` and `pthread_jit_write_protect_np` are used for the same effect. An i-cache flush (`__builtin___clear_cache` on Linux, `sys_icache_invalidate` on macOS) is performed after each write→exec transition.

The feature is gated behind a new driver flag, `-fenable-runtime-trampoline` (off by default), which threads through the frontend into the `BoxedProcedurePass`. When enabled, the pass emits calls to `_FortranATrampolineInit`, `_FortranATrampolineAdjust`, and `_FortranATrampolineFree` instead of the legacy intrinsics. The legacy path is completely untouched when the flag is off.

The pool is a singleton with a fixed capacity (default 1024 slots, overridable via `FLANG_TRAMPOLINE_POOL_SIZE`). Each slot is 32 bytes and holds a small architecture-specific stub, currently x86-64 (17 bytes, using `r10` as the nest/static-chain register) and AArch64 (24 bytes, using `x18`). The implementation compiles on all architectures but crashes at runtime with a clear diagnostic if trampoline emission is actually attempted on an unsupported target; this avoids breaking the flang-rt build on e.g. RISC-V or PPC64.
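For readers unfamiliar with the permission toggling described above, here is a minimal stand-alone sketch of the Linux-side RW → RX transition. This is illustrative only: `allocStubRegion` is a hypothetical name, not the flang-rt API, and the real pool keeps code and data in separate regions.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch of the patch/dispatch permission dance: the page is
// writable while the stub bytes are written, then executable (and no longer
// writable) afterwards, so it is never W and X at the same time.
void *allocStubRegion(std::size_t size, const unsigned char *stub,
                      std::size_t stubLen) {
  // Patch phase: map writable, not executable.
  void *region = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED)
    return nullptr;
  std::memcpy(region, stub, stubLen);

  // Flush the i-cache before executing freshly written code
  // (a no-op on x86-64, required on AArch64).
  __builtin___clear_cache(static_cast<char *>(region),
                          static_cast<char *>(region) + size);

  // Dispatch phase: flip to executable and drop write permission.
  if (mprotect(region, size, PROT_READ | PROT_EXEC) != 0) {
    munmap(region, size);
    return nullptr;
  }
  return region;
}
```

The ordering matters: writing, flushing, and only then flipping permissions avoids both stale-instruction hazards on AArch64 and any window where the page is simultaneously writable and executable.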
Freed slots are poisoned (the callee pointer is overwritten with a sentinel) and recycled into a freelist, so the pool can sustain long-running programs that repeatedly create and destroy closures.

A few design choices worth calling out:

- The runtime avoids all C++ runtime dependencies: no `std::mutex`, no `operator new`, no function-local statics with hidden guard variables. Locking is via flang-rt's own `Lock` / `CriticalSection`, memory is via `AllocateMemoryOrCrash` / `FreeMemory`, and the singleton uses explicit double-checked locking with a raw pointer. This was done so the trampoline pool links cleanly in minimal / freestanding flang-rt configurations.
- `_FortranATrampolineFree` calls are inserted immediately before every `func.return` in the enclosing host function. This is conservative but correct: the trampoline handle cannot outlive the host's stack frame, since the closure captures the host's local variables by reference.
- The GNU_STACK note is verified via a dedicated integration test (`runtime-trampoline-gnustack.f90`) that compiles and links a Fortran program using the runtime path, then inspects the ELF with `llvm-readelf` to confirm the stack segment is `RW` (not `RWE`).
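The poison-and-recycle freelist behavior can be sketched as follows. This is a simplified, hypothetical model (`SlotPool`, `kPoison` are illustrative names); the real pool lives in `flang-rt/lib/runtime/trampoline.cpp` and additionally pairs each slot with an executable stub.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kInvalid = ~std::size_t{0};
// Sentinel written into freed slots: non-null and obviously invalid, so a
// dangling call faults in a way that is distinguishable from a null deref.
static const void *const kPoison =
    reinterpret_cast<const void *>(~std::uintptr_t{0} - 1);

struct Slot {
  const void *callee;
  const void *chain;
};

// Fixed-capacity pool: free slots form an intrusive list threaded through
// a parallel index array, giving O(1) alloc and free.
template <std::size_t N> class SlotPool {
  Slot slots[N]{};
  std::size_t next[N];
  std::size_t head{0};

public:
  SlotPool() {
    for (std::size_t i = 0; i + 1 < N; ++i)
      next[i] = i + 1;
    next[N - 1] = kInvalid;
  }
  std::size_t alloc(const void *callee, const void *chain) {
    if (head == kInvalid)
      return kInvalid; // pool exhausted
    std::size_t i = head;
    head = next[i];
    slots[i] = {callee, chain};
    return i;
  }
  void free(std::size_t i) {
    slots[i] = {kPoison, nullptr}; // poison before recycling
    next[i] = head;
    head = i;
  }
  const Slot &at(std::size_t i) const { return slots[i]; }
};
```

Because freed slots are pushed back onto the head of the list, a program that repeatedly creates and destroys closures keeps reusing the same few slots rather than exhausting the pool.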
**Test coverage:**

- `flang/test/Driver/fenable-runtime-trampoline.f90` — flag forwarding (on, off, default)
- `flang/test/Fir/boxproc-runtime-trampoline.fir` — FIR-level FileCheck for emitted runtime calls
- `flang/test/Lower/runtime-trampoline.f90` — end-to-end lowering
- `flang-rt/test/Driver/runtime-trampoline-gnustack.f90` — GNU_STACK ELF verification

Closes #<!-- -->182813

---

Patch is 68.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/183108.diff

23 Files Affected:

- (modified) clang/include/clang/Options/Options.td (+5)
- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+4)
- (added) flang-rt/include/flang-rt/runtime/trampoline.h (+69)
- (modified) flang-rt/lib/runtime/CMakeLists.txt (+1)
- (added) flang-rt/lib/runtime/trampoline.cpp (+424)
- (added) flang-rt/test/Driver/runtime-trampoline-gnustack.f90 (+45)
- (modified) flang/include/flang/Frontend/CodeGenOptions.def (+1)
- (modified) flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h (+4)
- (added) flang/include/flang/Optimizer/Builder/Runtime/Trampoline.h (+47)
- (modified) flang/include/flang/Optimizer/CodeGen/CGPasses.td (+11-5)
- (modified) flang/include/flang/Optimizer/Passes/CommandLineOpts.h (+1)
- (modified) flang/include/flang/Optimizer/Passes/Pipelines.h (+2-1)
- (added) flang/include/flang/Runtime/trampoline.h (+69)
- (modified) flang/include/flang/Tools/CrossToolHelpers.h (+2)
- (modified) flang/lib/Frontend/CompilerInvocation.cpp (+4)
- (modified) flang/lib/Optimizer/Builder/CMakeLists.txt (+1)
- (added) flang/lib/Optimizer/Builder/Runtime/Trampoline.cpp (+49)
- (modified) flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp (+272-192)
- (modified) flang/lib/Optimizer/Passes/CommandLineOpts.cpp (+2)
- (modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+11-4)
- (added) flang/test/Driver/fenable-runtime-trampoline.f90 (+15)
- (added) flang/test/Fir/boxproc-runtime-trampoline.fir (+67)
- (added) flang/test/Lower/runtime-trampoline.f90 (+41)
``````````diff
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 4ac812e92e2cb..93c1f2f529e3e 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -7567,6 +7567,11 @@ defm stack_arrays : BoolOptionWithoutMarshalling<"f", "stack-arrays",
   PosFlag<SetTrue, [], [ClangOption], "Attempt to allocate array temporaries on the stack, no matter their size">,
   NegFlag<SetFalse, [], [ClangOption], "Allocate array temporaries on the heap (default)">>;
 
+defm enable_runtime_trampoline : BoolOptionWithoutMarshalling<"f",
+  "enable-runtime-trampoline",
+  PosFlag<SetTrue, [], [ClangOption], "Use W^X compliant runtime trampoline pool for internal procedures">,
+  NegFlag<SetFalse, [], [ClangOption], "Use stack-based trampolines for internal procedures (default)">>;
+
 defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stride",
   PosFlag<SetTrue, [], [ClangOption], "Create unit-strided versions of loops">,
   NegFlag<SetFalse, [], [ClangOption], "Do not create unit-strided loops (default)">>;
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 8425f8fec62a4..e2f04c4725def 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -203,6 +203,10 @@ void Flang::addCodegenOptions(const ArgList &Args,
       !stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
     CmdArgs.push_back("-fstack-arrays");
 
+  if (Args.hasFlag(options::OPT_fenable_runtime_trampoline,
+                   options::OPT_fno_enable_runtime_trampoline, false))
+    CmdArgs.push_back("-fenable-runtime-trampoline");
+
   // -fno-protect-parens is the default for -Ofast.
   if (!Args.hasFlag(options::OPT_fprotect_parens,
                     options::OPT_fno_protect_parens,
diff --git a/flang-rt/include/flang-rt/runtime/trampoline.h b/flang-rt/include/flang-rt/runtime/trampoline.h
new file mode 100644
index 0000000000000..3b3ddff7a0587
--- /dev/null
+++ b/flang-rt/include/flang-rt/runtime/trampoline.h
@@ -0,0 +1,69 @@
+//===-- flang-rt/runtime/trampoline.h ----------------------------*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Internal declarations for the W^X-compliant trampoline pool.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FLANG_RT_RUNTIME_TRAMPOLINE_H_
+#define FLANG_RT_RUNTIME_TRAMPOLINE_H_
+
+#include <cstddef>
+#include <cstdint>
+
+namespace Fortran::runtime::trampoline {
+
+/// Per-trampoline data entry. Stored in a writable (non-executable) region.
+/// Each entry is paired with a trampoline code stub in the executable region.
+struct TrampolineData {
+  const void *calleeAddress;
+  const void *staticChainAddress;
+};
+
+/// Default number of trampoline slots in the pool.
+/// Can be overridden via FLANG_TRAMPOLINE_POOL_SIZE environment variable.
+constexpr std::size_t kDefaultPoolSize = 1024;
+
+/// Size of each trampoline code stub in bytes (platform-specific).
+#if defined(__x86_64__) || defined(_M_X64)
+// x86-64 trampoline stub:
+//   movq TDATA_OFFSET(%rip), %r10   # load static chain from TDATA
+//   movabsq $0, %r11                # placeholder for callee address
+//   jmpq *%r11
+// Actually we use an indirect approach through the TDATA pointer:
+//   movq (%r10), %r10               # load static chain (8 bytes)
+//   -- but we need the TDATA pointer first
+// Simplified approach for x86-64:
+//   leaq tdata_entry(%rip), %r11    # get TDATA entry address
+//   movq 8(%r11), %r10              # load static chain
+//   jmpq *(%r11)                    # jump to callee
+constexpr std::size_t kTrampolineStubSize = 32;
+constexpr int kNestRegister = 10; // %r10 is the nest/static chain register
+#elif defined(__aarch64__) || defined(_M_ARM64)
+// AArch64 trampoline stub:
+//   adr x17, tdata_entry   # get TDATA entry address
+//   ldr x18, [x17, #8]     # load static chain
+//   ldr x17, [x17]         # load callee address
+//   br x17
+constexpr std::size_t kTrampolineStubSize = 32;
+constexpr int kNestRegister = 18; // x18 is the platform register
+#elif defined(__powerpc64__) || defined(__ppc64__)
+constexpr std::size_t kTrampolineStubSize = 48;
+constexpr int kNestRegister = 11; // r11
+#else
+// Fallback: generous size
+constexpr std::size_t kTrampolineStubSize = 64;
+constexpr int kNestRegister = 0;
+#endif
+
+/// Alignment requirement for trampoline code stubs.
+constexpr std::size_t kTrampolineAlignment = 16;
+
+} // namespace Fortran::runtime::trampoline
+
+#endif // FLANG_RT_RUNTIME_TRAMPOLINE_H_
diff --git a/flang-rt/lib/runtime/CMakeLists.txt b/flang-rt/lib/runtime/CMakeLists.txt
index 9fa8376e9b99c..d5e89a169255c 100644
--- a/flang-rt/lib/runtime/CMakeLists.txt
+++ b/flang-rt/lib/runtime/CMakeLists.txt
@@ -88,6 +88,7 @@ set(host_sources
   stop.cpp
   temporary-stack.cpp
   time-intrinsic.cpp
+  trampoline.cpp
   unit-map.cpp
 )
 if (TARGET llvm-libc-common-utilities)
diff --git a/flang-rt/lib/runtime/trampoline.cpp b/flang-rt/lib/runtime/trampoline.cpp
new file mode 100644
index 0000000000000..ad6148f36392e
--- /dev/null
+++ b/flang-rt/lib/runtime/trampoline.cpp
@@ -0,0 +1,424 @@
+//===-- lib/runtime/trampoline.cpp -------------------------------*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// W^X-compliant trampoline pool implementation.
+//
+// This file implements a runtime trampoline pool that maintains separate
+// memory regions for executable code (RX) and writable data (RW).
+//
+// On Linux the code region transitions RW → RX (never simultaneously W+X).
+// On macOS Apple Silicon the code region uses MAP_JIT with per-thread W^X
+// toggling via pthread_jit_write_protect_np, so the mapping permissions
+// include both W and X but hardware enforces that only one is active at
+// a time on any given thread.
+//
+// Architecture:
+//   - Code region (RX): Contains pre-assembled trampoline stubs that load
+//     callee address and static chain from a paired TDATA entry, then jump
+//     to the callee with the static chain in the appropriate register.
+//   - Data region (RW): Contains TrampolineData entries with {callee_address,
+//     static_chain_address} pairs, one per trampoline slot.
+//   - Free list: Tracks available trampoline slots for O(1) alloc/free.
+//
+// Thread safety: Uses Fortran::runtime::Lock (pthreads on POSIX,
+// CRITICAL_SECTION on Windows) — not std::mutex — to avoid C++ runtime
+// library dependence. A single global lock serializes pool operations.
+// This is a deliberate V1 design choice to keep the initial W^X
+// architectural change minimal. Per-thread lock-free pools are deferred
+// to a future optimization patch.
+//
+// AddressSanitizer note: The trampoline code region is allocated via
+// mmap (not malloc/new), so ASan does not track it. The data region
+// and handles are allocated via malloc (through AllocateMemoryOrCrash),
+// which ASan intercepts normally. No special annotations are needed.
+//
+// See flang/docs/InternalProcedureTrampolines.md for design details.
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Runtime/trampoline.h"
+#include "flang-rt/runtime/lock.h"
+#include "flang-rt/runtime/memory.h"
+#include "flang-rt/runtime/terminator.h"
+#include "flang-rt/runtime/trampoline.h"
+
+#include <cassert>
+#include <cstdint>
+#include <cstdlib>
+#include <cstring>
+#include <new> // For placement-new only (no operator new/delete dependency)
+
+// Platform-specific headers for memory mapping.
+#if defined(_WIN32)
+#include <windows.h>
+#else
+#include <sys/mman.h>
+#include <unistd.h>
+#endif
+
+// macOS Apple Silicon requires MAP_JIT and pthread_jit_write_protect_np
+// to create executable memory under the hardened runtime.
+#if defined(__APPLE__) && defined(__aarch64__)
+#include <libkern/OSCacheControl.h>
+#include <pthread.h>
+#endif
+
+// Architecture support check. Stub generators exist only for x86-64 and
+// AArch64. On other architectures the file compiles but the runtime API
+// functions crash with a diagnostic if actually called, so that building
+// flang-rt on e.g. RISC-V or PPC64 never fails.
+#if defined(__x86_64__) || defined(_M_X64) || defined(__aarch64__) || \
+    defined(_M_ARM64)
+#define TRAMPOLINE_ARCH_SUPPORTED 1
+#else
+#define TRAMPOLINE_ARCH_SUPPORTED 0
+#endif
+
+namespace Fortran::runtime::trampoline {
+
+/// A handle returned to the caller. Contains enough info to find
+/// both the trampoline stub and its data entry.
+struct TrampolineHandle {
+  void *codePtr; // Pointer to the trampoline stub in the RX region.
+  TrampolineData *dataPtr; // Pointer to the data entry in the RW region.
+  std::size_t slotIndex; // Index in the pool for free-list management.
+};
+
+// Namespace-scope globals following Flang runtime conventions:
+// - Lock is trivially constructible (pthread_mutex_t / CRITICAL_SECTION)
+// - Pool pointer starts null; initialized under lock (double-checked locking)
+class TrampolinePool; // Forward declaration for pointer below.
+static Lock poolLock;
+static TrampolinePool *poolInstance{nullptr};
+
+/// The global trampoline pool.
+class TrampolinePool {
+public:
+  static TrampolinePool &instance() {
+    if (poolInstance) {
+      return *poolInstance;
+    }
+    CriticalSection critical{poolLock};
+    if (poolInstance) {
+      return *poolInstance;
+    }
+    // Allocate pool using malloc + placement new (trivial constructor).
+    Terminator terminator{__FILE__, __LINE__};
+    void *storage = AllocateMemoryOrCrash(terminator, sizeof(TrampolinePool));
+    poolInstance = new (storage) TrampolinePool();
+    return *poolInstance;
+  }
+
+  /// Allocate a trampoline slot and initialize it.
+  TrampolineHandle *allocate(
+      const void *calleeAddress, const void *staticChainAddress) {
+    CriticalSection critical{lock_};
+    ensureInitialized();
+
+    if (freeHead_ == kInvalidIndex) {
+      // Pool exhausted — fixed size by design for V1.
+      // The pool capacity is controlled by FLANG_TRAMPOLINE_POOL_SIZE
+      // (default 1024). Dynamic slab growth can be added in a follow-up
+      // patch if real workloads demonstrate a need for it.
+      Terminator terminator{__FILE__, __LINE__};
+      terminator.Crash("Trampoline pool exhausted (max %zu slots). "
+                       "Set FLANG_TRAMPOLINE_POOL_SIZE to increase.",
+          poolSize_);
+    }
+
+    std::size_t index = freeHead_;
+    freeHead_ = freeList_[index];
+
+    // Initialize the data entry.
+    dataRegion_[index].calleeAddress = calleeAddress;
+    dataRegion_[index].staticChainAddress = staticChainAddress;
+
+    // Create handle using malloc + placement new.
+    Terminator terminator{__FILE__, __LINE__};
+    void *mem = AllocateMemoryOrCrash(terminator, sizeof(TrampolineHandle));
+    auto *handle = new (mem) TrampolineHandle();
+    handle->codePtr =
+        static_cast<char *>(codeRegion_) + index * kTrampolineStubSize;
+    handle->dataPtr = &dataRegion_[index];
+    handle->slotIndex = index;
+
+    return handle;
+  }
+
+  /// Get the callable address of a trampoline.
+  void *getCallableAddress(TrampolineHandle *handle) { return handle->codePtr; }
+
+  /// Free a trampoline slot.
+  void free(TrampolineHandle *handle) {
+    CriticalSection critical{lock_};
+
+    std::size_t index = handle->slotIndex;
+
+    // Poison the data entry so that any dangling call through a freed
+    // trampoline traps immediately. We use a non-null, obviously-invalid
+    // address (0xDEAD...) so that the resulting fault is distinguishable
+    // from a null-pointer dereference when debugging.
+    dataRegion_[index].calleeAddress = reinterpret_cast<const void *>(
+        static_cast<uintptr_t>(~uintptr_t{0} - 1));
+    dataRegion_[index].staticChainAddress = nullptr;
+
+    // Return slot to free list.
+    freeList_[index] = freeHead_;
+    freeHead_ = index;
+
+    FreeMemory(handle);
+  }
+
+private:
+  static constexpr std::size_t kInvalidIndex = ~std::size_t{0};
+
+  TrampolinePool() = default;
+
+  void ensureInitialized() {
+    if (initialized_)
+      return;
+    initialized_ = true;
+
+    // Check environment variable for pool size override.
+    // Fixed-size pool by design (V1): avoids complexity of dynamic growth
+    // and re-protection of code pages. The default (1024 slots) is
+    // sufficient for typical Fortran programs. Users can override via:
+    //   export FLANG_TRAMPOLINE_POOL_SIZE=4096
+    poolSize_ = kDefaultPoolSize;
+    if (const char *envSize = std::getenv("FLANG_TRAMPOLINE_POOL_SIZE")) {
+      long val = std::strtol(envSize, nullptr, 10);
+      if (val > 0)
+        poolSize_ = static_cast<std::size_t>(val);
+    }
+
+    // Allocate the data region (RW).
+    dataRegion_ = static_cast<TrampolineData *>(
+        std::calloc(poolSize_, sizeof(TrampolineData)));
+    assert(dataRegion_ && "Failed to allocate trampoline data region");
+
+    // Allocate the code region (initially RW for writing stubs, then RX).
+    std::size_t codeSize = poolSize_ * kTrampolineStubSize;
+#if defined(_WIN32)
+    codeRegion_ = VirtualAlloc(
+        nullptr, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+#elif defined(__APPLE__) && defined(__aarch64__)
+    // macOS Apple Silicon: MAP_JIT is required for pages that will become
+    // executable. Use pthread_jit_write_protect_np to toggle W↔X.
+    codeRegion_ = mmap(nullptr, codeSize, PROT_READ | PROT_WRITE | PROT_EXEC,
+        MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT, -1, 0);
+    if (codeRegion_ == MAP_FAILED)
+      codeRegion_ = nullptr;
+    if (codeRegion_) {
+      // Enable writing on this thread (MAP_JIT defaults to execute).
+      pthread_jit_write_protect_np(0); // 0 = writable
+    }
+#else
+    codeRegion_ = mmap(nullptr, codeSize, PROT_READ | PROT_WRITE,
+        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (codeRegion_ == MAP_FAILED)
+      codeRegion_ = nullptr;
+#endif
+    assert(codeRegion_ && "Failed to allocate trampoline code region");
+
+    // Generate trampoline stubs.
+    generateStubs();
+
+    // Flush instruction cache. Required on architectures with non-coherent
+    // I-cache/D-cache (AArch64, PPC, etc.). On x86-64 this is a no-op
+    // but harmless. Without this, AArch64 may execute stale instructions.
+#if defined(__APPLE__) && defined(__aarch64__)
+    // On macOS, use sys_icache_invalidate (from libkern/OSCacheControl.h).
+    sys_icache_invalidate(codeRegion_, codeSize);
+#elif defined(_WIN32)
+    FlushInstructionCache(GetCurrentProcess(), codeRegion_, codeSize);
+#else
+    __builtin___clear_cache(static_cast<char *>(codeRegion_),
+        static_cast<char *>(codeRegion_) + codeSize);
+#endif
+
+    // Make code region executable and non-writable (W^X).
+#if defined(_WIN32)
+    DWORD oldProtect;
+    VirtualProtect(codeRegion_, codeSize, PAGE_EXECUTE_READ, &oldProtect);
+#elif defined(__APPLE__) && defined(__aarch64__)
+    // Switch back to execute-only (MAP_JIT manages per-thread W^X).
+    pthread_jit_write_protect_np(1); // 1 = executable
+#else
+    mprotect(codeRegion_, codeSize, PROT_READ | PROT_EXEC);
+#endif
+
+    // Initialize free list.
+    freeList_ = static_cast<std::size_t *>(
+        std::malloc(poolSize_ * sizeof(std::size_t)));
+    assert(freeList_ && "Failed to allocate trampoline free list");
+
+    for (std::size_t i = 0; i < poolSize_ - 1; ++i)
+      freeList_[i] = i + 1;
+    freeList_[poolSize_ - 1] = kInvalidIndex;
+    freeHead_ = 0;
+  }
+
+  /// Generate platform-specific trampoline stubs in the code region.
+  /// Each stub loads callee address and static chain from its paired
+  /// TDATA entry and jumps to the callee.
+  void generateStubs() {
+#if defined(__x86_64__) || defined(_M_X64)
+    generateStubsX86_64();
+#elif defined(__aarch64__) || defined(_M_ARM64)
+    generateStubsAArch64();
+#else
+    // Unsupported architecture — should never be reached because the
+    // extern "C" API functions guard with TRAMPOLINE_ARCH_SUPPORTED.
+    // Fill with trap bytes as a safety net.
+    std::memset(codeRegion_, 0, poolSize_ * kTrampolineStubSize);
+#endif
+  }
+
+#if defined(__x86_64__) || defined(_M_X64)
+  /// Generate x86-64 trampoline stubs.
+  ///
+  /// Each stub does:
+  ///   movabsq $dataEntry, %r11  ; load TDATA entry address
+  ///   movq 8(%r11), %r10        ; load static chain -> nest register
+  ///   jmpq *(%r11)              ; jump to callee address
+  ///
+  /// Total: 10 + 4 + 3 = 17 bytes, padded to kTrampolineStubSize.
+  void generateStubsX86_64() {
+    auto *code = static_cast<uint8_t *>(codeRegion_);
+
+    for (std::size_t i = 0; i < poolSize_; ++i) {
+      uint8_t *stub = code + i * kTrampolineStubSize;
+
+      // Address of the corresponding TDATA entry.
+      auto dataAddr = reinterpret_cast<uint64_t>(&dataRegion_[i]);
+
+      std::size_t off = 0;
+
+      // movabsq $dataAddr, %r11 (REX.W + B, opcode 0xBB for r11)
+      stub[off++] = 0x49; // REX.WB
+      stub[off++] = 0xBB; // MOV r11, imm64
+      std::memcpy(&stub[off], &dataAddr, 8);
+      off += 8;
+
+      // movq 8(%r11), %r10 (load staticChainAddress into r10)
+      stub[off++] = 0x4D; // REX.WRB
+      stub[off++] = 0x8B; // MOV r/m64 -> r64
+      stub[off++] = 0x53; // ModRM: [r11 + disp8], r10
+      stub[off++] = 0x08; // disp8 = 8
+
+      // jmpq *(%r11) (jump to calleeAddress)
+      stub[off++] = 0x41; // REX.B
+      stub[off++] = 0xFF; // JMP r/m64
+      stub[off++] = 0x23; // ModRM: [r11], opcode extension 4
+
+      // Pad the rest with INT3 (0xCC) for safety.
+      while (off < kTrampolineStubSize)
+        stub[off++] = 0xCC;
+    }
+  }
+#endif
+
+#if defined(__aarch64__) || defined(_M_ARM64)
+  /// Generate AArch64 trampoline stubs.
+  ///
+  /// Each stub does:
+  ///   ldr x17, .Ldata_addr  ; load TDATA entry address
+  ///   ldr x18, [x17, #8]    ; load static chain -> x18 (nest reg)
+  ///   ldr x17, [x17]        ; load callee address
+  ///   br x17                ; jump to callee
+  /// .Ldata_addr:
+  ///   .quad <address of dataRegion_[i]>
+  ///
+  /// Total: 4*4 + 8 = 24 bytes, padded to kTrampolineStubSize.
+  void generateStubsAArch64() {
+    auto *code = static_cast<uint8_t *>(codeRegion_);
+
+    for (std::size_t i = 0; i < poolSize_; ++i) {
+      auto *stub = reinterpret_cast<uint32_t *>(code + i * kTrampolineStubSize);
+
+      // Address of the corresponding TDATA entry.
+      auto dataAddr = reinterpret_cast<uint64_t>(&dataRegion_[i]);
+
+      // ldr x17, .Ldata_addr (PC-relative load, offset = 4 instructions = 16
+      // bytes) LDR (literal): opc=01, V=0, imm19=(16/4)=4, Rt=17
+      stub[0] = 0x58000091; // ldr x17, #16 (imm19=4, shifted left 2 = 16)
+      // Encoding: 0101 1000 0000 0000 0000 0000 1001 0001
+
+      // ldr x18, [x17, #8] (load static chain)
+      // LDR (unsigned offset): size=11, V=0, opc=01, imm12=1(×8), Rn=17, Rt=18
+      stub[1] = 0xF9400632; // ldr x18, [x17, #8]
+
+      // ldr x17, [x17] (load callee address)
+      // LDR (unsigned offset): size=11, V=0, opc=01, imm12=0, Rn=17, Rt=17
+      stub[2] = 0xF9400231; // ldr x17, [x17, #0]
+
+      // br x17
+      stub[3] = 0xD61F0220; // br x17
+
+      // .Ldata_addr: .quad dataRegion_[i]
+      std::memcpy(&stub[4], &dataAddr, 8);
+
+      // Pad remaining with BRK #0 (trap) for safety.
+      std::size_t usedWords = 4 + 2; // 4 instructions + 1 quad (2 words)
+      for (std::size_t w = usedWords;
+           w < kTrampolineStubSize / sizeof(uint32_t); ++w)
+        stub[w] = 0xD4200000; // brk #0
+    }
+  }
+#endif
+
+  Lock lock_;
+  bool initialized_{false};
+  std::size_t poolSize_{0};
+
+  void *codeRegion_{nullptr}; // RX after initialization
+  TrampolineData *da...
[truncated]
``````````

</details>

https://github.com/llvm/llvm-project/pull/183108

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
