On Thu, 13 Nov 2025, Evgeny Karpov wrote:
From: Evgeny Karpov <[email protected]>
Is this the intended email address to use for these contributions - no
more @microsoft.com?
Subject: [PATCH] aarch64: Add runtime relocations
The patch implements the required changes to support runtime relocations.
For 26-bit relocation, the linker generates a jump stub, as a single
opcode is not sufficient for relocation.
The supporting binutils patch is being upstreamed.
https://sourceware.org/pipermail/binutils/2025-November/145651.html
A similar change has been upstreamed to Cygwin.
https://cygwin.com/pipermail/cygwin-patches/2025q4/014332.html
Signed-off-by: Evgeny Karpov <[email protected]>
---
mingw-w64-crt/crt/pseudo-reloc.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/mingw-w64-crt/crt/pseudo-reloc.c b/mingw-w64-crt/crt/pseudo-reloc.c
index dd08e718a..20c04a1f0 100644
--- a/mingw-w64-crt/crt/pseudo-reloc.c
+++ b/mingw-w64-crt/crt/pseudo-reloc.c
@@ -464,6 +464,31 @@ do_pseudo_reloc (void * start, void * end, void * base)
case 16:
__write_memory ((void *) reloc_target, &reldata, 2);
break;
+#ifdef __aarch64__
+ case 12:
+ /* Replace add Xn, Xn, :lo12:label with ldr Xn, [Xn, :lo12:__imp__func].
+ That loads the address of _func into Xn. */
+ opcode = 0xf9400000 | (opcode & 0x3ff); // ldr
+ reldata = ((ptrdiff_t) base + r->sym) & ((1 << 12) - 1);
+ reldata >>= 3;
+ opcode |= reldata << 10;
+ __write_memory ((void *) reloc_target, &opcode, 4);
+ break;
+ case 21:
+ /* Replace adrp Xn, label with adrp Xn, __imp__func. */
+ opcode &= 0x9f00001f;
+ reldata = (((ptrdiff_t) base + r->sym) >> 12)
+ - (((ptrdiff_t) base + r->target) >> 12);
+ reldata &= (1 << 21) - 1;
+ opcode |= (reldata & 3) << 29;
+ reldata >>= 2;
+ opcode |= reldata << 5;
+ __write_memory ((void *) reloc_target, &opcode, 4);
+ break;
+ /* A note regarding 26 bits relocation.
+ A single opcode is not sufficient for 26 bits relocation in dynamic linking.
+ The linker generates a jump stub instead. */
+#endif
First off - I would point out that I did consider doing something like
this for the case with LLVM/Clang as well, but I decided not to.
Instead, in LLVM/Clang, we generate .refptr indirection - just
like GCC also does on x86_64. Doing that avoids a number of problems:
- It avoids having to add support for these new relocations here
(including the nitpicky details I'll follow up with below)
- It avoids having to do these relocations in the .text section. Doing
that requires changing the permission of the code section to
write+execute, which generally is undesirable. (Plus, in special
environments such as UWP, it is entirely forbidden to have regions of
memory being both writable and executable at the same time.)
- It avoids the issue with how far away the target symbol can be. E.g. on
x86_64, a 32 bit relative address isn't big enough if the target is too
far away. When this issue does show up, it produces extremely confusing
issues, so to help diagnose it better, we added a check here in the
pseudo relocation code (see commit
ca35236d9799af8a3d2f9baa35b60e6c11abeb24) to error out if the target is
too far away to express in the given number of bits.
That said, I see that you've worked around the range issue by rewriting
"add Xn, Xn, :lo12:label" into "ldr Xn, [Xn, :lo12:__imp_label]" - which
makes the range problem a non-issue. That's neat!
Regarding range, did you actually test this in a mingw setting? The range
check code, currently on lines 442-456, should trigger on these relocations
(with bits == 12 or 21) and error out, even if you actually do handle a
larger range. If you want to go the way of this patch, I'm pretty sure
you need to patch the range check as well, to make it not trigger on these
relocations.
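
As a rough illustration of why I'd expect a width-based check to fire here (this is my own sketch with a hypothetical helper name, not the actual code in pseudo-reloc.c): a full 64 bit address will not fit in a 12 or 21 bit field, even though the rewritten instructions only need a page delta and a low-12 offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from pseudo-reloc.c: does `value` fit in a
   field of `bits` bits, when read either as unsigned or as signed? */
static bool fits_in_bits(int64_t value, unsigned bits)
{
    assert(bits >= 1 && bits < 64);
    int64_t max_unsigned = ((int64_t)1 << bits) - 1;
    int64_t min_signed = -((int64_t)1 << (bits - 1));
    return value >= min_signed && value <= max_unsigned;
}
```

With a made-up address such as 0x140005128, fits_in_bits() is false for both 12 and 21 bits, while the low 12 bits alone (0x128) would pass.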
However - there are two potential flaws with this approach which I don't
see how you are solving. (Do you have a full prebuilt toolchain with these
patches integrated where I could try it out? I tried
https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/releases/download/2025-07-15/aarch64-w64-mingw32-msvcrt-toolchain.tar.gz
but that doesn't seem to have these bits enabled yet.)
What you have now works fine for code like this:
extern int variable;
int *get_var_addr(void) {
return &variable;
}
Where GCC generates code like this:
get_var_addr:
adrp x0, variable
add x0, x0, :lo12:variable
ret
If the linker part of these relocations works in the same way as for the
other existing pseudo relocations on x86, then the linker replaces the
symbol references to the undefined "variable" with "__imp_variable" at
linking time like this:
get_var_addr:
adrp x0, __imp_variable
add x0, x0, :lo12:__imp_variable
ret
Now this works fine with your pseudo relocation handling, which at runtime
turns it into this:
get_var_addr:
adrp x0, __imp_variable
ldr x0, [x0, :lo12:__imp_variable]
ret
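
For reference, the adrp half of that rewrite can be sketched as below. It mirrors the arithmetic in the "case 21" hunk quoted above; the function name, parameter names and the sample addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: re-encode an ADRP so that it points at the page holding the
   __imp_ slot. `pc` is the address of the ADRP instruction itself. */
static uint32_t patch_adrp(uint32_t opcode, uint64_t pc, uint64_t imp_addr)
{
    /* Expect an ADRP: bit 31 set, bits 28..24 == 10000. */
    assert((opcode & 0x9f000000u) == 0x90000000u);

    /* Signed distance in 4 KiB pages, truncated to the 21 bit immediate. */
    int64_t page_delta = (int64_t)(imp_addr >> 12) - (int64_t)(pc >> 12);
    uint32_t imm = (uint32_t)page_delta & 0x1fffffu;

    opcode &= 0x9f00001fu;        /* keep the opcode bits and Rd */
    opcode |= (imm & 3u) << 29;   /* immlo goes in bits 30..29 */
    opcode |= (imm >> 2) << 5;    /* immhi goes in bits 23..5 */
    return opcode;
}
```

For example, patch_adrp(0x90000000, 0x140001000, 0x140005128) encodes a delta of four pages for "adrp x0", with immlo = 0 and immhi = 1.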
However, what does it do about this case?
extern int variable;
int get_var(void) {
return variable;
}
With the current versions of GCC, this generates the following code:
get_var:
adrp x0, variable
ldr w0, [x0, #:lo12:variable]
ret
Now in this case, we already have the :lo12: relocation in an ldr
instruction, so the pseudo relocation trick no longer works as intended.
To fix this case, the pseudo relocation handling code would need to insert
an extra ldr instruction after this one.
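
To make the problem concrete, here is a small sketch that applies the same arithmetic as the "case 12" hunk to both instruction forms. The helper names and the sample address are my own; the masks are the A64 encodings for ADD (immediate) and LDR (immediate, unsigned offset) as I understand them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADD (immediate), 32 or 64 bit: bits 30..23 == 00100010. */
static bool is_add_imm(uint32_t insn)
{
    return (insn & 0x7f800000u) == 0x11000000u;
}

/* LDR (immediate, unsigned offset), any integer size: bits 29..22 == 11100101. */
static bool is_ldr_uimm(uint32_t insn)
{
    return (insn & 0x3fc00000u) == 0x39400000u;
}

/* Same arithmetic as the proposed case 12: force "ldr Xt, [Xn, #lo12]". */
static uint32_t patch_lo12(uint32_t opcode, uint64_t imp_addr)
{
    uint32_t reldata = (uint32_t)(imp_addr & 0xfffu);
    assert((reldata & 7u) == 0);   /* the __imp_ slot must be 8-byte aligned */
    opcode = 0xf9400000u | (opcode & 0x3ffu);   /* keep Rn and Rt only */
    opcode |= (reldata >> 3) << 10;             /* imm12 is scaled by 8 */
    return opcode;
}
```

Both "add x0, x0, #0x128" (0x9104a000) and "ldr w0, [x0, #0x128]" (0xb9412800) come out as the same "ldr x0, [x0, #0x128]", so in the second case x0 ends up holding the address of the variable rather than its value.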
Secondly, the current mechanism for the pseudo relocations works by
_adding_ the difference between the __imp_variable and the actual imported
address to the relocation. Not overwriting, but adding. This makes it also
work transparently for relocations with a PC-relative address (although
that's range limited). It also makes it work for cases where the symbol
reference has a built-in offset.
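
Roughly, the additive scheme amounts to the following (a simplified sketch with hypothetical names, not the literal code from pseudo-reloc.c):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: `stored` is whatever the linker left in the relocated field,
   computed against the __imp_ slot. Swap the slot address for the imported
   address and keep everything else (offset, PC-relative bias) untouched. */
static int64_t apply_additive(int64_t stored, uint64_t imp_slot_addr,
                              uint64_t imported_addr)
{
    return stored - (int64_t)imp_slot_addr + (int64_t)imported_addr;
}

/* Same thing for a 32 bit field; only valid if the result still fits. */
static int32_t apply_additive_rel32(int32_t stored, uint64_t imp_slot_addr,
                                    uint64_t imported_addr)
{
    int64_t v = apply_additive(stored, imp_slot_addr, imported_addr);
    assert(v >= INT32_MIN && v <= INT32_MAX);
    return (int32_t)v;
}
```

If the field held "slot + 12", it ends up as "imported + 12"; if it held "slot + 12 - pc", it ends up as "imported + 12 - pc". The offset survives in both cases.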
As for concrete examples to show the issue:
struct S {
int a, b, c, d;
};
extern struct S s;
int get_field(void) {
return s.d;
}
int *get_field_addr(void) {
return &s.d;
}
Currently with GCC, this produces the following code:
get_field:
adrp x0, s+12
ldr w0, [x0, #:lo12:s+12]
ret
get_field_addr:
adrp x0, s+12
add x0, x0, :lo12:s+12
ret
Now if I understand the code you're proposing correctly, this would
drop the +12 offset entirely, and just end up addressing the start of
the struct instead.
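
In plain C terms, the difference I'm worried about looks like this. It is a self-contained simulation: imp_s is a stand-in for the __imp_s slot, and both function names are made up.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int a, b, c, d;
};

static struct S s = { 1, 2, 3, 4 };

/* Stand-in for the __imp_s slot (assumption: really filled in by the loader). */
static struct S *const imp_s = &s;

/* What "ldr x0, [x0, :lo12:__imp_s]" yields if the +12 is dropped:
   the start of the struct. */
static int *get_field_addr_offset_dropped(void)
{
    return &imp_s->a;
}

/* What the source asked for: the address of s.d. */
static int *get_field_addr_expected(void)
{
    assert(offsetof(struct S, d) == 12);   /* example assumes 4-byte int */
    return &imp_s->d;
}
```

The first function returns a pointer to s.a (value 1), the second a pointer to s.d (value 4), 12 bytes further on.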
I guess it's possible to somehow try to work around these issues by making
GCC not emit this kind of code at all - to never bake in an offset like
this, and never generate a direct "ldr" like in the get_var and get_field
cases. (Then you should also amend the linker to check that the symbol
offset, when creating such pseudo relocations, has to be zero, as it won't
work in the end otherwise.)
So it may be possible to hack around all these issues somehow, but I would
suggest instead going the same way as GCC did for x86_64, and we've done
in LLVM/Clang for all architectures, by doing indirection via a .refptr
pointer instead. That way, you don't need _any_ code changes to the linker
or runtime. (Other than adding support for pseudo relocations for
autoimport of full 64 bit addresses, if that needs architecture specific
code in binutils.)
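
Expressed in C, the .refptr approach behaves roughly like the sketch below. The refptr_variable name is only a stand-in for the compiler-generated .refptr.variable pointer, and the variable is defined locally here so that the example is self-contained; in a real build it would live in a DLL.

```c
/* Stands in for a variable exported from a DLL. */
int variable = 42;

/* Stand-in for .refptr.variable: a full pointer-sized slot in a data
   section. In the real scheme this slot is what gets fixed up, so nothing
   in .text needs to be patched. */
static int *const refptr_variable = &variable;

/* Loading the value costs one extra load through the pointer. */
int get_var(void)
{
    return *refptr_variable;
}

/* Taking the address is just a load of the pointer itself. */
int *get_var_addr(void)
{
    return refptr_variable;
}
```

Since the fixup target is a full-width pointer in data, neither the range issue nor the built-in offset issue arises: any offset is applied to the loaded pointer by ordinary code.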
// Martin