On Fri, 9 Jan 2026 at 01:37, H. Peter Anvin <[email protected]> wrote:
>
> On 2026-01-08 01:25, Ard Biesheuvel wrote:
> > This series is a follow-up to a series I sent a bit more than a year
> > ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
> > for further hardening measures, such as fg-kaslr [1], as well as further
> > harmonization of the boot protocols between architectures [2].
>
> Kristin Accardi had fg-kasrl running without that, didn't she?
>

Yes, as a proof of concept. But it is tied to the x86 approach of
performing runtime relocations based on build time relocation data,
which is problematic now that linkers have started to perform
relaxations, as these cannot always be translated 1:1.

For instance, we already have a latent bug in the x86 relocs tool,
which ignores GOTPCREL relocations on the basis that the relocation is
relative. However, this is only true for Clang/lld, which does not
update the static relocation tables after performing relaxations.
ld.bfd does attempt to keep those tables in sync, and so a GOTPCREL
relocation should be flagged as a bug when encountered, because it
means there is a GOT slot somewhere with no relocation associated with
it.

One could argue that this example is just a Clang bug, but it is very
difficult to make that case with the toolchain developers, given that
--emit-relocs (which is what tells the linker to emit the relocations
that it received as input) has no specification, and some linker
relaxations are not representable as static relocations to begin with.
(To be fair, that currently mostly affects other architectures, but
there is no reason this could never happen on x86.)

Doing fgkaslr properly (IMHO) means supporting things like live patch
and debug seamlessly, and in a portable manner. Toolchain support is
critical, and securing that for a one-off x86 implementation rather
than one that can be used across architectures and other bare-metal
projects is going to be difficult.

> From your footnotes, it looks like what you are *really* asking for is to
> pessimize x86 code to benefit other architectures. That isn't inherently
> wrong, but stating it as you have above is dishonest.
>

I was hoping to save the ad-hominems for later in the thread, when
things *really* heat up.

The point is not to benefit other architectures. The point is to
implement something once, and deploy it on all architectures in the
same way.
ELF is the greatest common denominator across the entire ecosystem, and
so using idiomatic ELF to describe how to load the image and how to
move it around in the virtual address space is an obvious choice.

> > The main sticking point is the fact that PIE linking on x86_64 requires
> > PIE codegen, and that was shot down before on the basis that
> > a) GOTs in fully linked binaries are stupid
> > b) the code size increase would be prohibitive
> > c) the performance would suffer.
> >
> > This series implements PIE codegen without permitting the use of GOT
> > slots. The code size increase is between 0.2% (clang) and 0.5% (gcc),
> > and I could not identify any performance regressions (using hackbench)
> > on various different micro-architectures that I tried it on.
> > (Suggestions for other benchmarks/test cases are welcome)
>
> Could you show some examples of how the code changes?
>

Taking the address of a symbol (same code size)

   0:   48 c7 c0 00 00 00 00    mov    $0x0,%rax
                        3: R_X86_64_32S sym
   7:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 0xe
                        a: R_X86_64_PC32

Loading a global variable from memory (one byte shorter in PIC)

   e:   48 8b 04 25 00 00 00    mov    0x0,%rax
  15:   00
                        12: R_X86_64_32S sym
  16:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 0x1d
                        19: R_X86_64_PC32 sym-0x4

Indexing a global array (3 bytes longer in PIC, needs an additional
GPR if source and destination are the same)

  1d:   48 8b 04 c5 00 00 00    mov    0x0(,%rax,8),%rax
  24:   00
                        21: R_X86_64_32S array
  25:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # 0x2c
                        28: R_X86_64_PC32 array-0x4
  2c:   48 8b 04 c2             mov    (%rdx,%rax,8),%rax

Pushing the address of a symbol to the stack (3 bytes longer in PIC,
needs an additional GPR)

  30:   68 00 00 00 00          push   $0x0
                        31: R_X86_64_32S sym
  35:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 0x3c
                        38: R_X86_64_PC32 sym-0x4
  3c:   50                      push   %rax

Jump tables look completely different, but the table itself is only
half the size.
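
For illustration only (this is not taken from the series): a source-level
analogy of why a relative jump table is half the size. PIC jump tables
typically hold 32-bit offsets instead of 64-bit absolute addresses, so
the entries need no runtime relocation. The sketch below mimics that by
hand using the GNU C "labels as values" extension; sketch_dispatch() and
rel_table are hypothetical names, and the table a compiler actually
emits for a switch statement will look different.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical dispatcher, for illustration only. Relies on the GNU C
 * "labels as values" extension (&&label and computed goto).
 */
static int sketch_dispatch(unsigned int op, int x)
{
    /*
     * Each entry is a 4-byte offset from the op_inc label, so the table
     * holds no absolute addresses. The non-PIC equivalent would be an
     * array of 'void *' holding &&op_inc, &&op_dec and &&op_neg, which
     * takes 8 bytes per entry on x86_64.
     */
    static const int32_t rel_table[] = {
        &&op_inc - &&op_inc,
        &&op_dec - &&op_inc,
        &&op_neg - &&op_inc,
    };

    if (op >= sizeof(rel_table) / sizeof(rel_table[0]))
        return x;   /* unknown op: leave the value unchanged */

    /* Rebuild the target address at run time: base label + offset. */
    goto *(&&op_inc + rel_table[op]);

op_inc:
    return x + 1;
op_dec:
    return x - 1;
op_neg:
    return -x;
}
```

Calling sketch_dispatch(1, 5) takes the op_dec branch. The cost of the
smaller table is one extra add to turn the offset back into an address.
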
Even for non-PIC, jump tables are problematic for objtool, and so
these need to be annotated by the compiler. I have some unfinished
Clang patches that implement this, which I hope to get back to soon.

The asm patches in the series should give a good impression of how the
code changes.

> >
> > [1] There have been a few attempts at landing fine grained KASLR for
> > x86, but the main problem is that it was tied to the x86 relocation
> > format, which deviates from how fully linked relocatable ELF binaries
> > are generally constructed (using PIE). Implementing fgkaslr in the ELF
> > domain would make it suitable for other architectures too, as well as
> > other use cases (bare metal or hosted) where no dynamic linking is
> > performed (firmware, hypervisors). In order to implement this properly,
> > i.e., with debugging support etc, it needs support from the tooling
> > side. (Fine grained KASLR in combination with execute-only code mappings
> > makes it extremely difficult for an attacker to subvert the control flow
> > in the kernel in a way that can be meaningfully exploited).
> >
> > [2] EFI zboot is already used by various architectures that have no
> > decompressor stage at all (arm64, RISC-V, LoongArch), and this format
> > can be combined with an ELF payload too. EFI zboot accommodates non-EFI
> > boot chains by describing the size, offset, payload type and compression
> > type in its header, so that it can be extracted and booted by other
> > means.
>
> The bzImage format already have that for all practical purposes. We *really*
> don't want to introduce a new binary format for the x86 kernel. A bunch of
> such attempts have been done in the past, and it is nothing but a mess that
> breaks things, because now you are encouraging different bootloaders to
> support a non-overlapping set of binary formats.
>
> STRONG NAK on that one.
>

I think it should be feasible to implement a hybrid bzImage/EFI zboot
format.
There is already prior art in loaders that decompress the ELF payload
directly (Xen).

Given that an x86_64 bootloader running in long mode needs to do very
little beyond loading the ELF at some arbitrary 2M aligned offset and
calling the entrypoint with a struct boot_params in %RDI, most of the
logic in the decompressor is really only needed when booting in 32-bit
mode.

So I think there is value in having a generic boot format that can be
consumed by EFI directly, or by a generic ELF vmlinux loader (library)
that understands the EFI zboot format and knows how to extract the ELF
payload.

I'd strongly prefer only a single idiom for describing the relocations
in the image. On other architectures (i.e., without decompressor), EFI
zboot would be a prerequisite for fgkaslr, but it is up to the platform
to decide whether to boot via EFI or load the ELF and apply the
relocations. On x86_64, the same tooling would work seamlessly, but the
decompressor could apply the relocations itself as well.
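
To make "load the ELF and apply the relocations" concrete, here is a
minimal sketch of what a loader might do for a PIE image. It is an
illustration under stated assumptions, not code from the series: it
assumes the image's dynamic relocations are ELF64 RELA entries of type
R_X86_64_RELATIVE only, and that the image has already been copied into
a flat buffer. The struct and function names (sketch_rela,
sketch_apply_relative) are made up for this example; the struct mirrors
the layout of Elf64_Rela.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the ELF64 definitions, so this builds without <elf.h>. */
#define SKETCH_R_X86_64_RELATIVE 8u  /* numeric value of R_X86_64_RELATIVE */

struct sketch_rela {                 /* same layout as Elf64_Rela */
    uint64_t r_offset;               /* link-time address of the slot to patch */
    uint64_t r_info;                 /* relocation type lives in the low 32 bits */
    int64_t  r_addend;               /* link-time address the slot must point to */
};

/*
 * Patch every relative relocation in an image that was linked to run at
 * link_base but will run at load_base. 'image' is a buffer holding the
 * image contents. Returns 0 on success, -1 on an unsupported relocation
 * type or an out-of-bounds slot.
 */
static int sketch_apply_relative(uint8_t *image, size_t image_size,
                                 uint64_t link_base, uint64_t load_base,
                                 const struct sketch_rela *rela, size_t count)
{
    uint64_t delta = load_base - link_base;

    for (size_t i = 0; i < count; i++) {
        uint64_t off, value;

        if ((uint32_t)rela[i].r_info != SKETCH_R_X86_64_RELATIVE)
            return -1;

        /* Translate the link-time address into an offset into the buffer. */
        off = rela[i].r_offset - link_base;
        if (off > image_size || image_size - off < sizeof(value))
            return -1;

        /* New slot contents: link-time target plus the displacement. */
        value = (uint64_t)rela[i].r_addend + delta;
        memcpy(image + off, &value, sizeof(value));
    }
    return 0;
}
```

With link_base 0x1000 and load_base 0x200000, an entry with r_offset
0x1008 and r_addend 0x1010 stores 0x200010 at byte offset 8 of the
buffer. Since the loop only depends on the ELF relocation records, the
same routine could in principle live in a decompressor, a bootloader or
a standalone loader library.
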
