On Fri, 12 Sept 2025 at 11:08, Kees Cook <k...@kernel.org> wrote:
>
> On Fri, Sep 12, 2025 at 02:03:08AM -0700, Kees Cook wrote:
> > On Thu, Sep 11, 2025 at 09:49:56AM +0200, Ard Biesheuvel wrote:
> > > On Fri, 5 Sept 2025 at 02:24, Kees Cook <k...@kernel.org> wrote:
> > > >
> > > > Implement ARM 32-bit KCFI backend supporting ARMv7+:
> > > >
> > > > - Function preamble generation using .word directives for type ID storage
> > > >   at -4 byte offset from function entry point (no prefix NOPs needed due to
> > > >   4-byte instruction alignment).
> > > >
> > > > - Use movw/movt instructions for 32-bit immediate loading.
> > > >
> > > > - Trap debugging through UDF instruction immediate encoding following
> > > >   AArch64 BRK pattern for encoding registers with useful contents.
> > > >
> > > > - Scratch register allocation using r0/r1 following ARM procedure call
> > > >   standard for caller-saved temporary registers, though they get
> > > >   stack spilled due to register pressure.
> > > >
> > > > Assembly Code Pattern for ARM 32-bit:
> > > >   push {r0, r1}            ; Spill r0, r1
> > > >   ldr  r0, [target, #-4]   ; Load actual type ID from preamble
> > > >   movw r1, #type_id_low    ; Load expected type (lower 16 bits)
> > > >   movt r1, #type_id_high   ; Load upper 16 bits with top instruction
> > > >   cmp  r0, r1              ; Compare type IDs directly
> > > >   pop  {r0, r1}            ; Reload r0, r1
> > >
> > > We could avoid the MOVW/MOVT pair and the spilling by doing something
> > > along the lines of
> > >
> > >   ldr   ip, [target, #-4]
> > >   eor   ip, ip, #type_id[0]
> > >   eor   ip, ip, #type_id[1] << 8
> > >   eor   ip, ip, #type_id[2] << 16
> > >   eors  ip, ip, #type_id[3] << 24
> > >   ldrne ip, =type_id[3:0]
> >
> > Ah-ha, nice. And it could re-load the type_id on the slow path instead
> > of unconditionally, I guess? (So no "ne" suffix needed there.)
> >
> >   ...
> >   eors  ip, ip, #type_id[3] << 24
> >   beq   .Lkcfi_call
> > .Lkcfi_trap:
> >   ldr   ip, =type_id[3:0]

Yeah, better. If you use the right compiler abstraction to emit this
load, it will be turned into MOVW/MOVT if the target supports it.

> >   udf   #nnn
> > .Lkcfi_call:
> >   blx   target
> >
> > >
> > > Note that IP (R12) should be dead before a function call. Here it is
> > > conditionally loaded with the expected target typeid, removing the
> > > need to decode the instructions to recover it when the trap occurs.
> > >
> > > This should compile to Thumb2 as well as ARM encodings.
> >
> > Won't IP get used as the target register if r0-r3 are used for passing
> > arguments? AAPCS implies this is how it'll go (4 arguments in registers,
> > the rest on stack), but when I tried to force this to happen, it looked
> > like it'd only pass 3 via registers, and would make the call with r3.
>
> Wait, I misread, my test is using r4 as the target! Still, is IP guaranteed
> to never be used for the target?
>

The target register can be any GPR. IP is guaranteed by AAPCS not to
play a role in parameter passing, because it is the intra-procedure-call
scratch register, and may be clobbered by PLT trampolines that get
inserted between a direct call and its target. These are not direct
calls, of course, but the callee does not know that, and so it cannot
make any assumptions about the value of IP.

That said, I'm not sure I understand why this type register has to be a
fixed register. It /can/ be a fixed register, but you'd have to tell the
compiler that. In that case, it can still use the link register for the
target, unless it is emitting a tail call and LR needs to be preserved.

The upshot of that would be that some tail calls will be converted into
ordinary calls, due to the need to preserve some registers on the stack.
But I'd still assume letting the compiler do this when needed is better
than always pushing/popping two registers in the CFI call sequence.
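
For illustration only, here is a rough Python model of the EOR-based check discussed above. It is a sketch, not the compiler implementation: the helper names (split_type_id, eor_chain, emit_sequence) and the default target register r4 are made up for this example, and the UDF immediate is left as the #nnn placeholder used in the thread. It splits a 32-bit type ID into four byte-sized immediates, shows that XORing them into the loaded value yields zero exactly when the IDs match, and prints the corresponding instruction sequence.

```python
def split_type_id(type_id):
    """Split a 32-bit type ID into four immediates, one per byte lane.

    Each result is an 8-bit value shifted left by 0, 8, 16 or 24 bits.
    """
    type_id &= 0xFFFFFFFF
    return [type_id & (0xFF << shift) for shift in (0, 8, 16, 24)]


def eor_chain(loaded, type_id):
    """Model the EOR/EOR/EOR/EORS chain applied to the loaded type ID.

    A result of zero corresponds to the Z flag being set by the final
    EORS, i.e. the type IDs match and the call may proceed.
    """
    ip = loaded & 0xFFFFFFFF
    for imm in split_type_id(type_id):
        ip ^= imm
    return ip


def emit_sequence(type_id, target="r4"):
    """Return the check sequence as text (target register is hypothetical)."""
    imms = split_type_id(type_id)
    lines = [f"ldr   ip, [{target}, #-4]"]
    for i, imm in enumerate(imms):
        # Only the last XOR needs to set the flags.
        op = "eors" if i == len(imms) - 1 else "eor"
        lines.append(f"{op:<5} ip, ip, #{imm:#x}")
    lines += [
        "beq   .Lkcfi_call",
        ".Lkcfi_trap:",
        f"ldr   ip, ={type_id & 0xFFFFFFFF:#010x}",  # slow path: reload expected ID
        "udf   #nnn",                                # placeholder, as in the thread
        ".Lkcfi_call:",
        f"blx   {target}",
    ]
    return lines
```

Calling print("\n".join(emit_sequence(0x12345678))) prints the sequence for a sample ID. Because each immediate touches only one byte lane, the four XORs together are equivalent to a single XOR with the full 32-bit type ID.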