On Fri, 12 Sept 2025 at 11:08, Kees Cook <k...@kernel.org> wrote:
>
> On Fri, Sep 12, 2025 at 02:03:08AM -0700, Kees Cook wrote:
> > On Thu, Sep 11, 2025 at 09:49:56AM +0200, Ard Biesheuvel wrote:
> > > On Fri, 5 Sept 2025 at 02:24, Kees Cook <k...@kernel.org> wrote:
> > > >
> > > > Implement ARM 32-bit KCFI backend supporting ARMv7+:
> > > >
> > > > - Function preamble generation using .word directives for type ID
> > > >   storage at -4 byte offset from function entry point (no prefix NOPs
> > > >   needed due to 4-byte instruction alignment).
> > > >
> > > > - Use movw/movt instructions for 32-bit immediate loading.
> > > >
> > > > - Trap debugging through UDF instruction immediate encoding following
> > > >   AArch64 BRK pattern for encoding registers with useful contents.
> > > >
> > > > - Scratch register allocation using r0/r1 following ARM procedure call
> > > >   standard for caller-saved temporary registers, though they get
> > > >   stack spilled due to register pressure.
> > > >
> > > > Assembly Code Pattern for ARM 32-bit:
> > > >   push {r0, r1}                ; Spill r0, r1
> > > >   ldr  r0, [target, #-4]       ; Load actual type ID from preamble
> > > >   movw r1, #type_id_low        ; Load expected type (lower 16 bits)
> > > >   movt r1, #type_id_high       ; Load upper 16 bits with top instruction
> > > >   cmp  r0, r1                  ; Compare type IDs directly
> > > >   pop {r0, r1}                 ; Reload r0, r1
> > >
> > > We could avoid the MOVW/MOVT pair and the spilling by doing something
> > > along the lines of
> > >
> > > ldr   ip, [target, #-4]
> > > eor   ip, ip, #type_id[0]
> > > eor   ip, ip, #type_id[1] << 8
> > > eor   ip, ip, #type_id[2] << 16
> > > eors  ip, ip, #type_id[3] << 24
> > > ldrne ip, =type_id[3:0]
> >
> > Ah-ha, nice. And it could re-load the type_id on the slow path instead
> > of unconditionally, I guess? (So no "ne" suffix needed there.)
> >
> >   ...
> >   eors  ip, ip, #type_id[3] << 24
> >   beq .Lkcfi_call
> > .Lkcfi_trap:
> >   ldr ip, =type_id[3:0]

Yeah, better. If you use the right compiler abstraction to emit this
load, it will be turned into MOVW/MOVT if the target supports it.

> >   udf #nnn
> > .Lkcfi_call:
> >   blx target
> >
> >
> > >
> > > Note that IP (R12) should be dead before a function call. Here it is
> > > conditionally loaded with the expected target typeid, removing the
> > > need to decode the instructions to recover it when the trap occurs.
> > >
> > > This should compile to Thumb2 as well as ARM encodings.
> >
> > Won't IP get used as the target register if r0-r3 are used for passing
> > arguments? AAPCS implies this is how it'll go (4 arguments in registers,
> > the rest on stack), but when I tried to force this to happen, it looked
> > like it'd only pass 3 via registers, and would make the call with r3.
>
> Wait, I misread, my test is using r4 as the target! Still, is IP guaranteed
> to never be used for the target?
>

The target register can be any GPR. IP is guaranteed by AAPCS not to
play a role in parameter passing, because it is the intra-procedure-call
scratch register, and may be clobbered by PLT trampolines that get
inserted between a direct call and its target. The calls we are
discussing are indirect, of course, but the callee does not know that,
and so it cannot make any assumptions about the value of IP.

That said, I'm not sure I understand why this type register has to be
a fixed register. It /can/ be a fixed register, but you'd have to tell
the compiler that. In that case, it can still use the link register
for the target, unless it is emitting a tail call and LR needs to be
preserved. The upshot of that would be that some tail calls will be
converted into ordinary calls, due to the need to preserve some
registers on the stack. But I'd still assume letting the compiler do
this when needed is better than always pushing/popping two registers
in the CFI call sequence.
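
For reference, here is a minimal C model of the EOR/EORS chain quoted
above. It is only a sketch of the data flow, not of the instruction
encodings: it assumes each byte of the expected type ID is XORed in at
its own bit position, so the register ends up zero (Z flag set, "beq"
taken) exactly when the loaded ID matches. The helper names
(kcfi_eor_chain, kcfi_check_passes, kcfi_model_selftest) are made up
for illustration and do not exist in any compiler or kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of:
 *   eor  ip, ip, #type_id[0]
 *   eor  ip, ip, #type_id[1] << 8
 *   eor  ip, ip, #type_id[2] << 16
 *   eors ip, ip, #type_id[3] << 24
 * 'loaded' stands for the word read from [target, #-4].
 */
static uint32_t kcfi_eor_chain(uint32_t loaded, uint32_t expected)
{
    uint32_t ip = loaded;

    for (int i = 0; i < 4; i++) {
        /* One byte of the expected ID, kept at its own bit position. */
        uint32_t imm = expected & (UINT32_C(0xff) << (8 * i));

        ip ^= imm;
    }
    return ip; /* zero iff loaded == expected */
}

/* Models the Z flag left by the final EORS: true means "beq" is taken. */
static bool kcfi_check_passes(uint32_t loaded, uint32_t expected)
{
    return kcfi_eor_chain(loaded, expected) == 0;
}

/* Small self-check of the model; call it from any test harness. */
static void kcfi_model_selftest(void)
{
    assert(kcfi_check_passes(UINT32_C(0x12345678), UINT32_C(0x12345678)));
    assert(!kcfi_check_passes(UINT32_C(0x12345679), UINT32_C(0x12345678)));
}
```

On a mismatch the chain leaves loaded ^ expected in the register, which
is why the slow path reloads the expected type ID before the UDF.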
