On Tue, May 19, 2026 at 10:25:04AM +0200, Fredrik Markstrom wrote:
> On Mon, May 18, 2026 at 04:06:11PM +0100, Will Deacon wrote:
> > On Thu, Apr 30, 2026 at 12:55:12PM +0200, Fredrik Markstrom wrote:
> > > Perf callchain unwinding follows userspace frame pointers via
> > > copy_from_user. A corrupted or malicious frame pointer can point
> > > into device I/O memory mapped into the process (e.g. via UIO or
> > > /dev/mem), causing the kernel to read from MMIO regions in PMU
> > > interrupt context. Such reads can have side effects on hardware
> > > (clearing status registers, advancing FIFOs, triggering DMA) and
> > > on arm64 can produce a synchronous external abort that panics the
> > > kernel.
> > 
> > Hmm, but why is unwinding special in this case? If userspace has access
> > to sensitive MMIO/device mappings, it can presumably pass them to
> > syscalls and trigger crashes all over the place?
> 
> You’re totally right, a broken app with access to hardware like this can
> already cause chaos by passing bad pointers to syscalls etc. But the big
> difference here is who is to blame when things crash.
>  
> If an app passes a bad pointer to a syscall, it’s self-inflicted.

So I was going to argue that building arm64 code without frame-pointers
is self-inflicted, but it looks like that's the default in GCC for some
bizarre reason.

> Unwinding here is asynchronous and unrelated to the application.
> Perf interrupts a perfectly healthy app at a random moment. If that app
> is using the frame pointer as a normal register (totally legal in
> optimized code), it might hold a junk value that points to MMIO memory.
>
> If the kernel blindly follows that junk pointer during an unwind, perf
> causes the crash. I think it's acceptable that an app (with hardware
> access) causes a crash if buggy, but I don't think it's acceptable that
> a profiling tool is causing a crash just by looking at it.

I can see your argument, but I'm also not hugely keen to add fastgup to
our stack unwinder for each frame record. It's also not clear to me how
you avoid the mapping changing between the check and the access, given
that you still appear to use the user mapping for the unwind. Do other
architectures have this issue and, if so, how do they solve it?
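For discussion, the walk can be modelled in userspace. This is a minimal sketch under a few assumptions. It uses the AArch64 frame-record layout (saved x29 followed by saved x30) and applies the rule that each record must sit at a higher address than the previous one. read_record() stands in for copy_from_user(). The device window, the device_reads counter and the check_device flag are all hypothetical: the per-record check is modelled as a plain address-range test, not as a fast-GUP lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* AArch64 frame record: saved frame pointer (x29), then link register (x30). */
struct frame_record {
	uint64_t fp;
	uint64_t lr;
};

#define MAX_DEPTH    16
#define STACK_BASE   0x10000u   /* simulated user stack address */
#define DEVICE_START 0x80000u   /* hypothetical UIO or /dev/mem window */
#define DEVICE_END   0x81000u

static uint8_t stack[256];        /* backing store for the simulated stack */
static unsigned int device_reads; /* simulated MMIO accesses (side effects) */

static int overlaps_device(uint64_t addr, size_t len)
{
	return addr < DEVICE_END && addr + len > DEVICE_START;
}

/* Helper: write a frame record into the simulated stack. */
static void put_record(uint64_t addr, uint64_t fp, uint64_t lr)
{
	struct frame_record rec = { fp, lr };

	assert(addr >= STACK_BASE && addr - STACK_BASE + sizeof(rec) <= sizeof(stack));
	memcpy(stack + (addr - STACK_BASE), &rec, sizeof(rec));
}

/* Stand-in for copy_from_user(): 0 on success, -1 where a real copy faults. */
static int read_record(uint64_t addr, struct frame_record *out)
{
	if (overlaps_device(addr, sizeof(*out))) {
		device_reads++;               /* a real read could pop a FIFO, etc. */
		memset(out, 0, sizeof(*out)); /* pretend the device returned zeroes */
		return 0;
	}
	if (addr < STACK_BASE || addr - STACK_BASE + sizeof(*out) > sizeof(stack))
		return -1;
	memcpy(out, stack + (addr - STACK_BASE), sizeof(*out));
	return 0;
}

/*
 * Walk frame records starting at fp, storing each saved lr in pcs[].
 * check_device enables the hypothetical per-record device check.
 */
static size_t unwind(uint64_t fp, uint64_t *pcs, size_t max, int check_device)
{
	size_t n = 0;

	while (fp && n < max) {
		struct frame_record rec;

		if (check_device && overlaps_device(fp, sizeof(rec)))
			break;
		if (read_record(fp, &rec))
			break;
		pcs[n++] = rec.lr;
		/* Records must move strictly towards higher addresses. */
		if (rec.fp <= fp)
			break;
		fp = rec.fp;
	}
	return n;
}
```

In this model the device check and the read are two separate steps. In the kernel, the user mapping could change between them, which is the race raised above.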

If we could guarantee that the fault is synchronous, then we could
presumably hook up the uaccess exception fixup handlers.
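The fixup approach depends on the fault being synchronous. The kernel's exception table maps the address of a faulting instruction to a recovery path, so the abort has to be taken on that load. An asynchronous abort, such as an arm64 SError, is generally not reported on the faulting instruction. The sketch below is only a rough userspace analogue, not the kernel mechanism. It uses sigsetjmp()/siglongjmp() so that a synchronous SIGSEGV or SIGBUS on a probed load resumes at a recovery path instead of killing the process. probe_read_u64() and map_page() are hypothetical names chosen for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fixup_env;
static volatile sig_atomic_t in_probe;

static void fault_handler(int sig)
{
	if (in_probe)
		siglongjmp(fixup_env, 1); /* divert to the recovery path */
	/* Not a probed access: restore the default action so that the
	 * re-executed faulting instruction terminates the process. */
	signal(sig, SIG_DFL);
}

/* Read *addr; return 0 on success or -1 if the load faulted synchronously. */
static int probe_read_u64(const volatile uint64_t *addr, uint64_t *val)
{
	struct sigaction sa, old_segv, old_bus;
	volatile int ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fault_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	if (sigsetjmp(fixup_env, 1) == 0) {
		in_probe = 1;
		*val = *addr;             /* may fault */
	} else {
		ret = -1;                 /* resumed here from fault_handler */
	}
	in_probe = 0;

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return ret;
}

/* Helper: map one readable/writable anonymous page, or return NULL. */
static uint64_t *map_page(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	void *p;

	assert(pg > 0);
	p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

Making the page PROT_NONE and probing it takes the recovery path. This works only because the SIGSEGV is delivered on the faulting load itself.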

Will
