On Fri, Feb 10, 2017 at 01:32:22AM +0900, Masami Hiramatsu wrote: > Fix a possibility of deadlock case in kretprobe on arm > implementation. There may be a chance that the kretprobe > hash table lock can cause a dead lock. > > The senario is that a user puts 2 kretprobes, one on normal > function and one on a function which can be called from > somewhare which can interrupt in irq disabled critical > section like FIQ.
If we: - hit a kernel tracing feature from FIQ context - the tracing feature takes a lock - the lock is also taken elsewhere on the same CPU with IRQs disabled we will quite simply deadlock. In this case, kretprobe_hash_lock() takes the hlist_lock using raw_spin_lock_irqsave(). Now, from what I can see in the kprobes code, this lock is taken in other contexts (eg, kprobe_flush_task()), which means even with this fix, it's still risky if a kprobe is placed on a FIQ-called function. > In this case, if the kernel hits the 1st kretprobe on a > normal function return which calls trampoline_handler(), > acquire a spinlock on the hash table in kretprobe_hash_lock() > and disable irqs. After that, if the 2nd kretprobe is kicked > from FIQ, it also calls trampoline_handler() and tries to > acquire the same spinlock (since the hash is based on > current task, same as the 1st kretprobe), it causes > a deadlock. So my deadlock scenario is: - we're in the middle of kprobe_flush_task() - FIQ happens, calls trampoline_handler() - deadlock in kretprobe_hash_lock() >From what I can see, kretprobes in FIQ are just unsafe. I suspect that avoiding these deadlocks means that we have to deny kprobes from FIQ context - making trampoline_handler() return immediately if in_nmi() is true. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.

