On Fri, 10 Feb 2017 11:34:45 +0900 Masami Hiramatsu <mhira...@kernel.org> wrote:
> On Thu, 9 Feb 2017 16:49:00 +0000 > Russell King - ARM Linux <li...@armlinux.org.uk> wrote: > > > On Fri, Feb 10, 2017 at 01:32:22AM +0900, Masami Hiramatsu wrote: > > > Fix a possibility of deadlock case in kretprobe on arm > > > implementation. There may be a chance that the kretprobe > > > hash table lock can cause a dead lock. > > > > > > The senario is that a user puts 2 kretprobes, one on normal > > > function and one on a function which can be called from > > > somewhare which can interrupt in irq disabled critical > > > section like FIQ. > > > > If we: > > - hit a kernel tracing feature from FIQ context > > - the tracing feature takes a lock > > - the lock is also taken elsewhere on the same CPU with IRQs disabled > > > > we will quite simply deadlock. > > Correct. > > > In this case, kretprobe_hash_lock() takes the hlist_lock using > > raw_spin_lock_irqsave(). > > > > Now, from what I can see in the kprobes code, this lock is taken in > > other contexts (eg, kprobe_flush_task()), which means even with this > > fix, it's still risky if a kprobe is placed on a FIQ-called function. > > Oops, right! I'll fix that too. Thanks for pointed out. > > > > > > In this case, if the kernel hits the 1st kretprobe on a > > > normal function return which calls trampoline_handler(), > > > acquire a spinlock on the hash table in kretprobe_hash_lock() > > > and disable irqs. After that, if the 2nd kretprobe is kicked > > > from FIQ, it also calls trampoline_handler() and tries to > > > acquire the same spinlock (since the hash is based on > > > current task, same as the 1st kretprobe), it causes > > > a deadlock. > > > > So my deadlock scenario is: > > > > - we're in the middle of kprobe_flush_task() > > - FIQ happens, calls trampoline_handler() > > - deadlock in kretprobe_hash_lock() > > > > From what I can see, kretprobes in FIQ are just unsafe. > > Yes, NMI on x86 too. > > > I suspect that avoiding these deadlocks means that we have to deny > > kprobes from FIQ context - making trampoline_handler() return > > immediately if in_nmi() is true. > > Ah, in_nmi() means FIQ on arm :) > OK, but actually it is too late to check it in the enter of > trampoline_handler() since we don't know where is the real > return address at that point. So I'll check that in setup site > - kretprobe_pre_handler(). Hmm, pre_handler_kretprobe() already checked in_nmi(). So, I think this will no problem on FIQ too. Thanks, -- Masami Hiramatsu <mhira...@kernel.org>