On Mon, Aug 28, 2017 at 05:06:17PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 28, 2017 at 04:58:08PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 25, 2017 at 12:03:04PM +0200, Borislav Petkov wrote:
> > > Hey,
> > > 
> > > tglx says I have something for ya :-)
> > > 
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 4.13.0-rc6+ #1 Not tainted
> > > ------------------------------------------------------
> > > watchdog/3/27 is trying to acquire lock:
> > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8100c489>] 
> > > release_ds_buffers+0x29/0xd0
> > > 
> > > but now in release context of a crosslock acquired at the following:
> > >  ((complete)&self->parked){+.+.}, at: [<ffffffff810895f6>] 
> > > kthread_park+0x46/0x60
> > 
> > 
> > So I'm thinking this one is an actual deadlock.
> > 
> > So, as far as I can tell this ends up being:
> > 
> > CPU0                    CPU1
> > 
> > (smpboot_regiser_percpu_thread_cpumask)
> > 
> > get_online_cpus()
> > __smpboot_create_thread()
> >   kthread_park();
> >     wait_for_completion(&X)
> > 
> > 
> >                         (smpboot_thread_fn)
> > 
> >                         ->park() := watchdog_disable()
> >                           watchdog_nmi_disable()
> >                             perf_event_release_kernel();
> >                               put_event()
> >                                 _free_event()
> >                                   ->destroy() := hw_perf_event_destroy()
> >                                     x86_release_hardware()
> >                                       release_ds_buffers()
> >                                         get_online_cpus()
> > 
> > 
> >                         kthread_parkme()
> >                           complete(&X)
> > 
> > 
> > 
> > So CPU0 holds cpus_hotplug_lock while wait_for_completion() and CPU1
> > needs to acquire before complete().
> > 
> > So if, in between, CPU2 does down_write(), things will get unstuck.
> > 
> > What's worse, there's also:
> > 
> >   cpus_write_lock()
> >     ...
> >       takedown_cpu()
> >         smpboot_park_threads()
> >       smpboot_park_thread()
> >         kthread_park()
> >           ->park() := watchdog_disable()
> >             watchdog_nmi_disable()
> >               perf_event_release_kernel();
> >                 put_event()
> >                   _free_event()
> >                     ->destroy() := hw_perf_event_destroy()
> >                       x86_release_hardware()
> >                         release_ds_buffers()
> >                           get_online_cpus()
> > 
> > which as far as I can tell, spells instant deadlock..
> 
> Aah, but that latter will never happen.. because each CPU will have a
> &pmc_refcount and we can't unplug _all_ CPUs.
> 
> So the first one will only ever happen on boot, where we park() the very
> first watchdog thread and is a potential deadlock, but won't happen
> because nobody is around to do down_write() just yet.

I suspect however it is possible to interleave:

        sysctl.kernel.nmi_watchdog = 0
          proc_watchdog_common()

with:

        hot un-plug
         watchdog_disable()

to tickle that exact problem. Just needs a bit of luck.


Reply via email to