On Fri, Oct 06, 2017 at 06:37:52PM +0100, Chris Wilson wrote: > Quoting Daniel Vetter (2017-10-06 15:20:09) > > On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote: > > > Quoting Daniel Vetter (2017-10-06 10:06:37) > > > > stop_machine is not really a locking primitive we should use, except > > > > when the hw folks tell us the hw is broken and that's the only way to > > > > work around it. > > > > > > > > This patch tries to address the locking abuse of stop_machine() from > > > > > > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > > > > Author: Chris Wilson <ch...@chris-wilson.co.uk> > > > > Date: Tue Nov 22 14:41:21 2016 +0000 > > > > > > > > drm/i915: Stop the machine as we install the wedged submit_request > > > > handler > > > > > > > > Chris said parts of the reasons for going with stop_machine() was that > > > > it's no overhead for the fast-path. But these callbacks use irqsave > > > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast. > > > > > > > > To stay as close as possible to the stop_machine semantics we first > > > > update all the submit function pointers to the nop handler, then call > > > > synchronize_rcu() to make sure no new requests can be submitted. This > > > > should give us exactly the huge barrier we want. > > > > > > > > I pondered whether we should annotate engine->submit_request as __rcu > > > > and use rcu_assign_pointer and rcu_dereference on it. But the reason > > > > behind those is to make sure the compiler/cpu barriers are there for > > > > when you have an actual data structure you point at, to make sure all > > > > the writes are seen correctly on the read side. But we just have a > > > > function pointer, and .text isn't changed, so no need for these > > > > barriers and hence no need for annotations. > > > > > > > > This should fix the followwing lockdep splat: > > > > > > > > ====================================================== > > > > WARNING: possible circular locking dependency detected > > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U > > > > ------------------------------------------------------ > > > > kworker/3:4/562 is trying to acquire lock: > > > > (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] > > > > stop_machine+0x1c/0x40 > > > > > > > > but task is already holding lock: > > > > (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] > > > > i915_reset_device+0x1e8/0x260 [i915] > > > > > > > > which lock already depends on the new lock. > > > > > > > > the existing dependency chain (in reverse order) is: > > > > > > > > -> #6 (&dev->struct_mutex){+.+.}: > > > > __lock_acquire+0x1420/0x15e0 > > > > lock_acquire+0xb0/0x200 > > > > __mutex_lock+0x86/0x9b0 > > > > mutex_lock_interruptible_nested+0x1b/0x20 > > > > i915_mutex_lock_interruptible+0x51/0x130 [i915] > > > > i915_gem_fault+0x209/0x650 [i915] > > > > __do_fault+0x1e/0x80 > > > > __handle_mm_fault+0xa08/0xed0 > > > > handle_mm_fault+0x156/0x300 > > > > __do_page_fault+0x2c5/0x570 > > > > do_page_fault+0x28/0x250 > > > > page_fault+0x22/0x30 > > > > > > > > -> #5 (&mm->mmap_sem){++++}: > > > > __lock_acquire+0x1420/0x15e0 > > > > lock_acquire+0xb0/0x200 > > > > __might_fault+0x68/0x90 > > > > _copy_to_user+0x23/0x70 > > > > filldir+0xa5/0x120 > > > > dcache_readdir+0xf9/0x170 > > > > iterate_dir+0x69/0x1a0 > > > > SyS_getdents+0xa5/0x140 > > > > entry_SYSCALL_64_fastpath+0x1c/0xb1 > > > > > > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}: > > > > down_write+0x3b/0x70 > > > > handle_create+0xcb/0x1e0 > > > > devtmpfsd+0x139/0x180 > > > > kthread+0x152/0x190 > > > > ret_from_fork+0x27/0x40 > > > > > > > > -> #3 ((complete)&req.done){+.+.}: > > > > __lock_acquire+0x1420/0x15e0 > > > > lock_acquire+0xb0/0x200 > > > > wait_for_common+0x58/0x210 > > > > wait_for_completion+0x1d/0x20 > > > > devtmpfs_create_node+0x13d/0x160 > > > > device_add+0x5eb/0x620 > > > > device_create_groups_vargs+0xe0/0xf0 > > > > device_create+0x3a/0x40 > > > > msr_device_create+0x2b/0x40 > > > > cpuhp_invoke_callback+0xc9/0xbf0 > > > > cpuhp_thread_fun+0x17b/0x240 > > > > smpboot_thread_fn+0x18a/0x280 > > > > kthread+0x152/0x190 > > > > ret_from_fork+0x27/0x40 > > > > > > > > -> #2 (cpuhp_state-up){+.+.}: > > > > __lock_acquire+0x1420/0x15e0 > > > > lock_acquire+0xb0/0x200 > > > > cpuhp_issue_call+0x133/0x1c0 > > > > __cpuhp_setup_state_cpuslocked+0x139/0x2a0 > > > > __cpuhp_setup_state+0x46/0x60 > > > > page_writeback_init+0x43/0x67 > > > > pagecache_init+0x3d/0x42 > > > > start_kernel+0x3a8/0x3fc > > > > x86_64_start_reservations+0x2a/0x2c > > > > x86_64_start_kernel+0x6d/0x70 > > > > verify_cpu+0x0/0xfb > > > > > > > > -> #1 (cpuhp_state_mutex){+.+.}: > > > > __lock_acquire+0x1420/0x15e0 > > > > lock_acquire+0xb0/0x200 > > > > __mutex_lock+0x86/0x9b0 > > > > mutex_lock_nested+0x1b/0x20 > > > > __cpuhp_setup_state_cpuslocked+0x53/0x2a0 > > > > __cpuhp_setup_state+0x46/0x60 > > > > page_alloc_init+0x28/0x30 > > > > start_kernel+0x145/0x3fc > > > > x86_64_start_reservations+0x2a/0x2c > > > > x86_64_start_kernel+0x6d/0x70 > > > > verify_cpu+0x0/0xfb > > > > > > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}: > > > > check_prev_add+0x430/0x840 > > > > __lock_acquire+0x1420/0x15e0 > > > > lock_acquire+0xb0/0x200 > > > > cpus_read_lock+0x3d/0xb0 > > > > stop_machine+0x1c/0x40 > > > > i915_gem_set_wedged+0x1a/0x20 [i915] > > > > i915_reset+0xb9/0x230 [i915] > > > > i915_reset_device+0x1f6/0x260 [i915] > > > > i915_handle_error+0x2d8/0x430 [i915] > > > > hangcheck_declare_hang+0xd3/0xf0 [i915] > > > > i915_hangcheck_elapsed+0x262/0x2d0 [i915] > > > > process_one_work+0x233/0x660 > > > > worker_thread+0x4e/0x3b0 > > > > kthread+0x152/0x190 > > > > ret_from_fork+0x27/0x40 > > > > > > > > other info that might help us debug this: > > > > > > > > Chain exists of: > > > > cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex > > > > > > > > Possible unsafe locking scenario: > > > > > > > > CPU0 CPU1 > > > > ---- ---- > > > > lock(&dev->struct_mutex); > > > > lock(&mm->mmap_sem); > > > > lock(&dev->struct_mutex); > > > > lock(cpu_hotplug_lock.rw_sem); > > > > > > > > *** DEADLOCK *** > > > > > > > > 3 locks held by kworker/3:4/562: > > > > #0: ("events_long"){+.+.}, at: [<ffffffff8109c64a>] > > > > process_one_work+0x1aa/0x660 > > > > #1: ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: > > > > [<ffffffff8109c64a>] process_one_work+0x1aa/0x660 > > > > #2: (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] > > > > i915_reset_device+0x1e8/0x260 [i915] > > > > > > > > stack backtrace: > > > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U > > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 > > > > Hardware name: /NUC7i5BNB, BIOS > > > > BNKBL357.86A.0048.2017.0704.1415 07/04/2017 > > > > Workqueue: events_long i915_hangcheck_elapsed [i915] > > > > Call Trace: > > > > dump_stack+0x68/0x9f > > > > print_circular_bug+0x235/0x3c0 > > > > ? lockdep_init_map_crosslock+0x20/0x20 > > > > check_prev_add+0x430/0x840 > > > > ? irq_work_queue+0x86/0xe0 > > > > ? wake_up_klogd+0x53/0x70 > > > > __lock_acquire+0x1420/0x15e0 > > > > ? __lock_acquire+0x1420/0x15e0 > > > > ? lockdep_init_map_crosslock+0x20/0x20 > > > > lock_acquire+0xb0/0x200 > > > > ? stop_machine+0x1c/0x40 > > > > ? i915_gem_object_truncate+0x50/0x50 [i915] > > > > cpus_read_lock+0x3d/0xb0 > > > > ? stop_machine+0x1c/0x40 > > > > stop_machine+0x1c/0x40 > > > > i915_gem_set_wedged+0x1a/0x20 [i915] > > > > i915_reset+0xb9/0x230 [i915] > > > > i915_reset_device+0x1f6/0x260 [i915] > > > > ? gen8_gt_irq_ack+0x170/0x170 [i915] > > > > ? work_on_cpu_safe+0x60/0x60 > > > > i915_handle_error+0x2d8/0x430 [i915] > > > > ? vsnprintf+0xd1/0x4b0 > > > > ? scnprintf+0x3a/0x70 > > > > hangcheck_declare_hang+0xd3/0xf0 [i915] > > > > ? intel_runtime_pm_put+0x56/0xa0 [i915] > > > > i915_hangcheck_elapsed+0x262/0x2d0 [i915] > > > > process_one_work+0x233/0x660 > > > > worker_thread+0x4e/0x3b0 > > > > kthread+0x152/0x190 > > > > ? process_one_work+0x660/0x660 > > > > ? kthread_create_on_node+0x40/0x40 > > > > ret_from_fork+0x27/0x40 > > > > Setting dangerous option reset - tainting kernel > > > > i915 0000:00:02.0: Resetting chip after gpu hang > > > > Setting dangerous option reset - tainting kernel > > > > i915 0000:00:02.0: Resetting chip after gpu hang > > > > > > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and > > > > improve commit message. > > > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886 > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096 > > > > Cc: Chris Wilson <ch...@chris-wilson.co.uk> > > > > Cc: Mika Kuoppala <mika.kuopp...@intel.com> > > > > Cc: Thomas Gleixner <t...@linutronix.de> > > > > Cc: Marta Lofstedt <marta.lofst...@intel.com> > > > > Signed-off-by: Daniel Vetter <daniel.vet...@intel.com> > > > > --- > > > > drivers/gpu/drm/i915/i915_gem.c | 31 > > > > +++++++++-------------- > > > > drivers/gpu/drm/i915/i915_gem_request.c | 2 ++ > > > > drivers/gpu/drm/i915/selftests/i915_gem_request.c | 2 ++ > > > > 3 files changed, 16 insertions(+), 19 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > > > > b/drivers/gpu/drm/i915/i915_gem.c > > > > index ab8c6946fea4..e79a6ca60265 100644 > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct > > > > drm_i915_gem_request *request) > > > > intel_engine_init_global_seqno(request->engine, > > > > request->global_seqno); > > > > } > > > > > > > > -static void engine_set_wedged(struct intel_engine_cs *engine) > > > > +static void engine_complete_requests(struct intel_engine_cs *engine) > > > > { > > > > - /* We need to be sure that no thread is running the old > > > > callback as > > > > - * we install the nop handler (otherwise we would submit a > > > > request > > > > - * to hardware that will never complete). In order to prevent > > > > this > > > > - * race, we wait until the machine is idle before making the > > > > swap > > > > - * (using stop_machine()). > > > > - */ > > > > - engine->submit_request = nop_submit_request; > > > > - > > > > /* Mark all executing requests as skipped */ > > > > engine->cancel_requests(engine); > > > > > > How are we planning to serialise the intel_engine_init_global_seqno() > > > here with the in-flight nop_submit? With sufficient thrust we will get a > > > stale breadcrumb and an incomplete request. > > > > Yeah that part looks indeed fishy. Well the entire "let the nop handler > > fake-complete requests" logic is something I don't really understand. I > > guess there's an exclusive relationship between requests handled directly > > (and cancelled in engine->cancel_request) and requests with external > > dma_fence dependencies. > > > > But then I'm not really seeing what I'm changing, since even with the stop > > machine you might end up with a bunch of requests depending upon external > > fences, which then all complete at roughly the same time and race multiple > > calls to intel_engine_init_global_seqno with each another. > > Ugh, there's another issue. If nop_submit_request is executed before > cancel_requests, we will consider the execution queue as completed and > not in error, i.e. we will not flag those requests with the user visible > -EIO.
Hm, that one's a bit a mess. What about the following sequence: 1. We set all submit_request functions to a nop request which does _not_ update the global seqno. 2. synchronize_rcu() 3. ->cancel_requests. 4. We set all submit_request to the current nop submit, i.e. including pushing the global seqno forward to complete them all. Maybe rename that one to nop_submit_complete_requests or something like that. 5. synchronize_rcu() 6. Call the spin-locked global seqno init that's currently in complete requests. 1/3/4/6 would all be wrapped in for_each_engine ofc. I think that would also clarify a bit why we have to move the global_seqno in the nop_submit_request function. Definitely needs lots of comments to explain what's going on. Thoughts? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch