Quoting Chris Wilson (2017-10-10 10:21:45) > Quoting Daniel Vetter (2017-10-09 17:44:01) > > stop_machine is not really a locking primitive we should use, except > > when the hw folks tell us the hw is broken and that's the only way to > > work around it. > > > > This patch tries to address the locking abuse of stop_machine() from > > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > > Author: Chris Wilson <ch...@chris-wilson.co.uk> > > Date: Tue Nov 22 14:41:21 2016 +0000 > > > > drm/i915: Stop the machine as we install the wedged submit_request > > handler > > > > Chris said parts of the reasons for going with stop_machine() was that > > it's no overhead for the fast-path. But these callbacks use irqsave > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast. > > > > To stay as close as possible to the stop_machine semantics we first > > update all the submit function pointers to the nop handler, then call > > synchronize_rcu() to make sure no new requests can be submitted. This > > should give us exactly the huge barrier we want. > > > > I pondered whether we should annotate engine->submit_request as __rcu > > and use rcu_assign_pointer and rcu_dereference on it. But the reason > > behind those is to make sure the compiler/cpu barriers are there for > > when you have an actual data structure you point at, to make sure all > > the writes are seen correctly on the read side. But we just have a > > function pointer, and .text isn't changed, so no need for these > > barriers and hence no need for annotations. > > > > Unfortunately there's a complication with the call to > > intel_engine_init_global_seqno: > > This is still broken in the same way as nop_submit_request may execute > while you sleep, breaking cancel_requests.
My apologies, I misread the diff and didn't catch the removal of init_seqno. From that pov, the execlists should be intact and I can't see a way for a lack of -EIO being reported. (That we have them is a topic for another day, as set-wedged is not meant to be regarded as a normal operation, but user data loss.) The residual panic I have is that we had to throw set-wedged vs unwedge ordering out of the window; it used to be struct_mutex. Making set-wedged prolonged only makes the window wider, it didn't create that window, and at the moment it's carefully ordered inside the i915_handle_error fallback. (We blissfully ignore debugfs.) Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk> -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx