Hi Krzysztof, > The i915_active selftests live_active_wait and live_active_retire > operate on an i915_active attached to a mock, empty request, created as > part of test setup. A fence is attached to this request to control when > the request is processed. The tests then wait for the completion of the > active with __i915_active_wait(), and the test is considered successful > if this results in setting a variable in the active callback. > > However, the behavior of __i915_active_wait() is such that if the > refcount for the active is 0, the function is almost completely skipped; > waiting on a already completed active yields no effect. This includes a > subsequent call to the retire() function of the active (which is the > callback that the test is interested about, and which dictates whether > its successful or not). So, if the active is completed before the > aforementioned call to __i915_active_wait(), the test will fail. > > Most of the test runs in a single thread, including creating the > request, creating the fence for it, signalling that fence, and calling > __i915_active_wait(). However, the request itself is handled > asynchronously. This creates a race condition where if the request is > completed after signalling the fence, but before waiting on its active, > the active callback will not be invoked, failing the test. > > Defer signalling the request's fence, to ensure the main test thread > gets to call __i915_active_wait() before request completion. > > v3: > - Embed the variables inside the live_active struct (Andi) > - Move the schedule_delayed_work call closer to the wait (Andi) > - Implement error handling in case an error state - the wait has > finished, but the deferred work didn't run - is somehow achieved (Andi) > > v2: > - Clarify the need for a fix a little more (Krzysztof K., Janusz) > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14808 > Signed-off-by: Krzysztof Niemiec <[email protected]> > --- The change looks good to me. I also agree with suggestions from Jonathan that he wrote in his review - since you increased the common code between live_active_retire() and live_active_wait(), you could put some parts into helper functions in a future patch.
However, current version should be enough for now to get the issue fixed: Reviewed-by: Krzysztof Karas <[email protected]> -- Best Regards, Krzysztof
