Hi Daniel,

On 16/09/20 09:06, Daniel Bristot de Oliveira wrote:
> stress-ng has a test (stress-ng --cyclic) that creates a set of threads
> under SCHED_DEADLINE with the following parameters:
> 
>     dl_runtime   =  10000 (10 us)
>     dl_deadline  = 100000 (100 us)
>     dl_period    = 100000 (100 us)
> 
> These parameters are very aggressive. When using a system without HRTICK
> set, these threads can easily execute longer than the dl_runtime because
> the throttling happens with 1/HZ resolution.
> 
> During the main part of the test, the system works just fine because
> the workload does not try to run over the 10 us. The problem happens at
> the end of the test, on the exit() path. During exit(), the threads need
> to do some cleanups that require real-time mutex locks, mainly those
> related to memory management, resulting in this scenario:
> 
> Note: locks are rt_mutexes...
>  ------------------------------------------------------------------------
>     TASK A:           TASK B:                         TASK C:
>     activation
>                                                       activation
>                       activation
> 
>     lock(a): OK!      lock(b): OK!
>                       <overrun runtime>
>                       lock(a)
>                       -> block (task A owns it)
>                         -> self notice/set throttled
>  +--<                   -> arm replenishment timer
>  |                            switch-out
>  |                                                            lock(b)
>  |                                                            -> <C prio > B prio>
>  |                                                            -> boost TASK B
>  |  unlock(a)                                         switch-out
>  |  -> handle lock a to B
>  |    -> wakeup(B)
>  |      -> B is throttled:
>  |        -> do not enqueue
>  |     switch-out
>  |
>  |
>  +---------------------> replenishment timer
>                       -> TASK B is boosted:
>                         -> do not enqueue
>  ------------------------------------------------------------------------
> 
> BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
> crashes with hung task C.
> 
> This problem is avoided by removing the throttle state from the boosted
> thread while boosting it (by TASK A in the example above), allowing it to
> be queued and run boosted.
> 
> The next replenishment will take care of the runtime overrun, pushing
> the deadline further away. See the "while (dl_se->runtime <= 0)" on
> replenish_dl_entity() for more information.
> 
> Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
> Reported-by: Mark Simmons <[email protected]>
> Reviewed-by: Juri Lelli <[email protected]>
> Tested-by: Mark Simmons <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ben Segall <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Daniel Bristot de Oliveira <[email protected]>
> Cc: [email protected]
> 
> ---

Thanks for this fix.

Acked-by: Juri Lelli <[email protected]>

Best,
Juri
