On Mon, Jul 21, 2025 at 10:23:20AM +0200, Gabriele Monaco wrote:
> DA monitor can be accessed from multiple cores simultaneously, this is
> likely, for instance when dealing with per-task monitors reacting on
> events that do not always occur on the CPU where the task is running.
> This can cause race conditions where two events change the next state
> and we see inconsistent values. E.g.:
> 
>   [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
>   [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
>   [63] error_srs: 27: event sched_switch_suspend not expected in the state 
> running
> 
> In this case the monitor fails because the event on CPU 62 wins against
> the one on CPU 63, although the correct state should have been
> sleepable, since the task get suspended.
> 
> Detect if the current state was modified by using try_cmpxchg while
> storing the next value. If it was, try again reading the current state.
> After a maximum number of failed retries, react by calling a special
> tracepoint, print on the console and reset the monitor.
> 
> Remove the functions da_monitor_curr_state() and da_monitor_set_state()
> as they only hide the underlying implementation in this case.
> 
> Monitors where this type of condition can occur must be able to account
> for racing events in any possible order, as we cannot know the winner.
> 
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Gabriele Monaco <[email protected]>
> ---
>
>  static inline bool                                                           
>                 \
>  da_event_##name(struct da_monitor *da_mon, enum events_##name event)         
>                 \
>  {                                                                            
>                 \
> -     type curr_state = da_monitor_curr_state_##name(da_mon);                 
>                 \
> -     type next_state = model_get_next_state_##name(curr_state, event);       
>                 \
> -                                                                             
>                 \
> -     if (next_state != INVALID_STATE) {                                      
>                 \
> -             da_monitor_set_state_##name(da_mon, next_state);                
>                 \
> -                                                                             
>                 \
> -             trace_event_##name(model_get_state_name_##name(curr_state),     
>                 \
> -                                model_get_event_name_##name(event),          
>                 \
> -                                model_get_state_name_##name(next_state),     
>                 \
> -                                model_is_final_state_##name(next_state));    
>                 \
> -                                                                             
>                 \
> -             return true;                                                    
>                 \
> +     enum states_##name curr_state, next_state;                              
>                 \
> +                                                                             
>                 \
> +     curr_state = READ_ONCE(da_mon->curr_state);                             
>                 \
> +     for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {                  
>                 \
> +             next_state = model_get_next_state_##name(curr_state, event);    
>                 \
> +             if (next_state == INVALID_STATE) {                              
>                 \
> +                     cond_react_##name(curr_state, event);                   
>                 \
> +                     
> trace_error_##name(model_get_state_name_##name(curr_state),             \
> +                                        model_get_event_name_##name(event)); 
>                 \
> +                     return false;                                           
>                 \
> +             }                                                               
>                 \
> +             if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, 
> next_state))) {        \
> +                     
> trace_event_##name(model_get_state_name_##name(curr_state),             \
> +                                        model_get_event_name_##name(event),  
>                 \
> +                                        
> model_get_state_name_##name(next_state),             \
> +                                        
> model_is_final_state_##name(next_state));            \
> +                     return true;                                            
>                 \
> +             }                                                               
>                 \
>       }                                                                       
>                 \
>                                                                               
>                 \
> -     cond_react_##name(curr_state, event);                                   
>                 \
> -                                                                             
>                 \
> -     trace_error_##name(model_get_state_name_##name(curr_state),             
>                 \
> -                        model_get_event_name_##name(event));                 
>                 \
> -                                                                             
>                 \
> +     trace_rv_retries_error(#name, smp_processor_id());                      
>                 \
> +     pr_warn("rv: " __stringify(MAX_DA_RETRY_RACING_EVENTS)                  
>                 \
> +             " retries reached, resetting monitor %s", #name);               
>                 \

smp_processor_id() requires preemption to be disabled.

At the moment, trace point handler is called with preemption disabled, so
we are fine. But there is plan to change that:
https://lore.kernel.org/lkml/[email protected]/T/#u

Perhaps use get_cpu() and put_cpu() instead?

Nam

Reply via email to