Vineet,

On Fri, 29 Mar 2013, Vineet Gupta wrote:

> When stress testing ARC Linux from 3.9-rc3, we've hit a serialization
> issue when mod_timer() races with itself. This is on an FPGA board, and
> the kernel .config has, among other things, !SMP and !PREEMPT_COUNT.
> 
> The issue happens in mod_timer( ) because the timer_pending( ) based
> early exit check is NOT done inside the timer base spinlock, as a
> networking optimization.
> 
> The value used there, timer->entry.next, is also used further down the
> call chain (all inlined) for the actual list manipulation. However, if
> the register containing this pointer remains live across the spinlock
> (in a UP setup with !PREEMPT_COUNT there's nothing forcing gcc to
> reload), then a stale value of the next pointer causes incorrect list
> manipulation, observed with the following sequence in our tests.
> 
> (0). tv1[x] <----> t1 <---> t2
> (1). mod_timer(t1) interrupted after it calls timer_pending()
> (2). mod_timer(t2) completes
> (3). mod_timer(t1) resumes but messes up the list.
> (4). __run_timers( ) uses bogus timer_list entry / crashes in
>      timer->function
> 
> The simplest fix is to NOT rely on the spinlock based compiler barrier
> but add an explicit one in timer_pending().

That's simple, but dangerous. There is other code which relies on the
implicit barriers of spinlocks, so I think we need to add the barrier
to the !PREEMPT_COUNT implementation of preempt_*() macros.
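
Roughly what I have in mind, as a userspace sketch and not the actual
patch. barrier() is the usual gcc compiler-only barrier; the
sketch_preempt_*() macros, struct node and replay() are made-up names
for illustration only. A userspace test cannot force gcc to keep a
value live in a register, so the reload_after_lock flag models the two
outcomes by hand: 0 reuses the pointer loaded before the "interrupt",
1 re-reads it, which is what a barrier at the lock boundary forces.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier: emits no instruction, but gcc must assume
 * memory changed and reload anything it had cached in registers. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical !PREEMPT_COUNT variants: no counting, still a barrier. */
#define sketch_preempt_disable() barrier()
#define sketch_preempt_enable()  barrier()

struct node { struct node *next, *prev; };

static void list_init(struct node *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Same shape as __list_del(): link prev and next to each other. */
static void list_del_entry(struct node *prev, struct node *next)
{
	next->prev = prev;
	prev->next = next;
}

/* Bounded walk: every link must be mirrored and lead back to head. */
static int list_consistent(const struct node *head)
{
	const struct node *cur = head;

	for (int i = 0; i < 8; i++) {
		if (cur->next->prev != cur)
			return 0;
		cur = cur->next;
		if (cur == head)
			return 1;
	}
	return 0;
}

/* Replays steps (0)-(3) of the report; returns 1 if both lists survive. */
static int replay(int reload_after_lock)
{
	struct node tv1, tv2, t1, t2;

	/* (0) tv1 <-> t1 <-> t2, tv2 is some other bucket */
	list_init(&tv1);
	list_init(&tv2);
	list_add_tail(&t1, &tv1);
	list_add_tail(&t2, &tv1);
	assert(list_consistent(&tv1));

	/* (1) mod_timer(t1): unlocked pending check loads entry.next */
	struct node *next = t1.next;

	/* (2) "interrupt": mod_timer(t2) completes and requeues t2 */
	list_del_entry(t2.prev, t2.next);
	list_add_tail(&t2, &tv2);

	/* (3) mod_timer(t1) resumes and takes the lock (UP: preempt only) */
	sketch_preempt_disable();
	if (reload_after_lock)
		next = t1.next;		/* what the barrier forces */
	list_del_entry(t1.prev, next);	/* detach t1 */
	sketch_preempt_enable();

	return list_consistent(&tv1) && list_consistent(&tv2);
}
```

With the stale pointer, replay(0) leaves tv1 linked into the other
bucket's list; with the reload, replay(1) leaves both lists intact.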

Thanks,

        tglx

> FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> shows the issue due to register reuse is:
> 
> mod_timer:
>     push_s blink
>     mov_s r13,r0      # timer, timer
> 
> ...
>     ###### timer_pending( )
>     ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
>     brne r3, 0, @.L163
> 
> .L163:
> ....
>     ###### spin_lock_irq( )
>     lr  r5, [status32]  # flags
>     bic r4, r5, 6       # temp, flags,
>     and.f 0, r5, 6      # flags,
>     flag.nz r4
> 
>     ###### detach_if_pending( ) begins
> 
>     tst_s r3,r3         # <------ timer_pending( ) checks timer->entry.next
>                         # r3 is NOT reloaded by gcc, using stale value
>     beq.d @.L169
>     mov.eq r0,0
> 
>     #  detach_timer( ): __list_del( )
> 
>     ld r4,[r13,4]     # <variable>.entry.prev, D.31439
>     st r4,[r3,4]      # <variable>.prev, D.31439
>     st r3,[r4]        # <variable>.next, D.30246
> 
> Signed-off-by: Vineet Gupta <vgu...@synopsys.com>
> Reported-by: Christian Ruppert <christian.rupp...@abilis.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Christian Ruppert <christian.rupp...@abilis.com>
> Cc: Pierrick Hascoet <pierrick.hasc...@abilis.com>
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/linux/timer.h |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
>   */
>  static inline int timer_pending(const struct timer_list * timer)
>  {
> -     return timer->entry.next != NULL;
> +     int pending = timer->entry.next != NULL;
> +
> +     /*
> +      * The check above enables timer fast path - early exit.
> +      * However most of the call sites are not protected by timer->base
> +      * spinlock. If the caller (say mod_timer) races with itself, it
> +      * can use the stale "next" pointer. See commit log for details.
> +      */
> +     barrier();
> +     return pending;
>  }
>  
>  extern void add_timer_on(struct timer_list *timer, int cpu);
> -- 
> 1.7.10.4
> 
> 