On Sun, Oct 20, 2019 at 08:46:10AM -0700, Tom Rix wrote:
> On PREEMPT_RT_FULL while running netperf, a corruption
> of the skb queue causes an oops.
> 
> This appears to be caused by a race condition here
>         __skb_queue_tail(&trans->queue, skb);
>         tasklet_schedule(&trans->tasklet);
> Where the queue is changed before the tasklet is locked by
> tasklet_schedule.
> 
> The fix is to use the skb queue lock.
> 
> Signed-off-by: Tom Rix <t...@redhat.com>
> ---
>  net/xfrm/xfrm_input.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 9b599ed66d97..226dead86828 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -758,12 +758,16 @@ static void xfrm_trans_reinject(unsigned long data)
>      struct xfrm_trans_tasklet *trans = (void *)data;
>      struct sk_buff_head queue;
>      struct sk_buff *skb;
> +    unsigned long flags;
> 
>      __skb_queue_head_init(&queue);
> +    spin_lock_irqsave(&trans->queue.lock, flags);
>      skb_queue_splice_init(&trans->queue, &queue);
> 
>      while ((skb = __skb_dequeue(&queue)))
>          XFRM_TRANS_SKB_CB(skb)->finish(dev_net(skb->dev), NULL, skb);
> +
> +    spin_unlock_irqrestore(&trans->queue.lock, flags);
>  }
> 
>  int xfrm_trans_queue(struct sk_buff *skb,
> @@ -771,15 +775,20 @@ int xfrm_trans_queue(struct sk_buff *skb,
>                     struct sk_buff *))
>  {
>      struct xfrm_trans_tasklet *trans;
> +    unsigned long flags;
> 
>      trans = this_cpu_ptr(&xfrm_trans_tasklet);
> +    spin_lock_irqsave(&trans->queue.lock, flags);

As you can see above, 'trans' is per-CPU, so a spinlock
is not needed here. Also, this does not run in hard
interrupt context, so irqsave is not needed either.
I don't see how this can fix anything.

Can you please explain that race in a bit more detail?
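
To make the per-CPU argument concrete, here is a rough userspace analogy, not kernel code. It assumes that one thread stands in for one CPU and that a thread-local queue stands in for the per-CPU 'trans' queue; all names (local_queue, enqueue_tail, drain, run_model) are hypothetical and exist only for this sketch. It shows only that a queue touched by a single context needs no lock. It does not model preemption on PREEMPT_RT, which is the part the question above is about.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: one thread stands in for one CPU. */
struct node {
	struct node *next;
};

struct queue {
	struct node *head;
	struct node *tail;
};

/* Hypothetical stand-in for the per-CPU 'trans' queue: one per thread. */
static _Thread_local struct queue local_queue;

/* Lockless tail insert; safe only because a single thread owns the queue. */
static void enqueue_tail(struct queue *q, struct node *n)
{
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
}

/* Detach the whole list, then walk it; returns the number of nodes seen. */
static int drain(struct queue *q)
{
	struct node *n = q->head;
	int done = 0;

	q->head = NULL;
	q->tail = NULL;
	while (n) {
		n = n->next;
		done++;
	}
	return done;
}

struct worker_arg {
	int count;          /* nodes this thread will queue */
	int processed;      /* nodes this thread has drained */
	struct node *nodes; /* backing storage, owned by the caller */
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	for (int i = 0; i < arg->count; i++) {
		enqueue_tail(&local_queue, &arg->nodes[i]);
		if (i % 16 == 15) /* drain every 16 inserts, like a deferred run */
			arg->processed += drain(&local_queue);
	}
	arg->processed += drain(&local_queue);
	return NULL;
}

/* Runs nthreads workers (both arguments must be > 0); returns total drained. */
int run_model(int nthreads, int per_thread)
{
	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct worker_arg *args = calloc(nthreads, sizeof(*args));
	int total = 0;

	assert(tids && args);
	for (int i = 0; i < nthreads; i++) {
		args[i].count = per_thread;
		args[i].nodes = calloc(per_thread, sizeof(struct node));
		assert(args[i].nodes);
		assert(pthread_create(&tids[i], NULL, worker, &args[i]) == 0);
	}
	for (int i = 0; i < nthreads; i++) {
		pthread_join(tids[i], NULL);
		total += args[i].processed;
		free(args[i].nodes);
	}
	free(args);
	free(tids);
	return total;
}
```

Calling run_model(4, 1000) from a small main() and building with -pthread should drain every queued node without any locking, since no queue is ever shared between threads.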
