On 09/20/2018 03:10 PM, Eric Dumazet wrote:
> 
> 
> On 09/20/2018 12:25 PM, stran...@codeaurora.org wrote:
>>>
>>> I do not believe the changelog or the patch makes sense.
>>>
>>> Having skb still referencing a socket prevents this socket being released.
>>>
>>> If you think about it, what would prevent the freeing happening
>>> _before_ the rcu_read_lock() in pfkey_broadcast() ?
>>>
>>> Maybe the correct fix is that pfkey_broadcast_one() should ensure the
>>> socket is still valid.
>>>
>>> I would suggest something like :
>>>
>>> diff --git a/net/key/af_key.c b/net/key/af_key.c
>>> index
>>> 9d61266526e767770d9a1ce184ac8cdd59de309a..5ce309d020dda5e46e4426c4a639bfb551e2260d
>>> 100644
>>> --- a/net/key/af_key.c
>>> +++ b/net/key/af_key.c
>>> @@ -201,7 +201,9 @@ static int pfkey_broadcast_one(struct sk_buff
>>> *skb, struct sk_buff **skb2,
>>>  {
>>>         int err = -ENOBUFS;
>>>
>>> -       sock_hold(sk);
>>> +       if (!refcount_inc_not_zero(&sk->sk_refcnt))
>>> +               return -ENOENT;
>>> +
>>>         if (*skb2 == NULL) {
>>>                 if (refcount_read(&skb->users) != 1) {
>>>                         *skb2 = skb_clone(skb, allocation);
>>
>> Hi Eric,
>>
>> I'm not sure that the socket getting freed before the rcu_read_lock() would
>> be an issue, since then it would no longer be in the net_pkey->table that
>> we loop through (since we call pfkey_remove() from pfkey_relase()). Because 
>> of
>> that, all the sockets processed in pfkey_broadcast_one() have valid 
>> refcounts,
>> so checking for zero there doesn't prevent the crash that I'm seeing.
>>
>> However, after going over the call flow again, I see that the actual problem
>> occurs because of pfkey_broadcast_one(). Specifically, because of this check:
>>
>>     if (*skb2 == NULL) {
>>         if (refcount_read(&skb->users) != 1) {
>>             *skb2 = skb_clone(skb, allocation);
>>         } else {
>>             *skb2 = skb;
>>             refcount_inc(&skb->users);
>>         }
>>     }
>>
>> Since we always pass a freshly cloned SKB to this function, skb->users is
>> always 1, and skb2 just becomes skb. We then set skb2 (and thus skb) to
>> belong to the socket.
>>
>> If the socket we queue skb2 to frees this SKB (thereby decrementing its
>> refcount to 1) and the socket is freed before pfkey_broadcast() can
>> execute the kfree_skb(skb) on line 284, we will then attempt to run
>> sock_rfree() on an SKB with a dangling reference to this socket.
>>
>> Perhaps a cleaner solution here is to always clone the SKB in
>> pfkey_broadcast_one(). That will ensure that the two kfree_skb() calls
>> in pfkey_broadcast() will never be passed an SKB with sock_rfree() as
>> its destructor, and we can avoid this race condition.
> 
> As long as one skb has sock_rfree has its destructor, the socket attached to
> this skb can not be released. There is no race here.
> 
> Note that skb_clone() does not propagate the destructor.
> 
> The issue here is that in the rcu lookup, we can find a socket that has been
> dismantled, with a 0 refcount.
> 
> We must not use sock_hold() in this case, since we are not sure the socket 
> refcount is not already 0.
> 
> pfkey_broadcast() and pfkey_broadcast_one() violate basic RCU rules.
> 
> When in a RCU lookup, one want to increment an object refcount, it needs
> to be extra-careful, as I did in my proposal.
> 
> Note that the race could be automatically detected with CONFIG_REFCOUNT_FULL=y

Bug was added in commit 7f6b9dbd5afb ("af_key: locking change")

Reply via email to