Kirill Tkhai <ktk...@virtuozzo.com> writes:

> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
> under net->nsid_lock does not guarantee, peer is alive:
>
> rcu_read_lock()
> peernet2id_alloc()                            ..
>   spin_lock_bh(&net->nsid_lock)               ..
>   atomic_read(&peer->count) == 1              ..
>   ..                                          put_net()
>   ..                                            cleanup_net()
>   ..                                              for_each_net(tmp)
>   ..                                                  spin_lock_bh(&tmp->nsid_lock)
>   ..                                                  __peernet2id(tmp, net) == -1
>   ..                                                    ..
>   ..                                                    ..
>     __peernet2id_alloc(alloc == true)                   ..
>   ..                                                    ..
> rcu_read_unlock()                                       ..
> ..                                                synchronize_rcu()
> ..                                                kmem_cache_free(net)
>
> After the above situation, net::netns_ids contains an id pointing to freed
> memory, and any later lookup by that id will operate on this freed memory.
>
> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
> ovs_vport_cmd_fill_info(), so this race can't occur. But peernet2id_alloc()
> is a generic interface, and it is better to fix it before someone really
> starts using it in the wrong context.

So it comes down to this piece of code from ovs and just let me say ick.
        if (!net_eq(net, dev_net(vport->dev))) {
                int id = peernet2id_alloc(net, dev_net(vport->dev));

                if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
                        goto nla_put_failure;
        }

Without the rtnl lock dev_net can change between the test and the
call of peernet2id_alloc.

At first glance it looks like the bug is that we are running a control
path of the networking stack without the rtnl lock. So it may be that
ASSERT_RTNL() is the better fix.

Given that it would be nice to reduce the scope of the rtnl lock, this
might not be a bad direction.  Let me see.

Is rtnl_notify safe without the rtnl lock?


>
> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> ---
>  net/core/net_namespace.c |   23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 60a71be75aea..6a4eab438221 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
>   */
>  int peernet2id_alloc(struct net *net, struct net *peer)
>  {
> -     bool alloc;
> +     bool alloc = false, alive = false;
>       int id;

        ^^^ Perhaps we want "ASSERT_RTNL();" here?
>  
> -     if (atomic_read(&net->count) == 0)
> -             return NETNSA_NSID_NOT_ASSIGNED;

Moving this hunk is of no benefit.  The code must be called with a valid
reference to net, which means net->count is a fancy way of testing whether
the code is in cleanup_net.  In all other cases net->count should be
non-zero, and it should remain that way because our caller must keep a
reference.

>       spin_lock_bh(&net->nsid_lock);
> -     alloc = atomic_read(&peer->count) == 0 ? false : true;
> +     /* Spinlock guarantees we never hash a peer to net->netns_ids
> +      * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
> +      */
> +     if (atomic_read(&net->count) == 0) {
> +             id = NETNSA_NSID_NOT_ASSIGNED;
> +             goto unlock;
> +     }
> +     /*
> +      * When peer is obtained from RCU lists, we may race with
> +      * its cleanup. Check whether it's alive, and this guarantees
> +      * we never hash a peer back to net->netns_ids, after it has
> +      * just been idr_remove()'d from there in cleanup_net().
> +      */
> +     if (maybe_get_net(peer))
> +             alive = alloc = true;

Yes this does seem reasonable.  The more obvious-looking code, which
would return NETNSA_NSID_NOT_ASSIGNED if the peer has a count of 0, is
silly, as it would make it appear that a peer is momentarily outside of
a network namespace when the peer is in fact moving from one network
namespace to another.
        
>       id = __peernet2id_alloc(net, peer, &alloc);
> +unlock:
>       spin_unlock_bh(&net->nsid_lock);
>       if (alloc && id >= 0)
>               rtnl_net_notifyid(net, RTM_NEWNSID, id);
                ^^^^^^
                Is this safe without the rtnl lock?
> +     if (alive)
> +             put_net(peer);
>       return id;
>  }
>  EXPORT_SYMBOL_GPL(peernet2id_alloc);

Eric
