Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
On (03/20/18 12:37), H??kon Bugge wrote: > > A little nit below. And some spelling issues in existing commentary > you can consider fixing, since you reshuffle this file considerable. > > + if (net != _net && rtn->ctl_table) > > + kfree(rtn->ctl_table); > > Well, this comes from the existing code, but as pointed out by Linus > recently, kfree() handles NULL pointers. So, if rtn->ctl_table is > NULL most likely, the code is OK _if_ you add an unlikely() around the > if-clause. Otherwise, the ??? && rtn->ctl_table??? can simply be removed. As you observe correctly, this comes from the existing code, and is unrelated to the contents of the commit comment. So if we feel passionately about htis, let us please fix this separately, with its own ceremony. > s/when/When/ > s/parameters,this/parameters, this/ > > Well, not part of your commit. As above. > > > > * function resets the RDS connections in that netns so that we can > > Two double spaces incidents above > > Not part of your commit As above. Thanks much. --Sowmini
Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
Hi Sowmini, A little nit below. And some spelling issues in existing commentary you can consider fixing, since you reshuffle this file considerable. Thxs, Håkon > On 18 Mar 2018, at 21:45, Sowmini Varadhan> wrote: > > On (03/18/18 00:55), Kirill Tkhai wrote: >> >> I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is >> another solution to do that, I'm not again that. > > The patch below takes care of this. I've done some preliminary testing, > and I'll send it upstream after doing additional self-review/testing. > Please also take a look, if you can, to see if I missed something. > > Thanks for the input, > > --Sowmini > ---patch follows > > diff --git a/net/rds/tcp.c b/net/rds/tcp.c > index 08ea9cd..87c2643 100644 > --- a/net/rds/tcp.c > +++ b/net/rds/tcp.c > @@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net) > return err; > } > > -static void __net_exit rds_tcp_exit_net(struct net *net) > -{ > - struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); > - > - if (rtn->rds_tcp_sysctl) > - unregister_net_sysctl_table(rtn->rds_tcp_sysctl); > - > - if (net != _net && rtn->ctl_table) > - kfree(rtn->ctl_table); > - > - /* If rds_tcp_exit_net() is called as a result of netns deletion, > - * the rds_tcp_kill_sock() device notifier would already have cleaned > - * up the listen socket, thus there is no work to do in this function. > - * > - * If rds_tcp_exit_net() is called as a result of module unload, > - * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then > - * we do need to clean up the listen socket here. > - */ > - if (rtn->rds_tcp_listen_sock) { > - struct socket *lsock = rtn->rds_tcp_listen_sock; > - > - rtn->rds_tcp_listen_sock = NULL; > - rds_tcp_listen_stop(lsock, >rds_tcp_accept_w); > - } > -} > - > -static struct pernet_operations rds_tcp_net_ops = { > - .init = rds_tcp_init_net, > - .exit = rds_tcp_exit_net, > - .id = _tcp_netid, > - .size = sizeof(struct rds_tcp_net), > - .async = true, > -}; > - > static void rds_tcp_kill_sock(struct net *net) > { > struct rds_tcp_connection *tc, *_tc; > @@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net) > rds_conn_destroy(tc->t_cpath->cp_conn); > } > > -void *rds_tcp_listen_sock_def_readable(struct net *net) > +static void __net_exit rds_tcp_exit_net(struct net *net) > { > struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); > - struct socket *lsock = rtn->rds_tcp_listen_sock; > > - if (!lsock) > - return NULL; > + rds_tcp_kill_sock(net); > > - return lsock->sk->sk_user_data; > + if (rtn->rds_tcp_sysctl) > + unregister_net_sysctl_table(rtn->rds_tcp_sysctl); > + > + if (net != _net && rtn->ctl_table) > + kfree(rtn->ctl_table); Well, this comes from the existing code, but as pointed out by Linus recently, kfree() handles NULL pointers. So, if rtn->ctl_table is NULL most likely, the code is OK _if_ you add an unlikely() around the if-clause. Otherwise, the “ && rtn->ctl_table” can simply be removed. > } > > -static int rds_tcp_dev_event(struct notifier_block *this, > - unsigned long event, void *ptr) > +static struct pernet_operations rds_tcp_net_ops = { > + .init = rds_tcp_init_net, > + .exit = rds_tcp_exit_net, > + .id = _tcp_netid, > + .size = sizeof(struct rds_tcp_net), > + .async = true, > +}; > + > +void *rds_tcp_listen_sock_def_readable(struct net *net) > { > - struct net_device *dev = netdev_notifier_info_to_dev(ptr); > + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); > + struct socket *lsock = rtn->rds_tcp_listen_sock; > > - /* rds-tcp registers as a pernet subys, so the ->exit will only > - * get invoked after network acitivity has quiesced. We need to > - * clean up all sockets to quiesce network activity, and use > - * the unregistration of the per-net loopback device as a trigger > - * to start that cleanup. > - */ > - if (event == NETDEV_UNREGISTER_FINAL && > - dev->ifindex == LOOPBACK_IFINDEX) > - rds_tcp_kill_sock(dev_net(dev)); > + if (!lsock) > + return NULL; > > - return NOTIFY_DONE; > + return lsock->sk->sk_user_data; > } > > -static struct notifier_block rds_tcp_dev_notifier = { > - .notifier_call= rds_tcp_dev_event, > - .priority = -10, /* must be called after other network notifiers */ > -}; > - > /* when sysctl is used to modify some kernel socket parameters,this s/when/When/ s/parameters,this/parameters, this/ Well, not part of your commit. > * function resets the RDS connections in that netns so that we can Two double spaces incidents above Not
Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
Hi, Sowmini, thanks for looking into this. On 18.03.2018 23:45, Sowmini Varadhan wrote: > On (03/18/18 00:55), Kirill Tkhai wrote: >> >> I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is >> another solution to do that, I'm not again that. > > The patch below takes care of this. I've done some preliminary testing, > and I'll send it upstream after doing additional self-review/testing. > Please also take a look, if you can, to see if I missed something. > > Thanks for the input, > > --Sowmini > ---patch follows > > diff --git a/net/rds/tcp.c b/net/rds/tcp.c > index 08ea9cd..87c2643 100644 > --- a/net/rds/tcp.c > +++ b/net/rds/tcp.c > @@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net) > return err; > } > > -static void __net_exit rds_tcp_exit_net(struct net *net) > -{ > - struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); > - > - if (rtn->rds_tcp_sysctl) > - unregister_net_sysctl_table(rtn->rds_tcp_sysctl); > - > - if (net != _net && rtn->ctl_table) > - kfree(rtn->ctl_table); > - > - /* If rds_tcp_exit_net() is called as a result of netns deletion, > - * the rds_tcp_kill_sock() device notifier would already have cleaned > - * up the listen socket, thus there is no work to do in this function. > - * > - * If rds_tcp_exit_net() is called as a result of module unload, > - * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then > - * we do need to clean up the listen socket here. > - */ > - if (rtn->rds_tcp_listen_sock) { > - struct socket *lsock = rtn->rds_tcp_listen_sock; > - > - rtn->rds_tcp_listen_sock = NULL; > - rds_tcp_listen_stop(lsock, >rds_tcp_accept_w); > - } > -} > - > -static struct pernet_operations rds_tcp_net_ops = { > - .init = rds_tcp_init_net, > - .exit = rds_tcp_exit_net, > - .id = _tcp_netid, > - .size = sizeof(struct rds_tcp_net), > - .async = true, > -}; > - > static void rds_tcp_kill_sock(struct net *net) > { > struct rds_tcp_connection *tc, *_tc; > @@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net) > rds_conn_destroy(tc->t_cpath->cp_conn); > } > > -void *rds_tcp_listen_sock_def_readable(struct net *net) > +static void __net_exit rds_tcp_exit_net(struct net *net) > { > struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); > - struct socket *lsock = rtn->rds_tcp_listen_sock; > > - if (!lsock) > - return NULL; > + rds_tcp_kill_sock(net); rds_tcp_listen_sock destruction looks nice and safe, since all the places the sockets is dereferenced use sk_callback_lock. So they don't miss rds_tcp_listen_sock = NULL, as rds_tcp_listen_stop() takes the lock too. rds_tcp_conn_list is populated from: 1)rds_tcp_accept_one(), which can't happen after we flushed the queue in rds_tcp_listen_stop(); 2)rds_sendmsg(), which is triggered by userspace, and that's impossible, when net is dead; 3)rds_ib_cm_handle_connect(), which call rds_conn_create() with init_net argument only. This may race with module unloading only, but this problem is already solved in RDS by rds_destroy_pending() check, which care about that: static bool rds_tcp_is_unloading(struct rds_connection *conn) { return atomic_read(_tcp_unloading) != 0; } static void rds_tcp_exit(void) { rds_tcp_set_unloading(); synchronize_rcu(); ... } static struct rds_connection * __rds_conn_create(...) { ... rcu_read_lock(); if (rds_destroy_pending(conn)) ret = -ENETDOWN; else ret = trans->conn_alloc(conn, GFP_ATOMIC); ... } So, everything looks good for me. Kirill > > - return lsock->sk->sk_user_data; > + if (rtn->rds_tcp_sysctl) > + unregister_net_sysctl_table(rtn->rds_tcp_sysctl); > + > + if (net != _net && rtn->ctl_table) > + kfree(rtn->ctl_table); > } > > -static int rds_tcp_dev_event(struct notifier_block *this, > - unsigned long event, void *ptr) > +static struct pernet_operations rds_tcp_net_ops = { > + .init = rds_tcp_init_net, > + .exit = rds_tcp_exit_net, > + .id = _tcp_netid, > + .size = sizeof(struct rds_tcp_net), > + .async = true, > +}; > + > +void *rds_tcp_listen_sock_def_readable(struct net *net) > { > - struct net_device *dev = netdev_notifier_info_to_dev(ptr); > + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); > + struct socket *lsock = rtn->rds_tcp_listen_sock; > > - /* rds-tcp registers as a pernet subys, so the ->exit will only > - * get invoked after network acitivity has quiesced. We need to > - * clean up all sockets to quiesce network activity, and use > - * the unregistration of the per-net loopback device as a trigger > - * to
Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
On (03/18/18 00:55), Kirill Tkhai wrote: > > I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is > another solution to do that, I'm not again that. The patch below takes care of this. I've done some preliminary testing, and I'll send it upstream after doing additional self-review/testing. Please also take a look, if you can, to see if I missed something. Thanks for the input, --Sowmini ---patch follows diff --git a/net/rds/tcp.c b/net/rds/tcp.c index 08ea9cd..87c2643 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net) return err; } -static void __net_exit rds_tcp_exit_net(struct net *net) -{ - struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); - - if (rtn->rds_tcp_sysctl) - unregister_net_sysctl_table(rtn->rds_tcp_sysctl); - - if (net != _net && rtn->ctl_table) - kfree(rtn->ctl_table); - - /* If rds_tcp_exit_net() is called as a result of netns deletion, -* the rds_tcp_kill_sock() device notifier would already have cleaned -* up the listen socket, thus there is no work to do in this function. -* -* If rds_tcp_exit_net() is called as a result of module unload, -* i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then -* we do need to clean up the listen socket here. -*/ - if (rtn->rds_tcp_listen_sock) { - struct socket *lsock = rtn->rds_tcp_listen_sock; - - rtn->rds_tcp_listen_sock = NULL; - rds_tcp_listen_stop(lsock, >rds_tcp_accept_w); - } -} - -static struct pernet_operations rds_tcp_net_ops = { - .init = rds_tcp_init_net, - .exit = rds_tcp_exit_net, - .id = _tcp_netid, - .size = sizeof(struct rds_tcp_net), - .async = true, -}; - static void rds_tcp_kill_sock(struct net *net) { struct rds_tcp_connection *tc, *_tc; @@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net) rds_conn_destroy(tc->t_cpath->cp_conn); } -void *rds_tcp_listen_sock_def_readable(struct net *net) +static void __net_exit rds_tcp_exit_net(struct net *net) { struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); - struct socket *lsock = rtn->rds_tcp_listen_sock; - if (!lsock) - return NULL; + rds_tcp_kill_sock(net); - return lsock->sk->sk_user_data; + if (rtn->rds_tcp_sysctl) + unregister_net_sysctl_table(rtn->rds_tcp_sysctl); + + if (net != _net && rtn->ctl_table) + kfree(rtn->ctl_table); } -static int rds_tcp_dev_event(struct notifier_block *this, -unsigned long event, void *ptr) +static struct pernet_operations rds_tcp_net_ops = { + .init = rds_tcp_init_net, + .exit = rds_tcp_exit_net, + .id = _tcp_netid, + .size = sizeof(struct rds_tcp_net), + .async = true, +}; + +void *rds_tcp_listen_sock_def_readable(struct net *net) { - struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + struct socket *lsock = rtn->rds_tcp_listen_sock; - /* rds-tcp registers as a pernet subys, so the ->exit will only -* get invoked after network acitivity has quiesced. We need to -* clean up all sockets to quiesce network activity, and use -* the unregistration of the per-net loopback device as a trigger -* to start that cleanup. -*/ - if (event == NETDEV_UNREGISTER_FINAL && - dev->ifindex == LOOPBACK_IFINDEX) - rds_tcp_kill_sock(dev_net(dev)); + if (!lsock) + return NULL; - return NOTIFY_DONE; + return lsock->sk->sk_user_data; } -static struct notifier_block rds_tcp_dev_notifier = { - .notifier_call= rds_tcp_dev_event, - .priority = -10, /* must be called after other network notifiers */ -}; - /* when sysctl is used to modify some kernel socket parameters,this * function resets the RDS connections in that netns so that we can * restart with new parameters. The assumption is that such reset @@ -625,9 +589,7 @@ static void rds_tcp_exit(void) rds_tcp_set_unloading(); synchronize_rcu(); rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info); - unregister_pernet_subsys(_tcp_net_ops); - if (unregister_netdevice_notifier(_tcp_dev_notifier)) - pr_warn("could not unregister rds_tcp_dev_notifier\n"); + unregister_pernet_device(_tcp_net_ops); rds_tcp_destroy_conns(); rds_trans_unregister(_tcp_transport); rds_tcp_recv_exit(); @@ -651,24 +613,17 @@ static int rds_tcp_init(void) if (ret) goto out_slab; - ret = register_pernet_subsys(_tcp_net_ops); + ret =
Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
On 18.03.2018 00:26, Sowmini Varadhan wrote: > On (03/17/18 10:15), Sowmini Varadhan wrote: >> To solve the scaling problem why not just have a well-defined >> callback to modules when devices are quiesced, instead of >> overloading the pernet_device registration in this obscure way? > > I thought about this a bit, and maybe I missed your original point- > today we are able to do all the needed cleanup for rds-tcp when > we unload the module, even though network activity has not quiesced, > and there is no reason we cannot use the same code for netns cleanup > as well. I think this is what you were trying to ask, when you > said "why do you need to know that loopback is down?" I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is another solution to do that, I'm not again that. > I'm sorry I missed that, I will re-examine the code and get back to > you- it should be possible to just do one registration and > cleanup rds-state and avoid the hack of registering twice Sounds great, I'll wait for your response. > (saw your most recent long mail- sorry- both v1 and v2 are hacks) > > I'm on the road at the moment, so I'll get back to you on this. Thanks, Kirill
Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
On (03/17/18 10:15), Sowmini Varadhan wrote: > To solve the scaling problem why not just have a well-defined > callback to modules when devices are quiesced, instead of > overloading the pernet_device registration in this obscure way? I thought about this a bit, and maybe I missed your original point- today we are able to do all the needed cleanup for rds-tcp when we unload the module, even though network activity has not quiesced, and there is no reason we cannot use the same code for netns cleanup as well. I think this is what you were trying to ask, when you said "why do you need to know that loopback is down?" I'm sorry I missed that, I will re-examine the code and get back to you- it should be possible to just do one registration and cleanup rds-state and avoid the hack of registering twice (saw your most recent long mail- sorry- both v1 and v2 are hacks) I'm on the road at the moment, so I'll get back to you on this. Thanks --Sowmini