Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-20 Thread Sowmini Varadhan
On (03/20/18 12:37), H??kon Bugge wrote:
> 
> A little nit below. And some spelling issues in existing commentary
> you can consider fixing, since you reshuffle this file considerable.
> > +   if (net != _net && rtn->ctl_table)
> > +   kfree(rtn->ctl_table);
> 
> Well, this comes from the existing code, but as pointed out by Linus
> recently, kfree() handles NULL pointers. So, if rtn->ctl_table is
> NULL most likely, the code is OK _if_ you add an unlikely() around the
> if-clause. Otherwise, the ??? && rtn->ctl_table??? can simply be removed.

As you observe correctly, this comes from the existing code,
and is unrelated to the contents of the commit comment.

So if we feel passionately about htis, let us please fix this
separately, with its own ceremony.


> s/when/When/
> s/parameters,this/parameters, this/
> 
> Well, not part of your commit.

As above.
> 
> 
> >  * function  resets the RDS connections in that netns  so that we can
> 
> Two double spaces incidents above
> 
> Not part of your commit

As above.

Thanks much.
--Sowmini


Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-20 Thread Håkon Bugge
Hi Sowmini,

A little nit below. And some spelling issues in existing commentary you can 
consider fixing, since you reshuffle this file considerable.


Thxs, Håkon


> On 18 Mar 2018, at 21:45, Sowmini Varadhan  
> wrote:
> 
> On (03/18/18 00:55), Kirill Tkhai wrote:
>> 
>> I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is
>> another solution to do that, I'm not again that.
> 
> The patch below takes care of this. I've done some preliminary testing,
> and I'll send it upstream after doing additional self-review/testing.
> Please also take a look, if you can, to see if I missed something.
> 
> Thanks for the input,
> 
> --Sowmini
> ---patch follows
> 
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 08ea9cd..87c2643 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net)
>   return err;
> }
> 
> -static void __net_exit rds_tcp_exit_net(struct net *net)
> -{
> - struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> -
> - if (rtn->rds_tcp_sysctl)
> - unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
> -
> - if (net != _net && rtn->ctl_table)
> - kfree(rtn->ctl_table);
> -
> - /* If rds_tcp_exit_net() is called as a result of netns deletion,
> -  * the rds_tcp_kill_sock() device notifier would already have cleaned
> -  * up the listen socket, thus there is no work to do in this function.
> -  *
> -  * If rds_tcp_exit_net() is called as a result of module unload,
> -  * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
> -  * we do need to clean up the listen socket here.
> -  */
> - if (rtn->rds_tcp_listen_sock) {
> - struct socket *lsock = rtn->rds_tcp_listen_sock;
> -
> - rtn->rds_tcp_listen_sock = NULL;
> - rds_tcp_listen_stop(lsock, >rds_tcp_accept_w);
> - }
> -}
> -
> -static struct pernet_operations rds_tcp_net_ops = {
> - .init = rds_tcp_init_net,
> - .exit = rds_tcp_exit_net,
> - .id = _tcp_netid,
> - .size = sizeof(struct rds_tcp_net),
> - .async = true,
> -};
> -
> static void rds_tcp_kill_sock(struct net *net)
> {
>   struct rds_tcp_connection *tc, *_tc;
> @@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net)
>   rds_conn_destroy(tc->t_cpath->cp_conn);
> }
> 
> -void *rds_tcp_listen_sock_def_readable(struct net *net)
> +static void __net_exit rds_tcp_exit_net(struct net *net)
> {
>   struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> - struct socket *lsock = rtn->rds_tcp_listen_sock;
> 
> - if (!lsock)
> - return NULL;
> + rds_tcp_kill_sock(net);
> 
> - return lsock->sk->sk_user_data;
> + if (rtn->rds_tcp_sysctl)
> + unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
> +
> + if (net != _net && rtn->ctl_table)
> + kfree(rtn->ctl_table);

Well, this comes from the existing code, but as pointed out by Linus recently, 
kfree() handles NULL pointers. So, if rtn->ctl_table is NULL most likely, the 
code is OK _if_ you add an unlikely() around the if-clause. Otherwise, the “ && 
rtn->ctl_table” can simply be removed.

> }
> 
> -static int rds_tcp_dev_event(struct notifier_block *this,
> -  unsigned long event, void *ptr)
> +static struct pernet_operations rds_tcp_net_ops = {
> + .init = rds_tcp_init_net,
> + .exit = rds_tcp_exit_net,
> + .id = _tcp_netid,
> + .size = sizeof(struct rds_tcp_net),
> + .async = true,
> +};
> +
> +void *rds_tcp_listen_sock_def_readable(struct net *net)
> {
> - struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> + struct socket *lsock = rtn->rds_tcp_listen_sock;
> 
> - /* rds-tcp registers as a pernet subys, so the ->exit will only
> -  * get invoked after network acitivity has quiesced. We need to
> -  * clean up all sockets  to quiesce network activity, and use
> -  * the unregistration of the per-net loopback device as a trigger
> -  * to start that cleanup.
> -  */
> - if (event == NETDEV_UNREGISTER_FINAL &&
> - dev->ifindex == LOOPBACK_IFINDEX)
> - rds_tcp_kill_sock(dev_net(dev));
> + if (!lsock)
> + return NULL;
> 
> - return NOTIFY_DONE;
> + return lsock->sk->sk_user_data;
> }
> 
> -static struct notifier_block rds_tcp_dev_notifier = {
> - .notifier_call= rds_tcp_dev_event,
> - .priority = -10, /* must be called after other network notifiers */
> -};
> -
> /* when sysctl is used to modify some kernel socket parameters,this

s/when/When/
s/parameters,this/parameters, this/

Well, not part of your commit.


>  * function  resets the RDS connections in that netns  so that we can

Two double spaces incidents above

Not 

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-19 Thread Kirill Tkhai
Hi, Sowmini,

thanks for looking into this.

On 18.03.2018 23:45, Sowmini Varadhan wrote:
> On (03/18/18 00:55), Kirill Tkhai wrote:
>>
>> I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is
>> another solution to do that, I'm not again that.
> 
> The patch below takes care of this. I've done some preliminary testing,
> and I'll send it upstream after doing additional self-review/testing.
> Please also take a look, if you can, to see if I missed something.
> 
> Thanks for the input,
> 
> --Sowmini
> ---patch follows
> 
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 08ea9cd..87c2643 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net)
>   return err;
>  }
>  
> -static void __net_exit rds_tcp_exit_net(struct net *net)
> -{
> - struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> -
> - if (rtn->rds_tcp_sysctl)
> - unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
> -
> - if (net != _net && rtn->ctl_table)
> - kfree(rtn->ctl_table);
> -
> - /* If rds_tcp_exit_net() is called as a result of netns deletion,
> -  * the rds_tcp_kill_sock() device notifier would already have cleaned
> -  * up the listen socket, thus there is no work to do in this function.
> -  *
> -  * If rds_tcp_exit_net() is called as a result of module unload,
> -  * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
> -  * we do need to clean up the listen socket here.
> -  */
> - if (rtn->rds_tcp_listen_sock) {
> - struct socket *lsock = rtn->rds_tcp_listen_sock;
> -
> - rtn->rds_tcp_listen_sock = NULL;
> - rds_tcp_listen_stop(lsock, >rds_tcp_accept_w);
> - }
> -}
> -
> -static struct pernet_operations rds_tcp_net_ops = {
> - .init = rds_tcp_init_net,
> - .exit = rds_tcp_exit_net,
> - .id = _tcp_netid,
> - .size = sizeof(struct rds_tcp_net),
> - .async = true,
> -};
> -
>  static void rds_tcp_kill_sock(struct net *net)
>  {
>   struct rds_tcp_connection *tc, *_tc;
> @@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net)
>   rds_conn_destroy(tc->t_cpath->cp_conn);
>  }
>  
> -void *rds_tcp_listen_sock_def_readable(struct net *net)
> +static void __net_exit rds_tcp_exit_net(struct net *net)
>  {
>   struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> - struct socket *lsock = rtn->rds_tcp_listen_sock;
>  
> - if (!lsock)
> - return NULL;
> + rds_tcp_kill_sock(net);

rds_tcp_listen_sock destruction looks nice and safe, since all
the places the sockets is dereferenced use sk_callback_lock.
So they don't miss rds_tcp_listen_sock = NULL, as rds_tcp_listen_stop()
takes the lock too.

rds_tcp_conn_list is populated from:

1)rds_tcp_accept_one(), which can't happen after we flushed the queue
in rds_tcp_listen_stop();

2)rds_sendmsg(), which is triggered by userspace, and that's impossible,
when net is dead;

3)rds_ib_cm_handle_connect(), which call rds_conn_create() with init_net
argument only. This may race with module unloading only, but this problem
is already solved in RDS by rds_destroy_pending() check, which care about
that:

static bool rds_tcp_is_unloading(struct rds_connection *conn)
{
return atomic_read(_tcp_unloading) != 0;
}

static void rds_tcp_exit(void)
{
rds_tcp_set_unloading();
synchronize_rcu();
...
}

static struct rds_connection * __rds_conn_create(...)
{
...
rcu_read_lock();
if (rds_destroy_pending(conn))
ret = -ENETDOWN;
else
ret = trans->conn_alloc(conn, GFP_ATOMIC);
...
}

So, everything looks good for me.

Kirill

>  
> - return lsock->sk->sk_user_data;
> + if (rtn->rds_tcp_sysctl)
> + unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
> +
> + if (net != _net && rtn->ctl_table)
> + kfree(rtn->ctl_table);
>  }
>  
> -static int rds_tcp_dev_event(struct notifier_block *this,
> -  unsigned long event, void *ptr)
> +static struct pernet_operations rds_tcp_net_ops = {
> + .init = rds_tcp_init_net,
> + .exit = rds_tcp_exit_net,
> + .id = _tcp_netid,
> + .size = sizeof(struct rds_tcp_net),
> + .async = true,
> +};
> +
> +void *rds_tcp_listen_sock_def_readable(struct net *net)
>  {
> - struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> + struct socket *lsock = rtn->rds_tcp_listen_sock;
>  
> - /* rds-tcp registers as a pernet subys, so the ->exit will only
> -  * get invoked after network acitivity has quiesced. We need to
> -  * clean up all sockets  to quiesce network activity, and use
> -  * the unregistration of the per-net loopback device as a trigger
> -  * to 

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-18 Thread Sowmini Varadhan
On (03/18/18 00:55), Kirill Tkhai wrote:
> 
> I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is
> another solution to do that, I'm not again that.

The patch below takes care of this. I've done some preliminary testing,
and I'll send it upstream after doing additional self-review/testing.
Please also take a look, if you can, to see if I missed something.

Thanks for the input,

--Sowmini
---patch follows

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 08ea9cd..87c2643 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net)
return err;
 }
 
-static void __net_exit rds_tcp_exit_net(struct net *net)
-{
-   struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
-
-   if (rtn->rds_tcp_sysctl)
-   unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
-
-   if (net != _net && rtn->ctl_table)
-   kfree(rtn->ctl_table);
-
-   /* If rds_tcp_exit_net() is called as a result of netns deletion,
-* the rds_tcp_kill_sock() device notifier would already have cleaned
-* up the listen socket, thus there is no work to do in this function.
-*
-* If rds_tcp_exit_net() is called as a result of module unload,
-* i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
-* we do need to clean up the listen socket here.
-*/
-   if (rtn->rds_tcp_listen_sock) {
-   struct socket *lsock = rtn->rds_tcp_listen_sock;
-
-   rtn->rds_tcp_listen_sock = NULL;
-   rds_tcp_listen_stop(lsock, >rds_tcp_accept_w);
-   }
-}
-
-static struct pernet_operations rds_tcp_net_ops = {
-   .init = rds_tcp_init_net,
-   .exit = rds_tcp_exit_net,
-   .id = _tcp_netid,
-   .size = sizeof(struct rds_tcp_net),
-   .async = true,
-};
-
 static void rds_tcp_kill_sock(struct net *net)
 {
struct rds_tcp_connection *tc, *_tc;
@@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net)
rds_conn_destroy(tc->t_cpath->cp_conn);
 }
 
-void *rds_tcp_listen_sock_def_readable(struct net *net)
+static void __net_exit rds_tcp_exit_net(struct net *net)
 {
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
-   struct socket *lsock = rtn->rds_tcp_listen_sock;
 
-   if (!lsock)
-   return NULL;
+   rds_tcp_kill_sock(net);
 
-   return lsock->sk->sk_user_data;
+   if (rtn->rds_tcp_sysctl)
+   unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
+
+   if (net != _net && rtn->ctl_table)
+   kfree(rtn->ctl_table);
 }
 
-static int rds_tcp_dev_event(struct notifier_block *this,
-unsigned long event, void *ptr)
+static struct pernet_operations rds_tcp_net_ops = {
+   .init = rds_tcp_init_net,
+   .exit = rds_tcp_exit_net,
+   .id = _tcp_netid,
+   .size = sizeof(struct rds_tcp_net),
+   .async = true,
+};
+
+void *rds_tcp_listen_sock_def_readable(struct net *net)
 {
-   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+   struct socket *lsock = rtn->rds_tcp_listen_sock;
 
-   /* rds-tcp registers as a pernet subys, so the ->exit will only
-* get invoked after network acitivity has quiesced. We need to
-* clean up all sockets  to quiesce network activity, and use
-* the unregistration of the per-net loopback device as a trigger
-* to start that cleanup.
-*/
-   if (event == NETDEV_UNREGISTER_FINAL &&
-   dev->ifindex == LOOPBACK_IFINDEX)
-   rds_tcp_kill_sock(dev_net(dev));
+   if (!lsock)
+   return NULL;
 
-   return NOTIFY_DONE;
+   return lsock->sk->sk_user_data;
 }
 
-static struct notifier_block rds_tcp_dev_notifier = {
-   .notifier_call= rds_tcp_dev_event,
-   .priority = -10, /* must be called after other network notifiers */
-};
-
 /* when sysctl is used to modify some kernel socket parameters,this
  * function  resets the RDS connections in that netns  so that we can
  * restart with new parameters.  The assumption is that such reset
@@ -625,9 +589,7 @@ static void rds_tcp_exit(void)
rds_tcp_set_unloading();
synchronize_rcu();
rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
-   unregister_pernet_subsys(_tcp_net_ops);
-   if (unregister_netdevice_notifier(_tcp_dev_notifier))
-   pr_warn("could not unregister rds_tcp_dev_notifier\n");
+   unregister_pernet_device(_tcp_net_ops);
rds_tcp_destroy_conns();
rds_trans_unregister(_tcp_transport);
rds_tcp_recv_exit();
@@ -651,24 +613,17 @@ static int rds_tcp_init(void)
if (ret)
goto out_slab;
 
-   ret = register_pernet_subsys(_tcp_net_ops);
+   ret = 

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-17 Thread Kirill Tkhai
On 18.03.2018 00:26, Sowmini Varadhan wrote:
> On (03/17/18 10:15), Sowmini Varadhan wrote:
>> To solve the scaling problem why not just have a well-defined 
>> callback to modules when devices are quiesced, instead of 
>> overloading the pernet_device registration in this obscure way?
> 
> I thought about this a bit, and maybe I missed your original point-
> today we are able to do all the needed cleanup for rds-tcp when
> we unload the module, even though network activity has not quiesced,
> and there is no reason we cannot use the same code for netns cleanup
> as well. I think this is what you were trying to ask, when you 
> said "why do you need to know that loopback is down?"

I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is
another solution to do that, I'm not again that.

> I'm sorry I  missed that, I will re-examine the code and get back to
> you- it should be possible to just do one registration and 
> cleanup rds-state and avoid the hack of registering twice

Sounds great, I'll wait for your response.
 
> (saw your most recent long mail- sorry- both v1 and v2 are hacks)
> 
> I'm on the road at the moment, so I'll get back to you on this.

Thanks,
Kirill


Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-17 Thread Sowmini Varadhan
On (03/17/18 10:15), Sowmini Varadhan wrote:
> To solve the scaling problem why not just have a well-defined 
> callback to modules when devices are quiesced, instead of 
> overloading the pernet_device registration in this obscure way?

I thought about this a bit, and maybe I missed your original point-
today we are able to do all the needed cleanup for rds-tcp when
we unload the module, even though network activity has not quiesced,
and there is no reason we cannot use the same code for netns cleanup
as well. I think this is what you were trying to ask, when you 
said "why do you need to know that loopback is down?"
I'm sorry I  missed that, I will re-examine the code and get back to
you- it should be possible to just do one registration and 
cleanup rds-state and avoid the hack of registering twice

(saw your most recent long mail- sorry- both v1 and v2 are hacks)

I'm on the road at the moment, so I'll get back to you on this.

Thanks
--Sowmini