Re: [Devel] [PATCH v3] nfs: protect callback execution against per-net callback thread shutdown

2017-11-07 Thread Stanislav Kinsburskiy
Please see my comments below.

On 07.11.2017 12:30, Kirill Tkhai wrote:
> On 07.11.2017 13:39, Stanislav Kinsburskiy wrote:
>> From: Stanislav Kinsburskiy 
>>
>> Here is the race:
>>
>> CPU #0              CPU #1
>>
>> cleanup_mnt         nfs41_callback_svc (get xprt from the list)
>> nfs_callback_down   ...
>> ...                 ...
>> svc_close_net       ...
>> ...                 ...
>> svc_xprt_free       ...
>> svc_bc_sock_free    bc_svc_process
>> kfree(xprt)         svc_process_common
>>                     rqstp->rq_xprt->xpt_ops (use after free)
>>
>> The problem is that shutdown of the per-net SUNRPC transports is done
>> regardless of current callback execution. This is a race leading to a
>> transport use-after-free in the callback handler.
>> This patch fixes it in a straightforward way, i.e. it protects callback
>> execution with the same mutex used for per-net data creation and
>> destruction. Hopefully, it won't slow down the NFS client significantly.
>>
>> https://jira.sw.ru/browse/PSBM-75751
>>
>> v3: Fix a mutex deadlock, where the shutdown callback waits for the thread
>> to exit (with the mutex taken), while the thread waits to take the mutex.
>> The idea is to simply check whether the thread has to exit if taking the
>> mutex fails.
>> This is a busy loop, but it shouldn't happen often or last long.
>>
>> Signed-off-by: Stanislav Kinsburskiy 
>> ---
>>  fs/nfs/callback.c |6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
>> index 0beb275..bbb07e4 100644
>> --- a/fs/nfs/callback.c
>> +++ b/fs/nfs/callback.c
>> @@ -117,7 +117,11 @@ nfs41_callback_svc(void *vrqstp)
>>                  if (try_to_freeze())
>>                          continue;
>>  
>> +                if (!mutex_trylock(&nfs_callback_mutex))
>> +                        continue;
> 
> This looks like a busy loop (which is especially bad, because the mutex owner
> may be sleeping at the moment we are looping).
> 

Well, yes, it is a busy loop, and that's not good. But I have to mention that
this busy loop can happen only on an umount request.

> It seems the solution may be to change the nfs_callback_down() function.
> Can't we flush pending requests in nfs_callback_down()?

Well, this already happens. The problem is the race between transport usage
(unconditional) and its destruction (also unconditional).
IOW, there should be either reference counting or some critical section.
Looks like the former would need way more code and thus be more error-prone;
the latter is uglier, but the code should be simpler at first sight.
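
Just to illustrate the refcounting direction (a rough sketch only: it assumes
the backchannel transport is reachable via serv->sv_bc_xprt in our tree, and it
leaves open the window between looking the pointer up and taking the reference,
which is part of why this option needs more code), the callback thread would
have to do something like:

/* Fragment of nfs41_callback_svc(), around the bc_svc_process() call.
 * Sketch only: pin the backchannel svc_xprt so that svc_xprt_free() (and thus
 * svc_bc_sock_free()/kfree()) cannot run while the callback is being processed.
 */
struct svc_xprt *bc_xprt = serv->sv_bc_xprt;    /* assumption: bc xprt lives here */

svc_xprt_get(bc_xprt);                          /* take a reference on xpt_ref */
error = bc_svc_process(serv, req, rqstp);
svc_xprt_put(bc_xprt);                          /* drop it; may trigger svc_xprt_free() */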


> Or just delete them from sv_cb_list without handling (does the nfs proto
> allow that)?
> 

Well, this looks like a promising idea. But yet again, there should be some
protection against transport usage in the shutdown helper.
And it's not yet clear how to implement it without significant code changes...
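
If dropping the queued requests turns out to be allowed by the protocol, the
drain itself could be fairly small. A sketch (the helper name is made up; it
assumes the entries on sv_cb_list are struct rpc_rqst linked via rq_bc_list, as
in mainline, and that xprt_free_bc_request() is the right way to give them back
to the backchannel pool; meant to live in fs/nfs/callback.c, which already has
the needed headers):

/* Hypothetical helper for the shutdown path: drop every queued backchannel
 * request before the per-net transports are destroyed.
 */
static void nfs_callback_drain_bc_list(struct svc_serv *serv)
{
        struct rpc_rqst *req;

        spin_lock_bh(&serv->sv_cb_lock);
        while (!list_empty(&serv->sv_cb_list)) {
                req = list_first_entry(&serv->sv_cb_list,
                                       struct rpc_rqst, rq_bc_list);
                list_del(&req->rq_bc_list);
                spin_unlock_bh(&serv->sv_cb_lock);
                xprt_free_bc_request(req);      /* return it to the bc pool */
                spin_lock_bh(&serv->sv_cb_lock);
        }
        spin_unlock_bh(&serv->sv_cb_lock);
}

But as said above, that alone doesn't cover a callback that is already inside
bc_svc_process() at that moment, so some synchronization with the thread would
still be needed.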



Re: [Devel] [PATCH v3] nfs: protect callback execution against per-net callback thread shutdown

2017-11-07 Thread Kirill Tkhai
On 07.11.2017 13:39, Stanislav Kinsburskiy wrote:
> From: Stanislav Kinsburskiy 
> 
> Here is the race:
> 
> CPU #0              CPU #1
>
> cleanup_mnt         nfs41_callback_svc (get xprt from the list)
> nfs_callback_down   ...
> ...                 ...
> svc_close_net       ...
> ...                 ...
> svc_xprt_free       ...
> svc_bc_sock_free    bc_svc_process
> kfree(xprt)         svc_process_common
>                     rqstp->rq_xprt->xpt_ops (use after free)
> 
> The problem is that shutdown of the per-net SUNRPC transports is done
> regardless of current callback execution. This is a race leading to a
> transport use-after-free in the callback handler.
> This patch fixes it in a straightforward way, i.e. it protects callback
> execution with the same mutex used for per-net data creation and
> destruction. Hopefully, it won't slow down the NFS client significantly.
> 
> https://jira.sw.ru/browse/PSBM-75751
> 
> v3: Fix a mutex deadlock, where the shutdown callback waits for the thread
> to exit (with the mutex taken), while the thread waits to take the mutex.
> The idea is to simply check whether the thread has to exit if taking the
> mutex fails.
> This is a busy loop, but it shouldn't happen often or last long.
> 
> Signed-off-by: Stanislav Kinsburskiy 
> ---
>  fs/nfs/callback.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index 0beb275..bbb07e4 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -117,7 +117,11 @@ nfs41_callback_svc(void *vrqstp)
>                  if (try_to_freeze())
>                          continue;
>  
> +                if (!mutex_trylock(&nfs_callback_mutex))
> +                        continue;

This looks like a busy loop (which is especially bad, because the mutex owner
may be sleeping at the moment we are looping).

It seems the solution may be to change the nfs_callback_down() function.
Can't we flush pending requests in nfs_callback_down()? Or just delete
them from sv_cb_list without handling (does the nfs proto allow that)?

Also, one more argument for flushing is the suspicion that there may be more
than one pending request in the list. Can that happen?

> +
>                  prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
> +
>                  spin_lock_bh(&serv->sv_cb_lock);
>                  if (!list_empty(&serv->sv_cb_list)) {
>                          req = list_first_entry(&serv->sv_cb_list,
> @@ -129,8 +133,10 @@ nfs41_callback_svc(void *vrqstp)
>                          error = bc_svc_process(serv, req, rqstp);
>                          dprintk("bc_svc_process() returned w/ error code= %d\n",
>                                  error);
> +                        mutex_unlock(&nfs_callback_mutex);
>                  } else {
>                          spin_unlock_bh(&serv->sv_cb_lock);
> +                        mutex_unlock(&nfs_callback_mutex);
>                          schedule();
>                          finish_wait(&serv->sv_cb_waitq, &wq);
>                  }
> 


[Devel] [PATCH v3] nfs: protect callback execution against per-net callback thread shutdown

2017-11-07 Thread Stanislav Kinsburskiy
From: Stanislav Kinsburskiy 

Here is the race:

CPU #0              CPU #1

cleanup_mnt         nfs41_callback_svc (get xprt from the list)
nfs_callback_down   ...
...                 ...
svc_close_net       ...
...                 ...
svc_xprt_free       ...
svc_bc_sock_free    bc_svc_process
kfree(xprt)         svc_process_common
                    rqstp->rq_xprt->xpt_ops (use after free)

The problem is that shutdown of the per-net SUNRPC transports is done
regardless of current callback execution. This is a race leading to a
transport use-after-free in the callback handler.
This patch fixes it in a straightforward way, i.e. it protects callback
execution with the same mutex used for per-net data creation and
destruction. Hopefully, it won't slow down the NFS client significantly.

https://jira.sw.ru/browse/PSBM-75751

v3: Fix a mutex deadlock, where the shutdown callback waits for the thread to
exit (with the mutex taken), while the thread waits to take the mutex.
The idea is to simply check whether the thread has to exit if taking the
mutex fails.
This is a busy loop, but it shouldn't happen often or last long.

Signed-off-by: Stanislav Kinsburskiy 
---
 fs/nfs/callback.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 0beb275..bbb07e4 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -117,7 +117,11 @@ nfs41_callback_svc(void *vrqstp)
                 if (try_to_freeze())
                         continue;
 
+                if (!mutex_trylock(&nfs_callback_mutex))
+                        continue;
+
                 prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
+
                 spin_lock_bh(&serv->sv_cb_lock);
                 if (!list_empty(&serv->sv_cb_list)) {
                         req = list_first_entry(&serv->sv_cb_list,
@@ -129,8 +133,10 @@ nfs41_callback_svc(void *vrqstp)
                         error = bc_svc_process(serv, req, rqstp);
                         dprintk("bc_svc_process() returned w/ error code= %d\n",
                                 error);
+                        mutex_unlock(&nfs_callback_mutex);
                 } else {
                         spin_unlock_bh(&serv->sv_cb_lock);
+                        mutex_unlock(&nfs_callback_mutex);
                         schedule();
                         finish_wait(&serv->sv_cb_waitq, &wq);
                 }
