Re: [Devel] [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
> Are there any issues with this patch (https://patchwork.kernel.org/patch/9938919/)
> that Pavel Tikhomirov submitted back in September? I am willing to help if
> there's anything I can do to help get it accepted.

Hi, Stuart,

I asked James Bottomley about the patch status offlist, and it seems the
problem is that the patch lacks testing and review. I would highly appreciate
a review from your side and from anyone who wants to participate! And if you
can confirm that the patch solves the problem in your environment with no
side effects, please add a "Tested-by:" tag as well.

Thanks, Pavel

On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:

We have a problem with SCSI EH on several of our nodes. Imagine the following
interleaving of two threads (shost->host_busy == 1 and shost->can_queue == 1
initially):

CPU1 scsi_eh_scmd_add                   CPU2 scsi_host_queue_ready

                                        if (shost->shost_state == SHOST_RECOVERY)
                                                /* does not get here */
                                                return 0;
lock(shost->host_lock);
shost->shost_state = SHOST_RECOVERY;

                                        busy = shost->host_busy++;
                                        /* busy == 1 - go to starved label */
                                        lock(shost->host_lock) /* wait */

shost->host_failed++;
/* shost->host_busy == 2, shost->host_failed == 1 */
call scsi_eh_wakeup(shost) {
        if (host_busy == host_failed) {
                /* does not get here */
                wake_up_process(shost->ehandler)
        }
}
unlock(shost->host_lock)

                                        /* acquire lock */
                                        shost->host_busy--;

In the end we never wake up scsi_error_handler, and all commands coming
afterwards will hang: we stay in the recovery state forever, with no one left
to wake up the handler. So the SCSI disks on this host become unresponsive
and all bio on the node hangs. (We trigger this problem when SCSI commands to
a DVD drive time out.)

The main idea of the fix is to attempt the wakeup every time we decrement
host_busy or increment host_failed (the latter is already OK).
Now the very *last* of the busy threads taking host_lock after decrementing
host_busy will see all writes to the host's shost_state, host_busy and
host_failed completed, thanks to the memory barriers implied by
spin_lock/unlock, so at the moment busy == failed we will trigger the wakeup
in at least one thread. (That's why the recovery and failed checks are put
under the lock.)

Signed-off-by: Pavel Tikhomirov
---
 drivers/scsi/scsi_lib.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..6c99221d60aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	if (starget->can_queue > 0)
 		atomic_dec(&starget->target_busy);
 
+	spin_lock_irqsave(shost->host_lock, flags);
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled))) {
-		spin_lock_irqsave(shost->host_lock, flags);
+		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
-		spin_unlock_irqrestore(shost->host_lock, flags);
-	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	atomic_dec(&sdev->device_busy);
 }
@@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	spin_unlock_irq(shost->host_lock);
 out_dec:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 	return 0;
 }
 
@@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out_dec_host_busy:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 out_dec_target_busy:
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] nfs: protect callback execution against per-net callback thread shutdown
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-693.1.1.vz7.37.25
--> commit 2149800a70af636b2b22289fc5aa977420b392c2
Author: Stanislav Kinsburskiy
Date: Thu Nov 9 14:12:42 2017 +0300

nfs: protect callback execution against per-net callback thread shutdown

Patchset description: nfs: fix race between callback shutdown and execution

The idea is to use a mutex to protect callback execution against per-net
callback shutdown, and to destroy all the net-related backchannel requests
before transports destruction.

Stanislav Kinsburskiy (2):
  sunrpc: bc_svc_flush_queue_net() helper introduced
  nfs: protect callback execution against per-net callback thread shutdown

===
This patch description:

Here is the race:

CPU #0                          CPU #1

cleanup_mnt                     nfs41_callback_svc
                                (get xprt from the list)
nfs_callback_down               ...
...                             ...
svc_close_net                   ...
...                             ...
svc_xprt_free                   ...
svc_bc_sock_free                bc_svc_process
kfree(xprt)                     svc_process_common
                                rqstp->rq_xprt->xpt_ops (use after free)

The problem is that per-net SUNRPC transports shutdown is done regardless of
current callback execution. This is a race leading to transport use-after-free
in the callback handler.
This patch fixes it in a straightforward way, i.e. it protects callback
execution with the same mutex used for per-net data creation and destruction.
Hopefully, it won't slow down the NFS client significantly.

https://jira.sw.ru/browse/PSBM-75751

v5: destroy all per-net backchannel requests before transports in
nfs_callback_down_net

v4: use another mutex to protect callback execution against per-net transports
shutdown.
This guarantees that transports won't be destroyed by the shutdown callback
while execution is in progress, and vice versa.

v3: fix a mutex deadlock, when the shutdown callback waits for the thread to
exit (with the mutex taken), while the thread waits for the mutex.
The idea is to simply check whether the thread has to exit if the mutex lock
has failed.
This is a busy loop, but it shouldn't happen often or for long.

Signed-off-by: Stanislav Kinsburskiy
Reviewed-by: Kirill Tkhai
---
 fs/nfs/callback.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 0beb275..e18d774 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -99,6 +99,8 @@ nfs4_callback_up(struct svc_serv *serv)
 }
 
 #if defined(CONFIG_NFS_V4_1)
+static DEFINE_MUTEX(nfs41_callback_mutex);
+
 /*
  * The callback service for NFSv4.1 callbacks
  */
@@ -117,6 +119,12 @@ nfs41_callback_svc(void *vrqstp)
 		if (try_to_freeze())
 			continue;
 
+		mutex_lock(&nfs41_callback_mutex);
+		if (kthread_should_stop()) {
+			mutex_unlock(&nfs41_callback_mutex);
+			return 0;
+		}
+
 		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
 		spin_lock_bh(&serv->sv_cb_lock);
 		if (!list_empty(&serv->sv_cb_list)) {
@@ -129,8 +137,10 @@ nfs41_callback_svc(void *vrqstp)
 			error = bc_svc_process(serv, req, rqstp);
 			dprintk("bc_svc_process() returned w/ error code= %d\n",
 				error);
+			mutex_unlock(&nfs41_callback_mutex);
 		} else {
 			spin_unlock_bh(&serv->sv_cb_lock);
+			mutex_unlock(&nfs41_callback_mutex);
 			schedule();
 			finish_wait(&serv->sv_cb_waitq, &wq);
 		}
@@ -242,6 +252,7 @@ static void nfs_callback_down_net(u32 minorversion, struct svc_serv *serv, struc
 		return;
 
 	dprintk("NFS: destroy per-net callback data; net=%p\n", net);
+	bc_svc_flush_queue_net(serv, net);
 	svc_shutdown_net(serv, net);
 }
 
@@ -377,7 +388,13 @@ void nfs_callback_down(int minorversion, struct net *net)
 	struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
 
 	mutex_lock(&nfs_callback_mutex);
+#if defined(CONFIG_NFS_V4_1)
+	mutex_lock(&nfs41_callback_mutex);
+	nfs_callback_down_net(minorversion, cb_info->serv, net);
+	mutex_unlock(&nfs41_callback_mutex);
+#else
 	nfs_callback_down_net(minorversion, cb_info->serv, net);
+#endif
 	cb_info->users--;
 	if (cb_info->users == 0 && cb_info->task != NULL) {
 		kthread_stop(cb_info->task);
[Devel] [PATCH RHEL7 COMMIT] sunrpc: bc_svc_flush_queue_net() helper introduced
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-693.1.1.vz7.37.25
--> commit 99bfffc43ae40204d445f2c138071176d3e7b03d
Author: Stanislav Kinsburskiy
Date: Thu Nov 9 14:12:41 2017 +0300

sunrpc: bc_svc_flush_queue_net() helper introduced

Patchset description: nfs: fix race between callback shutdown and execution

The idea is to use a mutex to protect callback execution against per-net
callback shutdown, and to destroy all the net-related backchannel requests
before transports destruction.

Stanislav Kinsburskiy (2):
  sunrpc: bc_svc_flush_queue_net() helper introduced
  nfs: protect callback execution against per-net callback thread shutdown

===
This patch description:

This helper can be used to remove backchannel requests from the callback
queue on a per-net basis.

Signed-off-by: Stanislav Kinsburskiy
Reviewed-by: Kirill Tkhai
---
 include/linux/sunrpc/svc.h |  2 ++
 net/sunrpc/svc.c           | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 2b30868..fe70ff0 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -484,6 +484,8 @@ void		   svc_reserve(struct svc_rqst *rqstp, int space);
 struct svc_pool *  svc_pool_for_cpu(struct svc_serv *serv, int cpu);
 char *		   svc_print_addr(struct svc_rqst *, char *, size_t);
 
+void		   bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net);
+
 #define	RPC_MAX_ADDRBUFLEN	(63U)
 
 /*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index de8cded1..2ca4ff7 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1338,6 +1338,21 @@ svc_process(struct svc_rqst *rqstp)
 EXPORT_SYMBOL_GPL(svc_process);
 
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
+void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net)
+{
+	struct rpc_rqst *req, *tmp;
+
+	spin_lock_bh(&serv->sv_cb_lock);
+	list_for_each_entry_safe(req, tmp, &serv->sv_cb_list, rq_bc_list) {
+		if (req->rq_xprt->xprt_net == net) {
+			list_del(&req->rq_bc_list);
+			xprt_free_bc_request(req);
+		}
+	}
+	spin_unlock_bh(&serv->sv_cb_lock);
+}
+EXPORT_SYMBOL_GPL(bc_svc_flush_queue_net);
+
 /*
  * Process a backchannel RPC request that arrived over an existing
  * outbound connection
[Devel] [PATCH] mm: Kill allocate_vma() and free_vma()
Currently they are noops and just introduce unnecessary changes in the
"Initial patch". Kill them.

Please merge this into the "Initial patch" on rebase.

Signed-off-by: Kirill Tkhai
---
 fs/exec.c          |  4 ++--
 include/linux/mm.h |  4 +---
 kernel/fork.c      |  8 ++++----
 mm/mmap.c          | 22 +++++++++++-----------
 4 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c8450118925..bbbe293ea24 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -257,7 +257,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 			      NULL, UB_SOFT))
 		goto err_charge;
 
-	bprm->vma = vma = allocate_vma(mm, GFP_KERNEL | __GFP_ZERO);
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
 	if (!vma)
 		goto err_alloc;
 
@@ -289,7 +289,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 err:
 	up_write(&mm->mmap_sem);
 	bprm->vma = NULL;
-	free_vma(mm, vma);
+	kmem_cache_free(vm_area_cachep, vma);
 err_alloc:
 	ub_memory_uncharge(mm, PAGE_SIZE, VM_STACK_FLAGS | mm->def_flags, NULL);
 err_charge:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 897d7cfd226..57a63882fff 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -93,9 +93,7 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
  * mmap() functions).
  */
-extern struct kmem_cache *__vm_area_cachep;
-#define allocate_vma(mm, gfp_flags)	kmem_cache_alloc(__vm_area_cachep, gfp_flags)
-#define free_vma(mm, vma)		kmem_cache_free(__vm_area_cachep, vma)
+extern struct kmem_cache *vm_area_cachep;
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3ad0977c698..78a47e0f471 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -235,7 +235,7 @@ struct kmem_cache *files_cachep;
 struct kmem_cache *fs_cachep;
 
 /* SLAB cache for vm_area_struct structures */
-struct kmem_cache *__vm_area_cachep;
+struct kmem_cache *vm_area_cachep;
 
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
@@ -458,7 +458,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 				goto fail_nomem;
 			charge = len;
 		}
-		tmp = allocate_vma(mm, GFP_KERNEL);
+		tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 		if (!tmp)
 			goto fail_nomem;
 		*tmp = *mpnt;
@@ -538,7 +538,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 fail_nomem_anon_vma_fork:
 	mpol_put(pol);
 fail_nomem_policy:
-	free_vma(mm, tmp);
+	kmem_cache_free(vm_area_cachep, tmp);
 fail_nomem:
 	ub_memory_uncharge(mm, mpnt->vm_end - mpnt->vm_start,
 			   mpnt->vm_flags & ~VM_LOCKED, mpnt->vm_file);
@@ -1923,7 +1923,7 @@ void __init proc_caches_init(void)
 			sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT, NULL);
-	__vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
+	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index 3e9eeea3522..b66e5f23b54 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -290,7 +290,7 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	free_vma(vma->vm_mm, vma);
+	kmem_cache_free(vm_area_cachep, vma);
 	return next;
 }
 
@@ -1001,7 +1001,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 		anon_vma_merge(vma, next);
 	mm->map_count--;
 	mpol_put(vma_policy(next));
-	free_vma(mm, next);
+	kmem_cache_free(vm_area_cachep, next);
 	/*
 	 * In mprotect's case 6 (see comments on vma_merge),
 	 * we must remove another next too. It would clutter
@@ -1742,7 +1742,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * specific mapper. the address has already been validated, but
 	 * not unmapped, but the maps are removed from the list.
 	 */
-	vma = allocate_vma(mm, GFP_KERNEL | __GFP_ZERO);
+	vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
 	if (!vma) {
 		error = -ENOMEM;
 		goto unacct_error;
@@ -1866,7 +1866,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	if (vm_flags & VM_DENYWRITE)
 		allow_write_access(file);
 free_vma:
-	free_vma(mm, vma);
+	kmem_cache_free(vm_area_cachep, vma);
 unacct_error:
Re: [Devel] [PATCH v5 0/2] nfs: fix race between callback shutdown and execution
On 08.11.2017 20:23, Stanislav Kinsburskiy wrote:
> The idea is to use a mutex to protect callback execution against per-net
> callback shutdown, and to destroy all the net-related backchannel requests
> before transports destruction.
>
> ---
>
> Stanislav Kinsburskiy (2):
>       sunrpc: bc_svc_flush_queue_net() helper introduced
>       nfs: protect callback execution against per-net callback thread shutdown
>
>  fs/nfs/callback.c          | 17 +++++++++++++++++
>  include/linux/sunrpc/svc.h |  2 ++
>  net/sunrpc/svc.c           | 15 +++++++++++++++
>  3 files changed, 34 insertions(+)

Reviewed-by: Kirill Tkhai
Re: [Devel] [PATCH v5 2/2] nfs: protect callback execution against per-net callback thread shutdown
On 09.11.2017 10:33, Kirill Tkhai wrote:
>
> Looks good to me, except there is one more call of nfs_callback_down_net()
> in nfs_callback_up().
> It's not clear to me whether we should protect it via the mutex too. Should we?
>

Looks like no: the mount is not ready yet and there is no connection to the
server, so no request from the server can arrive at that moment, and we don't
need any additional protection here.
Re: [Devel] [PATCH v5 2/2] nfs: protect callback execution against per-net callback thread shutdown
On 08.11.2017 20:23, Stanislav Kinsburskiy wrote:
> From: Stanislav Kinsburskiy
>
> Here is the race:
>
> CPU #0                        CPU #1
>
> cleanup_mnt                   nfs41_callback_svc (get xprt from the list)
> nfs_callback_down             ...
> ...                           ...
> svc_close_net                 ...
> ...                           ...
> svc_xprt_free                 ...
> svc_bc_sock_free              bc_svc_process
> kfree(xprt)                   svc_process_common
>                               rqstp->rq_xprt->xpt_ops (use after free)
>
> The problem is that per-net SUNRPC transports shutdown is done regardless of
> current callback execution. This is a race leading to transport use-after-free
> in the callback handler.
> This patch fixes it in a straightforward way, i.e. it protects callback
> execution with the same mutex used for per-net data creation and destruction.
> Hopefully, it won't slow down the NFS client significantly.
>
> https://jira.sw.ru/browse/PSBM-75751
>
> v5: destroy all per-net backchannel requests before transports in
> nfs_callback_down_net
>
> v4: use another mutex to protect callback execution against per-net transports
> shutdown.
> This guarantees that transports won't be destroyed by the shutdown callback
> while execution is in progress, and vice versa.
>
> v3: fix a mutex deadlock, when the shutdown callback waits for the thread to
> exit (with the mutex taken), while the thread waits for the mutex.
> The idea is to simply check whether the thread has to exit if the mutex lock
> has failed.
> This is a busy loop, but it shouldn't happen often or for long.
>
> Signed-off-by: Stanislav Kinsburskiy
> ---
>  fs/nfs/callback.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index 0beb275..e18d774 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -99,6 +99,8 @@ nfs4_callback_up(struct svc_serv *serv)
>  }
>  
>  #if defined(CONFIG_NFS_V4_1)
> +static DEFINE_MUTEX(nfs41_callback_mutex);
> +
>  /*
>   * The callback service for NFSv4.1 callbacks
>   */
> @@ -117,6 +119,12 @@ nfs41_callback_svc(void *vrqstp)
>  		if (try_to_freeze())
>  			continue;
>  
> +		mutex_lock(&nfs41_callback_mutex);
> +		if (kthread_should_stop()) {
> +			mutex_unlock(&nfs41_callback_mutex);
> +			return 0;
> +		}
> +
>  		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
>  		spin_lock_bh(&serv->sv_cb_lock);
>  		if (!list_empty(&serv->sv_cb_list)) {
> @@ -129,8 +137,10 @@ nfs41_callback_svc(void *vrqstp)
>  			error = bc_svc_process(serv, req, rqstp);
>  			dprintk("bc_svc_process() returned w/ error code= %d\n",
>  				error);
> +			mutex_unlock(&nfs41_callback_mutex);
>  		} else {
>  			spin_unlock_bh(&serv->sv_cb_lock);
> +			mutex_unlock(&nfs41_callback_mutex);
>  			schedule();
>  			finish_wait(&serv->sv_cb_waitq, &wq);
>  		}
> @@ -242,6 +252,7 @@ static void nfs_callback_down_net(u32 minorversion, struct svc_serv *serv, struc
>  		return;
>  
>  	dprintk("NFS: destroy per-net callback data; net=%p\n", net);
> +	bc_svc_flush_queue_net(serv, net);
>  	svc_shutdown_net(serv, net);
>  }
>  
> @@ -377,7 +388,13 @@ void nfs_callback_down(int minorversion, struct net *net)
>  	struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
>  
>  	mutex_lock(&nfs_callback_mutex);
> +#if defined(CONFIG_NFS_V4_1)
> +	mutex_lock(&nfs41_callback_mutex);
> +	nfs_callback_down_net(minorversion, cb_info->serv, net);
> +	mutex_unlock(&nfs41_callback_mutex);
> +#else
>  	nfs_callback_down_net(minorversion, cb_info->serv, net);
> +#endif
>  	cb_info->users--;
>  	if (cb_info->users == 0 && cb_info->task != NULL) {
>  		kthread_stop(cb_info->task);

Looks good to me, except there is one more call of nfs_callback_down_net() in
nfs_callback_up(). It's not clear to me whether we should protect it via the
mutex too. Should we?