Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On Wed, Apr 12, 2017 at 06:11:25PM +, Bart Van Assche wrote:
> On Wed, 2017-04-12 at 12:55 +0200, Benjamin Block wrote:
> > On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> >
> > just out of curiosity. Is this maybe related to similar stuff happening
> > when CPUs are hot plugged - at least in that the stack gets stuck? Like
> > in this thread here:
> > https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html
> >
> > Would be interesting, because we recently saw similar stuff happening.
>
> Hello Benjamin,
>
> My proposal is to repeat that test with Jens' for-next branch. If the issue
> still occurs with that tree then please check the contents of
> /sys/kernel/debug/block/*/mq/*/{dispatch,*/rq_list}. That will allow to
> determine whether or not any block layer requests are still pending. If
> running the command below resolves the deadlock then it means that a
> trigger to run a block layer queue is still missing somewhere:
>
> for a in /sys/kernel/debug/block/*/mq/state; do echo run >$a; done
>
> See also git://git.kernel.dk/linux-block.git.
>

Thanks for the hint! I'll forward that and see if the affected folks are
willing to reproduce.

Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz / Management: Dirk Wittkopp
Registered office: Böblingen / Court of registration: Amtsgericht Stuttgart, HRB 243294
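
Repeating the test on Jens' for-next branch means fetching that tree first. The following is a minimal sketch of one way to do that with plain git. The helper name fetch_branch and its argument order are assumptions made for illustration; the repository URL and the branch name are the ones given in the quoted message.

```shell
#!/bin/sh
# Sketch only: fetch_branch is a hypothetical helper, not an existing tool.

# fetch_branch URL BRANCH DIR
# Clone only BRANCH of the repository at URL into the new directory DIR.
fetch_branch() {
    git clone --quiet --single-branch --branch "$2" "$1" "$3"
}
```

A possible invocation would be `fetch_branch git://git.kernel.dk/linux-block.git for-next linux-block`, after which the kernel is built and booted from that tree as usual.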
Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On Wed, 2017-04-12 at 12:55 +0200, Benjamin Block wrote:
> On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> > The six patches in this patch series fix the queue lockup I reported
> > recently on the linux-block mailing list. Please consider these patches
> > for inclusion in the upstream kernel.
>
> just out of curiosity. Is this maybe related to similar stuff happening
> when CPUs are hot plugged - at least in that the stack gets stuck? Like
> in this thread here:
> https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html
>
> Would be interesting, because we recently saw similar stuff happening.

Hello Benjamin,

My proposal is to repeat that test with Jens' for-next branch. If the issue
still occurs with that tree then please check the contents of
/sys/kernel/debug/block/*/mq/*/{dispatch,*/rq_list}. That will allow to
determine whether or not any block layer requests are still pending. If
running the command below resolves the deadlock then it means that a
trigger to run a block layer queue is still missing somewhere:

for a in /sys/kernel/debug/block/*/mq/state; do echo run >$a; done

See also git://git.kernel.dk/linux-block.git.

Bart.
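
The two checks described in the message above can be wrapped in a small script, so that the pending requests are captured both before and after the queues are kicked. This is a sketch under stated assumptions: the function names dump_pending and kick_queues and the optional directory argument are made up for illustration. The paths and the "run" command are the ones given in the message, and debugfs is assumed to be mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Sketch only: dump_pending and kick_queues are hypothetical helper names.

# Print every non-empty dispatch or rq_list file below the given blk-mq
# debugfs directory (default: /sys/kernel/debug/block).
dump_pending() {
    root=${1:-/sys/kernel/debug/block}
    for f in "$root"/*/mq/*/dispatch "$root"/*/mq/*/*/rq_list; do
        [ -r "$f" ] || continue        # unmatched glob or unreadable file
        content=$(cat "$f")
        [ -n "$content" ] || continue  # empty: nothing listed in this file
        printf '%s:\n%s\n' "$f" "$content"
    done
}

# Write "run" to the state file of every request queue below the given
# directory, as the one-liner in the message does.
kick_queues() {
    root=${1:-/sys/kernel/debug/block}
    for a in "$root"/*/mq/state; do
        [ -w "$a" ] || continue
        echo run > "$a"
    done
}
```

Run as root on a stalled system, `dump_pending; kick_queues; dump_pending` would show whether requests were pending and whether rerunning the queues drained them. If the stall clears only after kick_queues, that points at a missing queue-run trigger, as the message explains.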
Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> Hello Jens,
>
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.
>

Hey Bart,

just out of curiosity: is this maybe related to similar stuff happening
when CPUs are hot-plugged - at least in that the stack gets stuck? Like
in this thread here:
https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html

Would be interesting, because we recently saw similar stuff happening.

Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz / Management: Dirk Wittkopp
Registered office: Böblingen / Court of registration: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On 04/07/2017 12:39 PM, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 11:33 -0700, Bart Van Assche wrote:
>> On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
>>> On 04/07/2017 12:16 PM, Bart Van Assche wrote:
>>>> Hello Jens,
>>>>
>>>> The six patches in this patch series fix the queue lockup I reported
>>>> recently on the linux-block mailing list. Please consider these patches
>>>> for inclusion in the upstream kernel.
>>>
>>> Some of this we need in 4.11, but not all of it. I can't be applying patches
>>> that "improve scalability" at this point.
>>>
>>> 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
>>> we can put 1-3 on top in 4.12, with the others pulled in first.
>>
>> Hello Jens,
>>
>> Please note that patch 2/6 is a bug fix. The current implementation of
>> blk_mq_sched_restart_queues() only considers hardware queues associated with
>> the same request queue as the hardware queue that has been passed as an
>> argument. If a tag set is shared across request queues - as is the case for
>> SCSI - then all request queues that share a tag set with the hctx argument
>> must be considered.
>
> (replying to my own e-mail)
>
> Hello Jens,
>
> If you want I can split that patch into two patches - one that runs all
> hardware queues with which the tag set is shared and one that switches from
> rerunning all hardware queues to one hardware queue.

I already put it in, but this is getting very awkward. We're at -rc5 time,
patches going into mainline should be TINY. And now I'm sitting on this,
that I have to justify:

 15 files changed, 281 insertions(+), 164 deletions(-)

and where one of the patches reads like it's a performance improvement,
when in reality it's fixing a hang. So yes, the patch should have been
split in two, and the series should have been ordered so that the first
patches could go into 4.11, and the rest on top of that in 4.12.

Did we really need a patch clarifying comments in that series? Probably
not.

--
Jens Axboe
Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On Fri, 2017-04-07 at 11:33 -0700, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
> > On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> > > Hello Jens,
> > >
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> >
> > Some of this we need in 4.11, but not all of it. I can't be applying patches
> > that "improve scalability" at this point.
> >
> > 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
> > we can put 1-3 on top in 4.12, with the others pulled in first.
>
> Hello Jens,
>
> Please note that patch 2/6 is a bug fix. The current implementation of
> blk_mq_sched_restart_queues() only considers hardware queues associated with
> the same request queue as the hardware queue that has been passed as an
> argument. If a tag set is shared across request queues - as is the case for
> SCSI - then all request queues that share a tag set with the hctx argument
> must be considered.

(replying to my own e-mail)

Hello Jens,

If you want I can split that patch into two patches - one that runs all
hardware queues with which the tag set is shared and one that switches from
rerunning all hardware queues to one hardware queue.

Bart.
Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
> On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> > Hello Jens,
> >
> > The six patches in this patch series fix the queue lockup I reported
> > recently on the linux-block mailing list. Please consider these patches
> > for inclusion in the upstream kernel.
>
> Some of this we need in 4.11, but not all of it. I can't be applying patches
> that "improve scalability" at this point.
>
> 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
> we can put 1-3 on top in 4.12, with the others pulled in first.

Hello Jens,

Please note that patch 2/6 is a bug fix. The current implementation of
blk_mq_sched_restart_queues() only considers hardware queues associated with
the same request queue as the hardware queue that has been passed as an
argument. If a tag set is shared across request queues - as is the case for
SCSI - then all request queues that share a tag set with the hctx argument
must be considered.

Bart.
Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> Hello Jens,
>
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.

Some of this we need in 4.11, but not all of it. I can't be applying patches
that "improve scalability" at this point.

4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
we can put 1-3 on top in 4.12, with the others pulled in first.

--
Jens Axboe
[PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically
Hello Jens,

The six patches in this patch series fix the queue lockup I reported
recently on the linux-block mailing list. Please consider these patches
for inclusion in the upstream kernel.

Thanks,

Bart.

Changes between v3 and v4:
- Addressed the review comments on version three of this series about the
  patch that makes it safe to use RCU to iterate over .tag_list and also
  about the runtime performance and use of short variable names in patch 2/5.
- Clarified the description of the patch that fixes the scsi-mq stall.
- Added a patch to fix a dm-mq queue stall.

Changes between v2 and v3:
- Removed the blk_mq_ops.restart_hctx function pointer again.
- Modified blk_mq_sched_restart_queues() such that only a single hardware
  queue is restarted instead of multiple if hardware queues are shared.
- Introduced a new function in the block layer, namely
  blk_mq_delay_run_hw_queue().

Changes between v1 and v2:
- Reworked scsi_restart_queues() such that it no longer takes the SCSI
  host lock.
- Added two patches - one for exporting blk_mq_sched_restart_hctx() and
  another one to make iterating with RCU over blk_mq_tag_set.tag_list
  safe.

Bart Van Assche (6):
  blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list
  blk-mq: Restart a single queue if tag sets are shared
  blk-mq: Clarify comments in blk_mq_dispatch_rq_list()
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  scsi: Avoid that SCSI queues get stuck
  dm rq: Avoid that request processing stalls sporadically

 block/blk-mq-sched.c    | 63 +++---
 block/blk-mq-sched.h    | 16 +--
 block/blk-mq.c          | 73 +++--
 drivers/md/dm-rq.c      |  1 +
 drivers/scsi/scsi_lib.c |  6 ++--
 include/linux/blk-mq.h  |  2 ++
 include/linux/blkdev.h  |  1 -
 7 files changed, 118 insertions(+), 44 deletions(-)

--
2.12.0