Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
On Fri, 2017-08-11 at 11:05 +1000, Michael Ellerman wrote: > kworker/u193:0 D12736 6 2 0x0800 > Workqueue: events_unbound async_run_entry_fn > Call Trace: > [c003f7597410] [c0150d00] console_unlock+0x330/0x770 (unreliable) > [c003f75975e0] [c001b3b8] __switch_to+0x2a8/0x460 > [c003f7597640] [c09abc60] __schedule+0x320/0xaa0 > [c003f7597710] [c09ac420] schedule+0x40/0xb0 > [c003f7597740] [c09b09d4] schedule_timeout+0x254/0x440 > [c003f7597820] [c09aca80] io_schedule_timeout+0x30/0x60 > [c003f7597850] [c09ad75c] > wait_for_common_io.constprop.2+0xbc/0x1e0 > [c003f75978d0] [c0509e6c] blk_execute_rq+0x4c/0x70 > [c003f7597920] [c0654abc] scsi_execute+0xfc/0x260 > [c003f7597990] [c0654f98] scsi_mode_sense+0x218/0x410 > [c003f7597aa0] [c068ee68] sd_revalidate_disk+0x908/0x1cd0 > [c003f7597be0] [c0690434] sd_probe_async+0xb4/0x220 > [c003f7597c60] [c0110a20] async_run_entry_fn+0x70/0x170 > [c003f7597ca0] [c0103904] process_one_work+0x2b4/0x560 > [c003f7597d30] [c0103c38] worker_thread+0x88/0x5a0 > [c003f7597dc0] [c010bfcc] kthread+0x15c/0x1a0 > [c003f7597e30] [c000ba1c] ret_from_kernel_thread+0x5c/0xc0 Hello Michael, It is suspicious that entries like the above appear in the SysRq-w output. Every time I saw this in the past it was caused by a block driver not having called blk_end_request() or a SCSI LLD not having called .scsi_done(). Additionally, it is unlikely that the patch at the start of this thread introduced this issue. Can you have another look at the ipr driver? If a shell is available at the time this lockup occurs, it will probably be helpful to have a look at the debugfs entries under /sys/kernel/debug/block/. Thanks, Bart.
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
Bart Van Assche writes: > On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote: >> "Martin K. Petersen" writes: >> > > One of the two scsi-mq functions that requeue a request unprepares a >> > > request before requeueing (scsi_io_completion()) but the other >> > > function not (__scsi_queue_insert()). Make sure that a request is >> > > unprepared before requeuing it. >> > >> > Applied to 4.13/scsi-fixes. Thanks much! >> >> This seems to be preventing my Power8 box, which uses IPR, from booting. ... > > Hello Michael, > > Thanks for having reported this early. No worries. > Is there any chance that you can > reproduce this state, press SysRq-w on the console and collect the task > overview that is reported on the console (see also Documentation/admin-guide/ > sysrq.rst)? Here it is below. Doesn't seem to change over time. I can do the scsi_logging_level thing as well as soon as I've got some coffee :) cheers sysrq: SysRq : Show Blocked State taskPC stack pid father swapper/0 D10080 1 0 0x0800 Call Trace: [c003f7583890] [c003f75838e0] 0xc003f75838e0 (unreliable) [c003f7583a60] [c001b3b8] __switch_to+0x2a8/0x460 [c003f7583ac0] [c09abc60] __schedule+0x320/0xaa0 [c003f7583b90] [c09ac420] schedule+0x40/0xb0 [c003f7583bc0] [c0110fc4] async_synchronize_cookie_domain+0xd4/0x150 [c003f7583c30] [c0619f94] wait_for_device_probe+0x44/0xe0 [c003f7583c90] [c0c64ce4] prepare_namespace+0x58/0x248 [c003f7583d00] [c0c64478] kernel_init_freeable+0x310/0x348 [c003f7583dc0] [c000d6e4] kernel_init+0x24/0x150 [c003f7583e30] [c000ba1c] ret_from_kernel_thread+0x5c/0xc0 kworker/u193:0 D12736 6 2 0x0800 Workqueue: events_unbound async_run_entry_fn Call Trace: [c003f7597410] [c0150d00] console_unlock+0x330/0x770 (unreliable) [c003f75975e0] [c001b3b8] __switch_to+0x2a8/0x460 [c003f7597640] [c09abc60] __schedule+0x320/0xaa0 [c003f7597710] [c09ac420] schedule+0x40/0xb0 [c003f7597740] [c09b09d4] schedule_timeout+0x254/0x440 [c003f7597820] [c09aca80] io_schedule_timeout+0x30/0x60 [c003f7597850] [c09ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0 [c003f75978d0] [c0509e6c] blk_execute_rq+0x4c/0x70 [c003f7597920] [c0654abc] scsi_execute+0xfc/0x260 [c003f7597990] [c0654f98] scsi_mode_sense+0x218/0x410 [c003f7597aa0] [c068ee68] sd_revalidate_disk+0x908/0x1cd0 [c003f7597be0] [c0690434] sd_probe_async+0xb4/0x220 [c003f7597c60] [c0110a20] async_run_entry_fn+0x70/0x170 [c003f7597ca0] [c0103904] process_one_work+0x2b4/0x560 [c003f7597d30] [c0103c38] worker_thread+0x88/0x5a0 [c003f7597dc0] [c010bfcc] kthread+0x15c/0x1a0 [c003f7597e30] [c000ba1c] ret_from_kernel_thread+0x5c/0xc0 kworker/u193:1 D12480 412 2 0x0800 Workqueue: events_unbound async_run_entry_fn Call Trace: [c003f5907410] [c0150d00] console_unlock+0x330/0x770 (unreliable) [c003f59075e0] [c001b3b8] __switch_to+0x2a8/0x460 [c003f5907640] [c09abc60] __schedule+0x320/0xaa0 [c003f5907710] [c09ac420] schedule+0x40/0xb0 [c003f5907740] [c09b09d4] schedule_timeout+0x254/0x440 [c003f5907820] [c09aca80] io_schedule_timeout+0x30/0x60 [c003f5907850] [c09ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0 [c003f59078d0] [c0509e6c] blk_execute_rq+0x4c/0x70 [c003f5907920] [c0654abc] scsi_execute+0xfc/0x260 [c003f5907990] [c0654f98] scsi_mode_sense+0x218/0x410 [c003f5907aa0] [c068ee68] sd_revalidate_disk+0x908/0x1cd0 [c003f5907be0] [c0690434] sd_probe_async+0xb4/0x220 [c003f5907c60] [c0110a20] async_run_entry_fn+0x70/0x170 [c003f5907ca0] [c0103904] process_one_work+0x2b4/0x560 [c003f5907d30] [c0103c38] worker_thread+0x88/0x5a0 [c003f5907dc0] [c010bfcc] kthread+0x15c/0x1a0 [c003f5907e30] [c000ba1c] ret_from_kernel_thread+0x5c/0xc0 kworker/u193:2 D12832 421 2 0x0800 Workqueue: events_unbound async_run_entry_fn Call Trace: [c003f4103410] [c003f41035f0] 0xc003f41035f0 (unreliable) [c003f41035e0] [c001b3b8] __switch_to+0x2a8/0x460 [c003f4103640] [c09abc60] __schedule+0x320/0xaa0 [c003f4103710] [c09ac420] schedule+0x40/0xb0 [c003f4103740] [c09b09d4] schedule_timeout+0x254/0x440 [c003f4103820] [c09aca80] io_schedule_timeout+0x30/0x60 [c003f4103850] [c09ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0 [c003f41038d0] [c0509e6c] blk_execute_rq+0x4c/0x70 [c003f4103920] [c0654abc] scsi_execute+0xfc/0x260 [c003f4103990] [c0654f98] scsi_mode_sense+0x218/0x410 [c003f4103aa0] [c000
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
On 08/10/2017 10:26 AM, Bart Van Assche wrote: > On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote: >> "Martin K. Petersen" writes: One of the two scsi-mq functions that requeue a request unprepares a request before requeueing (scsi_io_completion()) but the other function not (__scsi_queue_insert()). Make sure that a request is unprepared before requeuing it. >>> >>> Applied to 4.13/scsi-fixes. Thanks much! >> >> This seems to be preventing my Power8 box, which uses IPR, from booting. >> >> Bisect said so: >> >> # first bad commit: [270065e92c317845d69095ec8e3d18616b5b39d5] scsi: >> scsi-mq: Always unprepare before requeuing a request >> >> And if I revert that on top of next-20170809 my system boots again. >> >> The symptom is that it just gets "stuck" during boot and never gets to >> mounting root, full log below, the end is: >> >> . >> ready >> ready >> sd 0:2:4:0: [sde] 554287104 512-byte logical blocks: (284 GB/264 GiB) >> sd 0:2:4:0: [sde] 4096-byte physical blocks >> sd 0:2:5:0: [sdf] 272646144 512-byte logical blocks: (140 GB/130 GiB) >> sd 0:2:5:0: [sdf] 4096-byte physical blocks >> sd 0:2:4:0: [sde] Write Protect is off >> sd 0:2:4:0: [sde] Mode Sense: 0b 00 00 08 >> sd 0:2:5:0: [sdf] Write Protect is off >> sd 0:2:5:0: [sdf] Mode Sense: 0b 00 00 08 >> >> >> And it just sits there for at least hours. >> >> I compared a good and bad boot log and there appears to be essentially >> no difference. Certainly nothing that looks suspicous. > > Hello Michael, > > Thanks for having reported this early. Is there any chance that you can > reproduce this state, press SysRq-w on the console and collect the task > overview that is reported on the console (see also Documentation/admin-guide/ > sysrq.rst)? If this is not possible or if that task overview does not report > any blocked tasks, can you add scsi_mod.scsi_logging_level=-1 to the kernel > command line (either through /etc/default/grub or in /boot/grub2/grub.cfg > when using GRUB), reboot the system, capture the console output and report > that output as a reply to this e-mail? I'm building a kernel to try to reproduce this on a machine with ipr. -Brian -- Brian King Power Linux I/O IBM Linux Technology Center
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote: > "Martin K. Petersen" writes: > > > One of the two scsi-mq functions that requeue a request unprepares a > > > request before requeueing (scsi_io_completion()) but the other > > > function not (__scsi_queue_insert()). Make sure that a request is > > > unprepared before requeuing it. > > > > Applied to 4.13/scsi-fixes. Thanks much! > > This seems to be preventing my Power8 box, which uses IPR, from booting. > > Bisect said so: > > # first bad commit: [270065e92c317845d69095ec8e3d18616b5b39d5] scsi: scsi-mq: > Always unprepare before requeuing a request > > And if I revert that on top of next-20170809 my system boots again. > > The symptom is that it just gets "stuck" during boot and never gets to > mounting root, full log below, the end is: > > . > ready > ready > sd 0:2:4:0: [sde] 554287104 512-byte logical blocks: (284 GB/264 GiB) > sd 0:2:4:0: [sde] 4096-byte physical blocks > sd 0:2:5:0: [sdf] 272646144 512-byte logical blocks: (140 GB/130 GiB) > sd 0:2:5:0: [sdf] 4096-byte physical blocks > sd 0:2:4:0: [sde] Write Protect is off > sd 0:2:4:0: [sde] Mode Sense: 0b 00 00 08 > sd 0:2:5:0: [sdf] Write Protect is off > sd 0:2:5:0: [sdf] Mode Sense: 0b 00 00 08 > > > And it just sits there for at least hours. > > I compared a good and bad boot log and there appears to be essentially > no difference. Certainly nothing that looks suspicous. Hello Michael, Thanks for having reported this early. Is there any chance that you can reproduce this state, press SysRq-w on the console and collect the task overview that is reported on the console (see also Documentation/admin-guide/ sysrq.rst)? If this is not possible or if that task overview does not report any blocked tasks, can you add scsi_mod.scsi_logging_level=-1 to the kernel command line (either through /etc/default/grub or in /boot/grub2/grub.cfg when using GRUB), reboot the system, capture the console output and report that output as a reply to this e-mail? Thanks, Bart.
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
Bart, > One of the two scsi-mq functions that requeue a request unprepares a > request before requeueing (scsi_io_completion()) but the other > function not (__scsi_queue_insert()). Make sure that a request is > unprepared before requeuing it. Applied to 4.13/scsi-fixes. Thanks much! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
Thanks Bart, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
Looks fine, Reviewed-by: Christoph Hellwig
Re: [PATCH] scsi-mq: Always unprepare before requeuing a request
On 8/4/17 06:40, Bart Van Assche wrote: > One of the two scsi-mq functions that requeue a request unprepares > a request before requeueing (scsi_io_completion()) but the other > function not (__scsi_queue_insert()). Make sure that a request is > unprepared before requeuing it. > > Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.") > Signed-off-by: Bart Van Assche > Cc: Christoph Hellwig > Cc: Hannes Reinecke > Cc: Damien Le Moal > Cc: Johannes Thumshirn > Cc: > --- > drivers/scsi/scsi_lib.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 4a2f705cdb14..c7514f3b444a 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -44,6 +44,8 @@ static struct kmem_cache *scsi_sense_cache; > static struct kmem_cache *scsi_sense_isadma_cache; > static DEFINE_MUTEX(scsi_sense_cache_mutex); > > +static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd); > + > static inline struct kmem_cache * > scsi_select_sense_cache(bool unchecked_isa_dma) > { > @@ -140,6 +142,12 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd) > { > struct scsi_device *sdev = cmd->device; > > + if (cmd->request->rq_flags & RQF_DONTPREP) { > + cmd->request->rq_flags &= ~RQF_DONTPREP; > + scsi_mq_uninit_cmd(cmd); > + } else { > + WARN_ON_ONCE(true); > + } > blk_mq_requeue_request(cmd->request, true); > put_device(&sdev->sdev_gendev); > } > @@ -995,8 +1003,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned > int good_bytes) >* A new command will be prepared and issued. >*/ > if (q->mq_ops) { > - cmd->request->rq_flags &= ~RQF_DONTPREP; > - scsi_mq_uninit_cmd(cmd); > scsi_mq_requeue_cmd(cmd); > } else { > scsi_release_buffers(cmd); > Tested-by: Damien Le Moal This patch is needed for the V2 of the series "Zoned block device support fixes" that I sent. Best regards. -- Damien Le Moal, Western Digital
[PATCH] scsi-mq: Always unprepare before requeuing a request
One of the two scsi-mq functions that requeue a request unprepares a request before requeueing (scsi_io_completion()) but the other function not (__scsi_queue_insert()). Make sure that a request is unprepared before requeuing it. Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Cc: --- drivers/scsi/scsi_lib.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 4a2f705cdb14..c7514f3b444a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -44,6 +44,8 @@ static struct kmem_cache *scsi_sense_cache; static struct kmem_cache *scsi_sense_isadma_cache; static DEFINE_MUTEX(scsi_sense_cache_mutex); +static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd); + static inline struct kmem_cache * scsi_select_sense_cache(bool unchecked_isa_dma) { @@ -140,6 +142,12 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd) { struct scsi_device *sdev = cmd->device; + if (cmd->request->rq_flags & RQF_DONTPREP) { + cmd->request->rq_flags &= ~RQF_DONTPREP; + scsi_mq_uninit_cmd(cmd); + } else { + WARN_ON_ONCE(true); + } blk_mq_requeue_request(cmd->request, true); put_device(&sdev->sdev_gendev); } @@ -995,8 +1003,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) * A new command will be prepared and issued. */ if (q->mq_ops) { - cmd->request->rq_flags &= ~RQF_DONTPREP; - scsi_mq_uninit_cmd(cmd); scsi_mq_requeue_cmd(cmd); } else { scsi_release_buffers(cmd); -- 2.13.3