Re: [block regression] kernel oops triggered by removing scsi device during IO
On Tue, 2018-04-10 at 09:30 +0800, Ming Lei wrote:
> Also is it possible to see queue freed here?

I think the caller should keep a reference on the request queue. Otherwise
we have a much bigger problem than a race between submitting a bio and
removing a request queue from the cgroup controller in blk_cleanup_queue().

Bart.
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Mon, Apr 09, 2018 at 10:54:57PM +0000, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> > The oops happens during generic_make_request_checks(), in
> > blk_throtl_bio() exactly.
> > So if we want to bypass dying queue, we have to check this before
> > generic_make_request_checks(), I think.
>
> How about something like the patch below?
>
> Thanks,
>
> Bart.
>
> Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
> removal triggers a crash
>
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with a blk_queue_enter() / blk_queue_exit() pair.
>
> ---
>  block/blk-core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d69888ff52f0..0c48bef8490f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 * yet.
>  	 */
>  	struct bio_list bio_list_on_stack[2];
> +	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> +		BLK_MQ_REQ_NOWAIT : 0;
> +	struct request_queue *q = bio->bi_disk->queue;
> +	bool check_result;
>  	blk_qc_t ret = BLK_QC_T_NONE;
>
> -	if (!generic_make_request_checks(bio))
> +	if (blk_queue_enter(q, flags) < 0) {

The queue pointer needs to be checked before calling blk_queue_enter(),
since that check is done in generic_make_request_checks().

Also is it possible to see queue freed here?

--
Ming
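The ordering concern raised above can be illustrated with a minimal userspace sketch. Everything here is hypothetical: the mock_* types and mock_enter_checked() are stand-ins invented for illustration, not kernel code, and the error values are only chosen to resemble the kernel's behaviour. The point is simply that the queue pointer is validated before anything dereferences it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct mock_queue { int dying; int usage; };
struct mock_disk { struct mock_queue *queue; };
struct mock_bio { struct mock_disk *disk; };

/*
 * Validate the queue pointer first, then try to take a usage reference.
 * Returns 0 with *out set, -EIO if there is no queue, or -ENODEV if the
 * queue is marked dying.
 */
static int mock_enter_checked(const struct mock_bio *bio,
			      struct mock_queue **out)
{
	struct mock_queue *q;

	assert(bio != NULL && bio->disk != NULL && out != NULL);
	q = bio->disk->queue;
	if (q == NULL)
		return -EIO;	/* fail before q is ever dereferenced */
	if (q->dying)
		return -ENODEV;
	q->usage++;
	*out = q;
	return 0;
}
```

In this model a bio whose disk has no queue is rejected without touching the queue, which is the property the proposed patch would lose if blk_queue_enter() ran ahead of the existing check.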
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Mon, 2018-04-09 at 16:58 -0600, Jens Axboe wrote:
> This ends up being nutty in the generic_make_request() case, where we
> do the exact same enter/exit logic right after. That needs to get unified.
> Maybe move the queue enter into generic_make_request_checks(), and exit
> in the caller?

Hello Jens,

There is a challenge: generic_make_request() supports bio chains in which
different bios apply to different request queues, and it also supports bio
chains in which some bios have the REQ_NOWAIT flag set and others do not.
Is it safe to drop that support?

Thanks,

Bart.
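To make the challenge concrete, here is a small userspace sketch. The chain_bio type, CHAIN_REQ_NOWAIT and count_mismatches() are hypothetical names introduced for illustration only. It counts how many bios in a chain would be handled with the wrong queue or the wrong NOWAIT setting if those two values were computed once from the first bio instead of per bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHAIN_REQ_NOWAIT 0x1u	/* hypothetical stand-in for REQ_NOWAIT */

struct chain_bio {
	int queue_id;		/* stands in for bio->bi_disk->queue */
	unsigned int opf;	/* stands in for bio->bi_opf */
};

/* Count bios whose queue or NOWAIT setting differs from the first bio. */
static size_t count_mismatches(const struct chain_bio *bios, size_t n)
{
	size_t i, mismatches = 0;

	assert(bios != NULL || n == 0);
	for (i = 1; i < n; i++) {
		bool same_queue = bios[i].queue_id == bios[0].queue_id;
		bool same_wait = (bios[i].opf & CHAIN_REQ_NOWAIT) ==
				 (bios[0].opf & CHAIN_REQ_NOWAIT);

		if (!same_queue || !same_wait)
			mismatches++;
	}
	return mismatches;
}
```

Any non-zero result corresponds to a chain for which a single enter/exit pair taken up front would not cover every bio.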
Re: [block regression] kernel oops triggered by removing scsi device during IO
On 4/9/18 4:54 PM, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
>> The oops happens during generic_make_request_checks(), in
>> blk_throtl_bio() exactly.
>> So if we want to bypass dying queue, we have to check this before
>> generic_make_request_checks(), I think.
>
> How about something like the patch below?
>
> Thanks,
>
> Bart.
>
> Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
> removal triggers a crash
>
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with a blk_queue_enter() / blk_queue_exit() pair.
>
> ---
>  block/blk-core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d69888ff52f0..0c48bef8490f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 * yet.
>  	 */
>  	struct bio_list bio_list_on_stack[2];
> +	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> +		BLK_MQ_REQ_NOWAIT : 0;
> +	struct request_queue *q = bio->bi_disk->queue;
> +	bool check_result;
>  	blk_qc_t ret = BLK_QC_T_NONE;
>
> -	if (!generic_make_request_checks(bio))
> +	if (blk_queue_enter(q, flags) < 0) {
> +		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
> +			bio_wouldblock_error(bio);
> +		else
> +			bio_io_error(bio);
> +		return ret;
> +	}
> +
> +	check_result = generic_make_request_checks(bio);
> +	blk_queue_exit(q);

This ends up being nutty in the generic_make_request() case, where we
do the exact same enter/exit logic right after. That needs to get unified.
Maybe move the queue enter into generic_make_request_checks(), and exit
in the caller?

--
Jens Axboe
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> The oops happens during generic_make_request_checks(), in
> blk_throtl_bio() exactly.
> So if we want to bypass dying queue, we have to check this before
> generic_make_request_checks(), I think.

How about something like the patch below?

Thanks,

Bart.

Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
 removal triggers a crash

Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
it is no longer safe to access cgroup information during or after the
blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
call with a blk_queue_enter() / blk_queue_exit() pair.

---
 block/blk-core.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d69888ff52f0..0c48bef8490f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 * yet.
 	 */
 	struct bio_list bio_list_on_stack[2];
+	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
+		BLK_MQ_REQ_NOWAIT : 0;
+	struct request_queue *q = bio->bi_disk->queue;
+	bool check_result;
 	blk_qc_t ret = BLK_QC_T_NONE;
 
-	if (!generic_make_request_checks(bio))
+	if (blk_queue_enter(q, flags) < 0) {
+		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
+			bio_wouldblock_error(bio);
+		else
+			bio_io_error(bio);
+		return ret;
+	}
+
+	check_result = generic_make_request_checks(bio);
+	blk_queue_exit(q);
+
+	if (!check_result)
 		goto out;
 
 	/*
--
2.16.2
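The control flow of the patch above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code: struct model_queue and every model_* function are hypothetical, the blocking wait of a non-NOWAIT caller on a frozen queue is not modelled, and the errno values merely mirror what bio_wouldblock_error() and bio_io_error() report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request queue; not the kernel structure. */
struct model_queue {
	bool dying;		/* set first by the modelled cleanup */
	bool frozen;		/* entering would have to wait */
	int usage;		/* references held by submitters */
	const int *cgroup_data;	/* cleared by the modelled cleanup */
};

static int model_queue_enter(struct model_queue *q, bool nowait)
{
	if (q->dying)
		return -ENODEV;
	if (q->frozen && nowait)
		return -EBUSY;
	/* A blocking caller would sleep until unfrozen; skipped here. */
	q->usage++;
	return 0;
}

static void model_queue_exit(struct model_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Stand-in for generic_make_request_checks(): it touches cgroup data. */
static bool model_checks(const struct model_queue *q)
{
	assert(q->cgroup_data != NULL);
	return *q->cgroup_data >= 0;
}

/* Returns 0, -EAGAIN (would block) or -EIO, following the patch's branches. */
static int model_submit(struct model_queue *q, bool nowait)
{
	bool ok;

	if (model_queue_enter(q, nowait) < 0)
		return (!q->dying && nowait) ? -EAGAIN : -EIO;

	ok = model_checks(q);	/* runs only while the reference is held */
	model_queue_exit(q);
	return ok ? 0 : -EIO;
}
```

The model relies on the same premise as the patch: cleanup marks the queue dying before it tears down the cgroup data, so a submitter that fails to enter never reaches the checks.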
Re: [block regression] kernel oops triggered by removing scsi device during IO
Hi Bart,

On 18/4/9 12:47, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
>
> Is the below patch sufficient to fix this?
>
> Thanks,
>
> Bart.
>
>
> Subject: blk-mq: Avoid that submitting a bio concurrently with device removal
> triggers a crash
>
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence check earlier in generic_make_request()
> whether the queue has been marked as "dying".

The oops happens during generic_make_request_checks(), in
blk_throtl_bio() exactly.
So if we want to bypass the dying queue, we have to check this before
generic_make_request_checks(), I think.

Thanks,
Joseph

> ---
>  block/blk-core.c | 72 +---
>  1 file changed, 37 insertions(+), 35 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index aa8c99fae527..3ac9dd25e04e 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2385,10 +2385,21 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 * yet.
>  	 */
>  	struct bio_list bio_list_on_stack[2];
> +	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> +		BLK_MQ_REQ_NOWAIT : 0;
> +	struct request_queue *q = bio->bi_disk->queue;
>  	blk_qc_t ret = BLK_QC_T_NONE;
>
>  	if (!generic_make_request_checks(bio))
> -		goto out;
> +		return ret;
> +
> +	if (blk_queue_enter(q, flags) < 0) {
> +		if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
> +			bio_wouldblock_error(bio);
> +		else
> +			bio_io_error(bio);
> +		return ret;
> +	}
>
>  	/*
>  	 * We only want one ->make_request_fn to be active at a time, else
> @@ -2423,46 +2434,37 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	bio_list_init(&bio_list_on_stack[0]);
>  	current->bio_list = bio_list_on_stack;
>  	do {
> -		struct request_queue *q = bio->bi_disk->queue;
> -		blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> -			BLK_MQ_REQ_NOWAIT : 0;
> -
> -		if (likely(blk_queue_enter(q, flags) == 0)) {
> -			struct bio_list lower, same;
> -
> -			/* Create a fresh bio_list for all subordinate requests */
> -			bio_list_on_stack[1] = bio_list_on_stack[0];
> -			bio_list_init(&bio_list_on_stack[0]);
> -			ret = q->make_request_fn(q, bio);
> -
> -			blk_queue_exit(q);
> -
> -			/* sort new bios into those for a lower level
> -			 * and those for the same level
> -			 */
> -			bio_list_init(&lower);
> -			bio_list_init(&same);
> -			while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
> -				if (q == bio->bi_disk->queue)
> -					bio_list_add(&same, bio);
> -				else
> -					bio_list_add(&lower, bio);
> -			/* now assemble so we handle the lowest level first */
> -			bio_list_merge(&bio_list_on_stack[0], &lower);
> -			bio_list_merge(&bio_list_on_stack[0], &same);
> -			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
> -		} else {
> -			if (unlikely(!blk_queue_dying(q) &&
> -					(bio->bi_opf & REQ_NOWAIT)))
> -				bio_wouldblock_error(bio);
> +		struct bio_list lower, same;
> +
> +		WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT) &&
> +			     (bio->bi_opf & REQ_NOWAIT));
> +		WARN_ON_ONCE(q != bio->bi_disk->queue);
> +		q = bio->bi_disk->queue;
> +		/* Create a fresh bio_list for all subordinate requests */
> +		bio_list_on_stack[1] = bio_list_on_stack[0];
> +		bio_list_init(&bio_list_on_stack[0]);
> +		ret = q->make_request_fn(q, bio);
> +
> +		/* sort new bios into those for a lower level
> +		 * and those for the same level
> +		 */
> +		bio_list_init(&lower);
> +		bio_list_init(&same);
> +		while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
> +			if (q == bio->bi_disk->queue)
> +				bio_list_add(&same, bio);
>  			else
> -				bio_io_error(bio);
> -		}
> +				bio_list_add(&lower, bio);
> +		/* now assemble so we handle the lowest level first */
> +
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.

Is the below patch sufficient to fix this?

Thanks,

Bart.


Subject: blk-mq: Avoid that submitting a bio concurrently with device removal
 triggers a crash

Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
it is no longer safe to access cgroup information during or after the
blk_cleanup_queue() call. Hence check earlier in generic_make_request()
whether the queue has been marked as "dying".

---
 block/blk-core.c | 72 +---
 1 file changed, 37 insertions(+), 35 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aa8c99fae527..3ac9dd25e04e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2385,10 +2385,21 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 * yet.
 	 */
 	struct bio_list bio_list_on_stack[2];
+	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
+		BLK_MQ_REQ_NOWAIT : 0;
+	struct request_queue *q = bio->bi_disk->queue;
 	blk_qc_t ret = BLK_QC_T_NONE;
 
 	if (!generic_make_request_checks(bio))
-		goto out;
+		return ret;
+
+	if (blk_queue_enter(q, flags) < 0) {
+		if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
+			bio_wouldblock_error(bio);
+		else
+			bio_io_error(bio);
+		return ret;
+	}
 
 	/*
 	 * We only want one ->make_request_fn to be active at a time, else
@@ -2423,46 +2434,37 @@ blk_qc_t generic_make_request(struct bio *bio)
 	bio_list_init(&bio_list_on_stack[0]);
 	current->bio_list = bio_list_on_stack;
 	do {
-		struct request_queue *q = bio->bi_disk->queue;
-		blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
-			BLK_MQ_REQ_NOWAIT : 0;
-
-		if (likely(blk_queue_enter(q, flags) == 0)) {
-			struct bio_list lower, same;
-
-			/* Create a fresh bio_list for all subordinate requests */
-			bio_list_on_stack[1] = bio_list_on_stack[0];
-			bio_list_init(&bio_list_on_stack[0]);
-			ret = q->make_request_fn(q, bio);
-
-			blk_queue_exit(q);
-
-			/* sort new bios into those for a lower level
-			 * and those for the same level
-			 */
-			bio_list_init(&lower);
-			bio_list_init(&same);
-			while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
-				if (q == bio->bi_disk->queue)
-					bio_list_add(&same, bio);
-				else
-					bio_list_add(&lower, bio);
-			/* now assemble so we handle the lowest level first */
-			bio_list_merge(&bio_list_on_stack[0], &lower);
-			bio_list_merge(&bio_list_on_stack[0], &same);
-			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
-		} else {
-			if (unlikely(!blk_queue_dying(q) &&
-					(bio->bi_opf & REQ_NOWAIT)))
-				bio_wouldblock_error(bio);
+		struct bio_list lower, same;
+
+		WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT) &&
+			     (bio->bi_opf & REQ_NOWAIT));
+		WARN_ON_ONCE(q != bio->bi_disk->queue);
+		q = bio->bi_disk->queue;
+		/* Create a fresh bio_list for all subordinate requests */
+		bio_list_on_stack[1] = bio_list_on_stack[0];
+		bio_list_init(&bio_list_on_stack[0]);
+		ret = q->make_request_fn(q, bio);
+
+		/* sort new bios into those for a lower level
+		 * and those for the same level
+		 */
+		bio_list_init(&lower);
+		bio_list_init(&same);
+		while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
+			if (q == bio->bi_disk->queue)
+				bio_list_add(&same, bio);
 			else
-				bio_io_error(bio);
-		}
+				bio_list_add(&lower, bio);
+		/* now assemble so we handle the lowest level first */
+		bio_list_merge(&bio_list_on_stack[0], &lower);
+		bio_list_merge(&bio_list_on_stack[0], &same);
+		bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
 	current->bio_list = NULL; /* deactivate */
 
 out:
+
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Mon, Apr 09, 2018 at 09:33:08AM +0800, Joseph Qi wrote:
> Hi Bart,
>
> On 18/4/8 22:50, Bart Van Assche wrote:
> > On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> >> The following kernel oops is triggered by 'removing scsi device' during
> >> heavy IO.
> >
> > How did you trigger this oops?
> >
>
> I can reproduce this oops by the following steps:
> 1) start a fio job with buffered write;
> 2) remove the scsi device fio write to:
> echo "scsi remove-single-device ${dev}" > /proc/scsi/scsi

Yeah, it can be reproduced easily, and I usually remove the scsi device via
'echo 1 > /sys/block/sda/device/delete'.

Thanks,
Ming
Re: [block regression] kernel oops triggered by removing scsi device during IO
Hi Bart,

On 18/4/8 22:50, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
>
> How did you trigger this oops?
>

I can reproduce this oops with the following steps:
1) start a fio job with buffered writes;
2) remove the scsi device that fio is writing to:
echo "scsi remove-single-device ${dev}" > /proc/scsi/scsi
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Sun, 2018-04-08 at 16:11 +0800, Joseph Qi wrote:
> This is because scsi_remove_device() will call blk_cleanup_queue(), and
> then all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL and trigger NULL pointer dereference when get td from
> tg (tg->td).
> It seems that we cannot simply move blkcg_exit_queue() up to
> blk_cleanup_queue().

Have you considered adding a blk_queue_enter() / blk_queue_exit() pair to
generic_make_request()? blk_queue_enter() checks the DYING flag.

Thanks,

Bart.
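The idea behind an enter/exit pair can be sketched in userspace as a gate with a usage count. This is a simplified, single-threaded model with hypothetical names (struct gate_queue and the gate_* helpers are invented for illustration); the real kernel functions also handle freezing, waiting and concurrency, none of which is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the gate; not the kernel's request_queue. */
struct gate_queue {
	bool dying;	/* once set, no new submitter may enter */
	int usage;	/* submitters currently inside the gate */
};

static int gate_enter(struct gate_queue *q)
{
	if (q->dying)
		return -ENODEV;
	q->usage++;
	return 0;
}

static void gate_exit(struct gate_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

static void gate_mark_dying(struct gate_queue *q)
{
	q->dying = true;
}

/* Teardown may free per-queue data only after every submitter has left. */
static bool gate_safe_to_tear_down(const struct gate_queue *q)
{
	return q->dying && q->usage == 0;
}
```

A submitter that entered before the queue was marked dying keeps teardown waiting until it exits, and a submitter arriving afterwards is turned away, so neither can observe freed per-queue data in this model.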
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.

How did you trigger this oops?

Bart.
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Sun, Apr 08, 2018 at 05:25:42PM +0800, Ming Lei wrote:
> On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> > This is because scsi_remove_device() will call blk_cleanup_queue(), and
> > then all blkgs have been destroyed and root_blkg is NULL.
> > Thus tg is NULL and trigger NULL pointer dereference when get td from
> > tg (tg->td).
> > It seems that we cannot simply move blkcg_exit_queue() up to
> > blk_cleanup_queue().
>
> Maybe one per-queue blkcg should be introduced, which seems reasonable
> too.

Sorry, I mean one per-queue blkcg lock.

--
Ming
Re: [block regression] kernel oops triggered by removing scsi device during IO
On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> This is because scsi_remove_device() will call blk_cleanup_queue(), and
> then all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL and trigger NULL pointer dereference when get td from
> tg (tg->td).
> It seems that we cannot simply move blkcg_exit_queue() up to
> blk_cleanup_queue().

Maybe one per-queue blkcg should be introduced, which seems reasonable
too.

Thanks,
Ming
Re: [block regression] kernel oops triggered by removing scsi device during IO
This is because scsi_remove_device() calls blk_cleanup_queue(), after
which all blkgs have been destroyed and root_blkg is NULL.
Thus tg is NULL, which triggers a NULL pointer dereference when td is
read from tg (tg->td).
It seems that we cannot simply move blkcg_exit_queue() up to
blk_cleanup_queue().

Thanks,
Joseph

On 18/4/8 12:21, Ming Lei wrote:
> Hi,
>
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.
>
> 'git bisect' shows that commit a063057d7c731cffa7d10740(block: Fix a race
> between request queue removal and the block cgroup controller)
> introduced this regression:
>
> [   42.268257] BUG: unable to handle kernel NULL pointer dereference at 0028
> [   42.269339] PGD 26bd9f067 P4D 26bd9f067 PUD 26bfec067 PMD 0
> [   42.270077] Oops: [#1] PREEMPT SMP NOPTI
> [   42.270681] Dumping ftrace buffer:
> [   42.271141]    (ftrace buffer empty)
> [   42.271641] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support crc32c_intel i2c_i801 i2c_core lpc_ich mfd_core usb_storage nvme shpchp nvme_core virtio_scsi qemu_fw_cfg ip_tables
> [   42.273770] CPU: 5 PID: 1076 Comm: fio Not tainted 4.16.0+ #49
> [   42.274530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
> [   42.275634] RIP: 0010:blk_throtl_bio+0x41/0x904
> [   42.276225] RSP: 0018:c900033cfaa0 EFLAGS: 00010246
> [   42.276907] RAX: 8000 RBX: 8801bdcc5118 RCX: 0001
> [   42.277818] RDX: 8801bdcc5118 RSI: RDI: 8802641f8870
> [   42.278733] RBP: R08: 0001 R09: c900033cfb94
> [   42.279651] R10: c900033cfc00 R11: 06ea R12: 8802641f8870
> [   42.280567] R13: 88026f34f000 R14: R15: 8801bdcc5118
> [   42.281489] FS: 7fc123922d40() GS:880272f4() knlGS:
> [   42.282525] CS: 0010 DS: ES: CR0: 80050033
> [   42.283270] CR2: 0028 CR3: 00026d7ac004 CR4: 007606e0
> [   42.284194] DR0: DR1: DR2:
> [   42.285116] DR3: DR6: fffe0ff0 DR7: 0400
> [   42.286036] PKRU: 5554
> [   42.286393] Call Trace:
> [   42.286725]  ? try_to_wake_up+0x3a3/0x3c9
> [   42.287255]  ? blk_mq_hctx_notify_dead+0x135/0x135
> [   42.287880]  ? gup_pud_range+0xb5/0x7e1
> [   42.288381]  generic_make_request_checks+0x3cf/0x539
> [   42.289027]  ? gup_pgd_range+0x8e/0xaa
> [   42.289515]  generic_make_request+0x38/0x25b
> [   42.290078]  ? submit_bio+0x103/0x11f
> [   42.290555]  submit_bio+0x103/0x11f
> [   42.291018]  ? bio_iov_iter_get_pages+0xe4/0x104
> [   42.291620]  blkdev_direct_IO+0x2a3/0x3af
> [   42.292151]  ? kiocb_free+0x34/0x34
> [   42.292607]  ? ___preempt_schedule+0x16/0x18
> [   42.293168]  ? preempt_schedule_common+0x4c/0x65
> [   42.293771]  ? generic_file_read_iter+0x96/0x110
> [   42.294377]  generic_file_read_iter+0x96/0x110
> [   42.294962]  aio_read+0xca/0x13b
> [   42.295388]  ? preempt_count_add+0x6d/0x8c
> [   42.295926]  ? aio_read_events+0x287/0x2d6
> [   42.296460]  ? do_io_submit+0x4d2/0x62c
> [   42.296964]  do_io_submit+0x4d2/0x62c
> [   42.297446]  ? do_syscall_64+0x9d/0x15e
> [   42.297950]  do_syscall_64+0x9d/0x15e
> [   42.298431]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [   42.299090] RIP: 0033:0x7fc12244e687
> [   42.299556] RSP: 002b:7ffe18388a68 EFLAGS: 0202 ORIG_RAX: 00d1
> [   42.300528] RAX: ffda RBX: 7fc0fde08670 RCX: 7fc12244e687
> [   42.301442] RDX: 01d1b388 RSI: 0001 RDI: 7fc123782000
> [   42.302359] RBP: 22d8 R08: 0001 R09: 01c461e0
> [   42.303275] R10: R11: 0202 R12: 7fc0fde08670
> [   42.304195] R13: R14: 01d1d0c0 R15: 01b872f0
> [   42.305117] Code: 48 85 f6 48 89 7c 24 10 75 0e 48 8b b7 b8 05 00 00 31 ed 48 85 f6 74 0f 48 63 05 75 a4 e4 00 48 8b ac c6 28 02 00 00 f6 43 15 02 <48> 8b 45 28 48 89 04 24 0f 85 28 08 00 00 8b 43 10 45 31 e4 83
> [   42.307553] RIP: blk_throtl_bio+0x41/0x904 RSP: c900033cfaa0
> [   42.308328] CR2: 0028
> [   42.308920] ---[ end trace f53a144979f63b29 ]---
> [   42.309520] Kernel panic - not syncing: Fatal exception
> [   42.310635] Dumping ftrace buffer:
> [   42.311087]    (ftrace buffer empty)
> [   42.311583] Kernel Offset: disabled
> [   42.312163] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
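The reported fault address fits the tg->td explanation: the marked instruction bytes in the Code line, <48> 8b 45 28, encode a 64-bit load from RBP + 0x28, so a NULL base pointer faults at 0x28. The sketch below shows how a member offset becomes the fault address. The layout_* structs are an assumption modelled on how the throttling group may have been laid out at the time, not copied from the kernel source, and the result assumes 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration only; not the kernel definitions. */
struct layout_policy_data {
	void *blkg;
	int plid;
};

struct layout_rb_node {
	unsigned long parent_color;
	void *right;
	void *left;
};

struct layout_throtl_grp {
	struct layout_policy_data pd;	/* assumed to be the first member */
	struct layout_rb_node rb_node;
	void *td;			/* the member read as tg->td */
};

/* Address a load of tg->td would touch if tg were located at `base`. */
static uintptr_t layout_td_address(uintptr_t base)
{
	/* The 0x28 figure below only holds with 8-byte pointers. */
	assert(sizeof(void *) == 8);
	return base + offsetof(struct layout_throtl_grp, td);
}
```

With this assumed layout, layout_td_address(0) evaluates to 0x28 on a 64-bit build, matching the CR2 value in the oops.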
[block regression] kernel oops triggered by removing scsi device during IO
Hi,

The following kernel oops is triggered by 'removing scsi device' during
heavy IO.

'git bisect' shows that commit a063057d7c731cffa7d10740(block: Fix a race
between request queue removal and the block cgroup controller)
introduced this regression:

[   42.268257] BUG: unable to handle kernel NULL pointer dereference at 0028
[   42.269339] PGD 26bd9f067 P4D 26bd9f067 PUD 26bfec067 PMD 0
[   42.270077] Oops: [#1] PREEMPT SMP NOPTI
[   42.270681] Dumping ftrace buffer:
[   42.271141]    (ftrace buffer empty)
[   42.271641] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support crc32c_intel i2c_i801 i2c_core lpc_ich mfd_core usb_storage nvme shpchp nvme_core virtio_scsi qemu_fw_cfg ip_tables
[   42.273770] CPU: 5 PID: 1076 Comm: fio Not tainted 4.16.0+ #49
[   42.274530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[   42.275634] RIP: 0010:blk_throtl_bio+0x41/0x904
[   42.276225] RSP: 0018:c900033cfaa0 EFLAGS: 00010246
[   42.276907] RAX: 8000 RBX: 8801bdcc5118 RCX: 0001
[   42.277818] RDX: 8801bdcc5118 RSI: RDI: 8802641f8870
[   42.278733] RBP: R08: 0001 R09: c900033cfb94
[   42.279651] R10: c900033cfc00 R11: 06ea R12: 8802641f8870
[   42.280567] R13: 88026f34f000 R14: R15: 8801bdcc5118
[   42.281489] FS: 7fc123922d40() GS:880272f4() knlGS:
[   42.282525] CS: 0010 DS: ES: CR0: 80050033
[   42.283270] CR2: 0028 CR3: 00026d7ac004 CR4: 007606e0
[   42.284194] DR0: DR1: DR2:
[   42.285116] DR3: DR6: fffe0ff0 DR7: 0400
[   42.286036] PKRU: 5554
[   42.286393] Call Trace:
[   42.286725]  ? try_to_wake_up+0x3a3/0x3c9
[   42.287255]  ? blk_mq_hctx_notify_dead+0x135/0x135
[   42.287880]  ? gup_pud_range+0xb5/0x7e1
[   42.288381]  generic_make_request_checks+0x3cf/0x539
[   42.289027]  ? gup_pgd_range+0x8e/0xaa
[   42.289515]  generic_make_request+0x38/0x25b
[   42.290078]  ? submit_bio+0x103/0x11f
[   42.290555]  submit_bio+0x103/0x11f
[   42.291018]  ? bio_iov_iter_get_pages+0xe4/0x104
[   42.291620]  blkdev_direct_IO+0x2a3/0x3af
[   42.292151]  ? kiocb_free+0x34/0x34
[   42.292607]  ? ___preempt_schedule+0x16/0x18
[   42.293168]  ? preempt_schedule_common+0x4c/0x65
[   42.293771]  ? generic_file_read_iter+0x96/0x110
[   42.294377]  generic_file_read_iter+0x96/0x110
[   42.294962]  aio_read+0xca/0x13b
[   42.295388]  ? preempt_count_add+0x6d/0x8c
[   42.295926]  ? aio_read_events+0x287/0x2d6
[   42.296460]  ? do_io_submit+0x4d2/0x62c
[   42.296964]  do_io_submit+0x4d2/0x62c
[   42.297446]  ? do_syscall_64+0x9d/0x15e
[   42.297950]  do_syscall_64+0x9d/0x15e
[   42.298431]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   42.299090] RIP: 0033:0x7fc12244e687
[   42.299556] RSP: 002b:7ffe18388a68 EFLAGS: 0202 ORIG_RAX: 00d1
[   42.300528] RAX: ffda RBX: 7fc0fde08670 RCX: 7fc12244e687
[   42.301442] RDX: 01d1b388 RSI: 0001 RDI: 7fc123782000
[   42.302359] RBP: 22d8 R08: 0001 R09: 01c461e0
[   42.303275] R10: R11: 0202 R12: 7fc0fde08670
[   42.304195] R13: R14: 01d1d0c0 R15: 01b872f0
[   42.305117] Code: 48 85 f6 48 89 7c 24 10 75 0e 48 8b b7 b8 05 00 00 31 ed 48 85 f6 74 0f 48 63 05 75 a4 e4 00 48 8b ac c6 28 02 00 00 f6 43 15 02 <48> 8b 45 28 48 89 04 24 0f 85 28 08 00 00 8b 43 10 45 31 e4 83
[   42.307553] RIP: blk_throtl_bio+0x41/0x904 RSP: c900033cfaa0
[   42.308328] CR2: 0028
[   42.308920] ---[ end trace f53a144979f63b29 ]---
[   42.309520] Kernel panic - not syncing: Fatal exception
[   42.310635] Dumping ftrace buffer:
[   42.311087]    (ftrace buffer empty)
[   42.311583] Kernel Offset: disabled
[   42.312163] ---[ end Kernel panic - not syncing: Fatal exception ]---

--
Ming