Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche

On Tue, 2018-04-10 at 09:30 +0800, Ming Lei wrote:
> Also is it possible to see queue freed here?

I think the caller should keep a reference on the request queue. Otherwise
we have a much bigger problem than a race between submitting a bio and
removing a request queue from the cgroup controller in blk_cleanup_queue().

Bart.

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Ming Lei

On Mon, Apr 09, 2018 at 10:54:57PM +, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> > The oops happens during generic_make_request_checks(), in
> > blk_throtl_bio() exactly.
> > So if we want to bypass dying queue, we have to check this before
> > generic_make_request_checks(), I think.
> 
> How about something like the patch below?
> 
> Thanks,
> 
> Bart.
> 
> Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
>  removal triggers a crash
> 
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with a blk_queue_enter() / blk_queue_exit() pair.
> 
> ---
>  block/blk-core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d69888ff52f0..0c48bef8490f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
>* yet.
>*/
>   struct bio_list bio_list_on_stack[2];
> + blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> + BLK_MQ_REQ_NOWAIT : 0;
> + struct request_queue *q = bio->bi_disk->queue;
> + bool check_result;
>   blk_qc_t ret = BLK_QC_T_NONE;
>  
> - if (!generic_make_request_checks(bio))
> + if (blk_queue_enter(q, flags) < 0) {

The queue pointer needs to be checked before calling blk_queue_enter(),
since the check is done in generic_make_request_checks().

Also is it possible to see queue freed here?

-- 
Ming

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche

On Mon, 2018-04-09 at 16:58 -0600, Jens Axboe wrote:
> This ends up being nutty in the generic_make_request() case, where we
> do the exact same enter/exit logic right after. That needs to get unified.
> Maybe move the queue enter into generic_make_request_checks(), and exit
> in the caller?

Hello Jens,

There is a challenge: generic_make_request() supports bio chains in which
different bios apply to different request queues, and it also supports bio
chains in which some bios have the REQ_NOWAIT flag set and others do not. Is
it safe to drop that support?

Thanks,

Bart.

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Jens Axboe

On 4/9/18 4:54 PM, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
>> The oops happens during generic_make_request_checks(), in
>> blk_throtl_bio() exactly.
>> So if we want to bypass dying queue, we have to check this before
>> generic_make_request_checks(), I think.
> 
> How about something like the patch below?
> 
> Thanks,
> 
> Bart.
> 
> Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
>  removal triggers a crash
> 
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with a blk_queue_enter() / blk_queue_exit() pair.
> 
> ---
>  block/blk-core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d69888ff52f0..0c48bef8490f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
>* yet.
>*/
>   struct bio_list bio_list_on_stack[2];
> + blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> + BLK_MQ_REQ_NOWAIT : 0;
> + struct request_queue *q = bio->bi_disk->queue;
> + bool check_result;
>   blk_qc_t ret = BLK_QC_T_NONE;
>  
> - if (!generic_make_request_checks(bio))
> + if (blk_queue_enter(q, flags) < 0) {
> + if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
> + bio_wouldblock_error(bio);
> + else
> + bio_io_error(bio);
> + return ret;
> + }
> +
> + check_result = generic_make_request_checks(bio);
> + blk_queue_exit(q);

This ends up being nutty in the generic_make_request() case, where we
do the exact same enter/exit logic right after. That needs to get unified.
Maybe move the queue enter into generic_make_request_checks(), and exit
in the caller?

-- 
Jens Axboe

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche

On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> The oops happens during generic_make_request_checks(), in
> blk_throtl_bio() exactly.
> So if we want to bypass dying queue, we have to check this before
> generic_make_request_checks(), I think.

How about something like the patch below?

Thanks,

Bart.

Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
 removal triggers a crash

Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
it is no longer safe to access cgroup information during or after the
blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
call with a blk_queue_enter() / blk_queue_exit() pair.

---
 block/blk-core.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d69888ff52f0..0c48bef8490f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
 * yet.
 */
struct bio_list bio_list_on_stack[2];
+   blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
+   BLK_MQ_REQ_NOWAIT : 0;
+   struct request_queue *q = bio->bi_disk->queue;
+   bool check_result;
blk_qc_t ret = BLK_QC_T_NONE;
 
-   if (!generic_make_request_checks(bio))
+   if (blk_queue_enter(q, flags) < 0) {
+   if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
+   bio_wouldblock_error(bio);
+   else
+   bio_io_error(bio);
+   return ret;
+   }
+
+   check_result = generic_make_request_checks(bio);
+   blk_queue_exit(q);
+
+   if (!check_result)
goto out;
 
/*
-- 
2.16.2

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Joseph Qi

Hi Bart,

On 18/4/9 12:47, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
> 
> Is the below patch sufficient to fix this?
> 
> Thanks,
> 
> Bart.
> 
> 
> Subject: blk-mq: Avoid that submitting a bio concurrently with device removal 
> triggers a crash
> 
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence check earlier in generic_make_request()
> whether the queue has been marked as "dying".

The oops happens during generic_make_request_checks(), in
blk_throtl_bio() exactly.
So if we want to bypass dying queue, we have to check this before
generic_make_request_checks(), I think.

Thanks,
Joseph

> ---
>  block/blk-core.c | 72 
> +---
>  1 file changed, 37 insertions(+), 35 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index aa8c99fae527..3ac9dd25e04e 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2385,10 +2385,21 @@ blk_qc_t generic_make_request(struct bio *bio)
>* yet.
>*/
>   struct bio_list bio_list_on_stack[2];
> + blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> + BLK_MQ_REQ_NOWAIT : 0;
> + struct request_queue *q = bio->bi_disk->queue;
>   blk_qc_t ret = BLK_QC_T_NONE;
>  
>   if (!generic_make_request_checks(bio))
> - goto out;
> + return ret;
> +
> + if (blk_queue_enter(q, flags) < 0) {
> + if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
> + bio_wouldblock_error(bio);
> + else
> + bio_io_error(bio);
> + return ret;
> + }
>  
>   /*
>* We only want one ->make_request_fn to be active at a time, else
> @@ -2423,46 +2434,37 @@ blk_qc_t generic_make_request(struct bio *bio)
>   bio_list_init(&bio_list_on_stack[0]);
>   current->bio_list = bio_list_on_stack;
>   do {
> - struct request_queue *q = bio->bi_disk->queue;
> - blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> - BLK_MQ_REQ_NOWAIT : 0;
> -
> - if (likely(blk_queue_enter(q, flags) == 0)) {
> - struct bio_list lower, same;
> -
> - /* Create a fresh bio_list for all subordinate requests 
> */
> - bio_list_on_stack[1] = bio_list_on_stack[0];
> - bio_list_init(&bio_list_on_stack[0]);
> - ret = q->make_request_fn(q, bio);
> -
> - blk_queue_exit(q);
> -
> - /* sort new bios into those for a lower level
> -  * and those for the same level
> -  */
> - bio_list_init(&lower);
> - bio_list_init(&same);
> - while ((bio = bio_list_pop(&bio_list_on_stack[0])) != 
> NULL)
> - if (q == bio->bi_disk->queue)
> - bio_list_add(&same, bio);
> - else
> - bio_list_add(&lower, bio);
> - /* now assemble so we handle the lowest level first */
> - bio_list_merge(&bio_list_on_stack[0], &lower);
> - bio_list_merge(&bio_list_on_stack[0], &same);
> - bio_list_merge(&bio_list_on_stack[0], 
> &bio_list_on_stack[1]);
> - } else {
> - if (unlikely(!blk_queue_dying(q) &&
> - (bio->bi_opf & REQ_NOWAIT)))
> - bio_wouldblock_error(bio);
> + struct bio_list lower, same;
> +
> + WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT) &&
> +  (bio->bi_opf & REQ_NOWAIT));
> + WARN_ON_ONCE(q != bio->bi_disk->queue);
> + q = bio->bi_disk->queue;
> + /* Create a fresh bio_list for all subordinate requests */
> + bio_list_on_stack[1] = bio_list_on_stack[0];
> + bio_list_init(&bio_list_on_stack[0]);
> + ret = q->make_request_fn(q, bio);
> +
> + /* sort new bios into those for a lower level
> +  * and those for the same level
> +  */
> + bio_list_init(&lower);
> + bio_list_init(&same);
> + while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
> + if (q == bio->bi_disk->queue)
> + bio_list_add(&same, bio);
>   else
> - bio_io_error(bio);
> - }
> + bio_list_add(&lower, bio);
> + /* now assemble so we handle the lowest level first */
> +

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Bart Van Assche

On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.

Is the below patch sufficient to fix this?

Thanks,

Bart.


Subject: blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
it is no longer safe to access cgroup information during or after the
blk_cleanup_queue() call. Hence check earlier in generic_make_request()
whether the queue has been marked as "dying".
---
 block/blk-core.c | 72 +---
 1 file changed, 37 insertions(+), 35 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aa8c99fae527..3ac9dd25e04e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2385,10 +2385,21 @@ blk_qc_t generic_make_request(struct bio *bio)
 * yet.
 */
struct bio_list bio_list_on_stack[2];
+   blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
+   BLK_MQ_REQ_NOWAIT : 0;
+   struct request_queue *q = bio->bi_disk->queue;
blk_qc_t ret = BLK_QC_T_NONE;
 
if (!generic_make_request_checks(bio))
-   goto out;
+   return ret;
+
+   if (blk_queue_enter(q, flags) < 0) {
+   if (unlikely(!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)))
+   bio_wouldblock_error(bio);
+   else
+   bio_io_error(bio);
+   return ret;
+   }
 
/*
 * We only want one ->make_request_fn to be active at a time, else
@@ -2423,46 +2434,37 @@ blk_qc_t generic_make_request(struct bio *bio)
bio_list_init(&bio_list_on_stack[0]);
current->bio_list = bio_list_on_stack;
do {
-   struct request_queue *q = bio->bi_disk->queue;
-   blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
-   BLK_MQ_REQ_NOWAIT : 0;
-
-   if (likely(blk_queue_enter(q, flags) == 0)) {
-   struct bio_list lower, same;
-
-   /* Create a fresh bio_list for all subordinate requests */
-   bio_list_on_stack[1] = bio_list_on_stack[0];
-   bio_list_init(&bio_list_on_stack[0]);
-   ret = q->make_request_fn(q, bio);
-
-   blk_queue_exit(q);
-
-   /* sort new bios into those for a lower level
-* and those for the same level
-*/
-   bio_list_init(&lower);
-   bio_list_init(&same);
-   while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
-   if (q == bio->bi_disk->queue)
-   bio_list_add(&same, bio);
-   else
-   bio_list_add(&lower, bio);
-   /* now assemble so we handle the lowest level first */
-   bio_list_merge(&bio_list_on_stack[0], &lower);
-   bio_list_merge(&bio_list_on_stack[0], &same);
-   bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
-   } else {
-   if (unlikely(!blk_queue_dying(q) &&
-   (bio->bi_opf & REQ_NOWAIT)))
-   bio_wouldblock_error(bio);
+   struct bio_list lower, same;
+
+   WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT) &&
+(bio->bi_opf & REQ_NOWAIT));
+   WARN_ON_ONCE(q != bio->bi_disk->queue);
+   q = bio->bi_disk->queue;
+   /* Create a fresh bio_list for all subordinate requests */
+   bio_list_on_stack[1] = bio_list_on_stack[0];
+   bio_list_init(&bio_list_on_stack[0]);
+   ret = q->make_request_fn(q, bio);
+
+   /* sort new bios into those for a lower level
+* and those for the same level
+*/
+   bio_list_init(&lower);
+   bio_list_init(&same);
+   while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
+   if (q == bio->bi_disk->queue)
+   bio_list_add(&same, bio);
else
-   bio_io_error(bio);
-   }
+   bio_list_add(&lower, bio);
+   /* now assemble so we handle the lowest level first */
+   bio_list_merge(&bio_list_on_stack[0], &lower);
+   bio_list_merge(&bio_list_on_stack[0], &same);
+   bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
bio = bio_list_pop(&bio_list_on_stack[0]);
} while (bio);
current->bio_list = NULL; /* deactivate */
 
 out:
+

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Ming Lei

On Mon, Apr 09, 2018 at 09:33:08AM +0800, Joseph Qi wrote:
> Hi Bart,
> 
> On 18/4/8 22:50, Bart Van Assche wrote:
> > On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> >> The following kernel oops is triggered by 'removing scsi device' during
> >> heavy IO.
> > 
> > How did you trigger this oops?
> > 
> 
> I can reproduce this oops by the following steps:
> 1) start a fio job doing buffered writes;
> 2) remove the SCSI device that fio is writing to:
> echo "scsi remove-single-device ${dev}" > /proc/scsi/scsi

Yeah, it can be reproduced easily, and I usually remove scsi
device via 'echo 1 > /sys/block/sda/device/delete'

Thanks,
Ming

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Joseph Qi

Hi Bart,

On 18/4/8 22:50, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
> 
> How did you trigger this oops?
> 

I can reproduce this oops by the following steps:
1) start a fio job doing buffered writes;
2) remove the SCSI device that fio is writing to:
echo "scsi remove-single-device ${dev}" > /proc/scsi/scsi

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Bart Van Assche

On Sun, 2018-04-08 at 16:11 +0800, Joseph Qi wrote:
> This is because scsi_remove_device() calls blk_cleanup_queue(), after
> which all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL, which triggers a NULL pointer dereference when td is
> read from tg (tg->td).
> It seems that we cannot simply move blkcg_exit_queue() up into
> blk_cleanup_queue().

Have you considered adding a blk_queue_enter() / blk_queue_exit() pair in
generic_make_request()? blk_queue_enter() checks the DYING flag.

Thanks,

Bart.
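
To illustrate the enter/exit idea above, here is a minimal user-space
sketch. Everything in it is hypothetical: struct toy_queue and the toy_*
helpers are stand-ins invented for this example, not kernel APIs, and the
model is single-threaded (the kernel uses a per-queue usage counter and
waits for it to drain, which this sketch only approximates by refusing to
tear down while a submitter is inside).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct toy_queue {
	bool dying;        /* set once removal has started */
	int usage;         /* submitters currently between enter and exit */
	void *cgroup_data; /* state that cleanup tears down */
};

/* Modelled on the idea of blk_queue_enter(): refuse a dying queue. */
static int toy_queue_enter(struct toy_queue *q)
{
	if (q->dying)
		return -ENODEV;
	q->usage++;
	return 0;
}

static void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/*
 * Mark the queue dying and tear down cgroup state only when nobody is
 * inside. Returns true if teardown happened. (The kernel would wait for
 * the counter to drain instead of returning early.)
 */
static bool toy_queue_cleanup(struct toy_queue *q)
{
	q->dying = true;
	if (q->usage > 0)
		return false;
	q->cgroup_data = NULL;
	return true;
}

/* Run the "checks" that touch cgroup state only while holding the queue. */
static int toy_submit(struct toy_queue *q)
{
	int ret = toy_queue_enter(q);

	if (ret < 0)
		return ret; /* dying queue: fail the I/O, touch nothing */
	ret = q->cgroup_data ? 0 : -EFAULT;
	toy_queue_exit(q);
	return ret;
}
```

The point of the sketch is only the ordering: state that cleanup destroys
is touched strictly between enter and exit, and a submitter that arrives
after the dying flag is set never reaches it.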

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Bart Van Assche

On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.

How did you trigger this oops?

Bart.

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Ming Lei

On Sun, Apr 08, 2018 at 05:25:42PM +0800, Ming Lei wrote:
> On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> > This is because scsi_remove_device() calls blk_cleanup_queue(), after
> > which all blkgs have been destroyed and root_blkg is NULL.
> > Thus tg is NULL, which triggers a NULL pointer dereference when td is
> > read from tg (tg->td).
> > It seems that we cannot simply move blkcg_exit_queue() up into
> > blk_cleanup_queue().
> 
> Maybe one per-queue blkcg should be introduced, which seems reasonable
> too.

Sorry, I mean one per-queue blkcg lock.

-- 
Ming

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Ming Lei

On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> This is because scsi_remove_device() calls blk_cleanup_queue(), after
> which all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL, which triggers a NULL pointer dereference when td is
> read from tg (tg->td).
> It seems that we cannot simply move blkcg_exit_queue() up into
> blk_cleanup_queue().

Maybe one per-queue blkcg should be introduced, which seems reasonable
too.

Thanks,
Ming

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Joseph Qi

This is because scsi_remove_device() calls blk_cleanup_queue(), after
which all blkgs have been destroyed and root_blkg is NULL.
Thus tg is NULL, which triggers a NULL pointer dereference when td is
read from tg (tg->td).
It seems that we cannot simply move blkcg_exit_queue() up into
blk_cleanup_queue().

Thanks,
Joseph
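
As a side note on why such an oops reports a small non-zero address
rather than 0: reading a member through a NULL struct pointer faults at
the member's offset within the struct. The sketch below shows that
arithmetic with a made-up layout. struct toy_throtl_grp and
toy_fault_address() are hypothetical names for this example only; the
layout is NOT that of the kernel's struct throtl_grp, and no claim is
made here about which member sits at the faulting offset in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_throtl_data; /* opaque; only a pointer to it is stored */

/* Hypothetical layout: five pointer-sized members precede td. */
struct toy_throtl_grp {
	void *pad[5];
	struct toy_throtl_data *td;
};

/* Holds by construction: no padding between same-sized pointer members. */
static_assert(offsetof(struct toy_throtl_grp, td) == 5 * sizeof(void *),
	      "td follows five pointers");

/*
 * Address the CPU would access for tg->td when tg has the numeric value
 * 'base'. With base == 0 (a NULL tg) this is just the member offset.
 */
static uintptr_t toy_fault_address(uintptr_t base)
{
	return base + offsetof(struct toy_throtl_grp, td);
}
```

With this made-up layout and 8-byte pointers, toy_fault_address(0) is
0x28, i.e. a load through a NULL tg would fault a few dozen bytes above
address zero, which is why the kernel still classifies it as a NULL
pointer dereference.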

On 18/4/8 12:21, Ming Lei wrote:
> Hi,
> 
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.
> 
> 'git bisect' shows that commit a063057d7c731cffa7d10740(block: Fix a race
> between request queue removal and the block cgroup controller)
> introduced this regression:
> 
> [   42.268257] BUG: unable to handle kernel NULL pointer dereference at 
> 0028
> [   42.269339] PGD 26bd9f067 P4D 26bd9f067 PUD 26bfec067 PMD 0 
> [   42.270077] Oops:  [#1] PREEMPT SMP NOPTI
> [   42.270681] Dumping ftrace buffer:
> [   42.271141](ftrace buffer empty)
> [   42.271641] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support 
> crc32c_intel i2c_i801 i2c_core lpc_ich mfd_core usb_storage nvme shpchp 
> nvme_core virtio_scsi qemu_fw_cfg ip_tables
> [   42.273770] CPU: 5 PID: 1076 Comm: fio Not tainted 4.16.0+ #49
> [   42.274530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.10.2-2.fc27 04/01/2014
> [   42.275634] RIP: 0010:blk_throtl_bio+0x41/0x904
> [   42.276225] RSP: 0018:c900033cfaa0 EFLAGS: 00010246
> [   42.276907] RAX: 8000 RBX: 8801bdcc5118 RCX: 
> 0001
> [   42.277818] RDX: 8801bdcc5118 RSI:  RDI: 
> 8802641f8870
> [   42.278733] RBP:  R08: 0001 R09: 
> c900033cfb94
> [   42.279651] R10: c900033cfc00 R11: 06ea R12: 
> 8802641f8870
> [   42.280567] R13: 88026f34f000 R14:  R15: 
> 8801bdcc5118
> [   42.281489] FS:  7fc123922d40() GS:880272f4() 
> knlGS:
> [   42.282525] CS:  0010 DS:  ES:  CR0: 80050033
> [   42.283270] CR2: 0028 CR3: 00026d7ac004 CR4: 
> 007606e0
> [   42.284194] DR0:  DR1:  DR2: 
> 
> [   42.285116] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [   42.286036] PKRU: 5554
> [   42.286393] Call Trace:
> [   42.286725]  ? try_to_wake_up+0x3a3/0x3c9
> [   42.287255]  ? blk_mq_hctx_notify_dead+0x135/0x135
> [   42.287880]  ? gup_pud_range+0xb5/0x7e1
> [   42.288381]  generic_make_request_checks+0x3cf/0x539
> [   42.289027]  ? gup_pgd_range+0x8e/0xaa
> [   42.289515]  generic_make_request+0x38/0x25b
> [   42.290078]  ? submit_bio+0x103/0x11f
> [   42.290555]  submit_bio+0x103/0x11f
> [   42.291018]  ? bio_iov_iter_get_pages+0xe4/0x104
> [   42.291620]  blkdev_direct_IO+0x2a3/0x3af
> [   42.292151]  ? kiocb_free+0x34/0x34
> [   42.292607]  ? ___preempt_schedule+0x16/0x18
> [   42.293168]  ? preempt_schedule_common+0x4c/0x65
> [   42.293771]  ? generic_file_read_iter+0x96/0x110
> [   42.294377]  generic_file_read_iter+0x96/0x110
> [   42.294962]  aio_read+0xca/0x13b
> [   42.295388]  ? preempt_count_add+0x6d/0x8c
> [   42.295926]  ? aio_read_events+0x287/0x2d6
> [   42.296460]  ? do_io_submit+0x4d2/0x62c
> [   42.296964]  do_io_submit+0x4d2/0x62c
> [   42.297446]  ? do_syscall_64+0x9d/0x15e
> [   42.297950]  do_syscall_64+0x9d/0x15e
> [   42.298431]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [   42.299090] RIP: 0033:0x7fc12244e687
> [   42.299556] RSP: 002b:7ffe18388a68 EFLAGS: 0202 ORIG_RAX: 
> 00d1
> [   42.300528] RAX: ffda RBX: 7fc0fde08670 RCX: 
> 7fc12244e687
> [   42.301442] RDX: 01d1b388 RSI: 0001 RDI: 
> 7fc123782000
> [   42.302359] RBP: 22d8 R08: 0001 R09: 
> 01c461e0
> [   42.303275] R10:  R11: 0202 R12: 
> 7fc0fde08670
> [   42.304195] R13:  R14: 01d1d0c0 R15: 
> 01b872f0
> [   42.305117] Code: 48 85 f6 48 89 7c 24 10 75 0e 48 8b b7 b8 05 00 00 31 ed 
> 48 85 f6 74 0f 48 63 05 75 a4 e4 00 48 8b ac c6 28 02 00 00 f6 43 15 02 <48> 
> 8b 45 28 48 89 04 24 0f 85 28 08 00 00 8b 43 10 45 31 e4 83 
> [   42.307553] RIP: blk_throtl_bio+0x41/0x904 RSP: c900033cfaa0
> [   42.308328] CR2: 0028
> [   42.308920] ---[ end trace f53a144979f63b29 ]---
> [   42.309520] Kernel panic - not syncing: Fatal exception
> [   42.310635] Dumping ftrace buffer:
> [   42.311087](ftrace buffer empty)
> [   42.311583] Kernel Offset: disabled
> [   42.312163] ---[ end Kernel panic - not syncing: Fatal exception ]---
>

[block regression] kernel oops triggered by removing scsi device dring IO

2018-04-07 Thread Ming Lei

Hi,

The following kernel oops is triggered by 'removing scsi device' during
heavy IO.

'git bisect' shows that commit a063057d7c731cffa7d10740(block: Fix a race
between request queue removal and the block cgroup controller)
introduced this regression:

[   42.268257] BUG: unable to handle kernel NULL pointer dereference at 0028
[   42.269339] PGD 26bd9f067 P4D 26bd9f067 PUD 26bfec067 PMD 0 
[   42.270077] Oops:  [#1] PREEMPT SMP NOPTI
[   42.270681] Dumping ftrace buffer:
[   42.271141](ftrace buffer empty)
[   42.271641] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support crc32c_intel i2c_i801 i2c_core lpc_ich mfd_core usb_storage nvme shpchp nvme_core virtio_scsi qemu_fw_cfg ip_tables
[   42.273770] CPU: 5 PID: 1076 Comm: fio Not tainted 4.16.0+ #49
[   42.274530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[   42.275634] RIP: 0010:blk_throtl_bio+0x41/0x904
[   42.276225] RSP: 0018:c900033cfaa0 EFLAGS: 00010246
[   42.276907] RAX: 8000 RBX: 8801bdcc5118 RCX: 0001
[   42.277818] RDX: 8801bdcc5118 RSI:  RDI: 8802641f8870
[   42.278733] RBP:  R08: 0001 R09: c900033cfb94
[   42.279651] R10: c900033cfc00 R11: 06ea R12: 8802641f8870
[   42.280567] R13: 88026f34f000 R14:  R15: 8801bdcc5118
[   42.281489] FS:  7fc123922d40() GS:880272f4() knlGS:
[   42.282525] CS:  0010 DS:  ES:  CR0: 80050033
[   42.283270] CR2: 0028 CR3: 00026d7ac004 CR4: 007606e0
[   42.284194] DR0:  DR1:  DR2: 
[   42.285116] DR3:  DR6: fffe0ff0 DR7: 0400
[   42.286036] PKRU: 5554
[   42.286393] Call Trace:
[   42.286725]  ? try_to_wake_up+0x3a3/0x3c9
[   42.287255]  ? blk_mq_hctx_notify_dead+0x135/0x135
[   42.287880]  ? gup_pud_range+0xb5/0x7e1
[   42.288381]  generic_make_request_checks+0x3cf/0x539
[   42.289027]  ? gup_pgd_range+0x8e/0xaa
[   42.289515]  generic_make_request+0x38/0x25b
[   42.290078]  ? submit_bio+0x103/0x11f
[   42.290555]  submit_bio+0x103/0x11f
[   42.291018]  ? bio_iov_iter_get_pages+0xe4/0x104
[   42.291620]  blkdev_direct_IO+0x2a3/0x3af
[   42.292151]  ? kiocb_free+0x34/0x34
[   42.292607]  ? ___preempt_schedule+0x16/0x18
[   42.293168]  ? preempt_schedule_common+0x4c/0x65
[   42.293771]  ? generic_file_read_iter+0x96/0x110
[   42.294377]  generic_file_read_iter+0x96/0x110
[   42.294962]  aio_read+0xca/0x13b
[   42.295388]  ? preempt_count_add+0x6d/0x8c
[   42.295926]  ? aio_read_events+0x287/0x2d6
[   42.296460]  ? do_io_submit+0x4d2/0x62c
[   42.296964]  do_io_submit+0x4d2/0x62c
[   42.297446]  ? do_syscall_64+0x9d/0x15e
[   42.297950]  do_syscall_64+0x9d/0x15e
[   42.298431]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   42.299090] RIP: 0033:0x7fc12244e687
[   42.299556] RSP: 002b:7ffe18388a68 EFLAGS: 0202 ORIG_RAX: 00d1
[   42.300528] RAX: ffda RBX: 7fc0fde08670 RCX: 7fc12244e687
[   42.301442] RDX: 01d1b388 RSI: 0001 RDI: 7fc123782000
[   42.302359] RBP: 22d8 R08: 0001 R09: 01c461e0
[   42.303275] R10:  R11: 0202 R12: 7fc0fde08670
[   42.304195] R13:  R14: 01d1d0c0 R15: 01b872f0
[   42.305117] Code: 48 85 f6 48 89 7c 24 10 75 0e 48 8b b7 b8 05 00 00 31 ed 48 85 f6 74 0f 48 63 05 75 a4 e4 00 48 8b ac c6 28 02 00 00 f6 43 15 02 <48> 8b 45 28 48 89 04 24 0f 85 28 08 00 00 8b 43 10 45 31 e4 83
[   42.307553] RIP: blk_throtl_bio+0x41/0x904 RSP: c900033cfaa0
[   42.308328] CR2: 0028
[   42.308920] ---[ end trace f53a144979f63b29 ]---
[   42.309520] Kernel panic - not syncing: Fatal exception
[   42.310635] Dumping ftrace buffer:
[   42.311087](ftrace buffer empty)
[   42.311583] Kernel Offset: disabled
[   42.312163] ---[ end Kernel panic - not syncing: Fatal exception ]---

-- 
Ming
