Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Coly Li
On 2020/8/5 10:46, Ming Lei wrote:
> On Wed, Aug 05, 2020 at 09:54:00AM +0800, Coly Li wrote:
>> On 2020/8/5 07:58, Ming Lei wrote:
>>> On Tue, Aug 04, 2020 at 10:23:32PM +0800, Coly Li wrote:
>>>> [full patch description and oops log snipped; see the original
>>>> posting at the end of this thread]

Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Ming Lei
On Wed, Aug 05, 2020 at 09:54:00AM +0800, Coly Li wrote:
> On 2020/8/5 07:58, Ming Lei wrote:
> > On Tue, Aug 04, 2020 at 10:23:32PM +0800, Coly Li wrote:
> >> [full patch description and oops log snipped; see the original
> >> posting at the end of this thread]

Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Coly Li
On 2020/8/5 07:58, Ming Lei wrote:
> On Tue, Aug 04, 2020 at 10:23:32PM +0800, Coly Li wrote:
>> [full patch description and oops log snipped; see the original
>> posting at the end of this thread]

Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Martin K. Petersen


Ming,

> What we need to fix is the loop driver: if it claims to support
> discard, q->limits.discard_granularity has to be a valid value.

Yep!

-- 
Martin K. Petersen  Oracle Linux Engineering
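
The subject line proposes tolerating the 0 value inside
__blkdev_issue_discard() as well. For illustration only, here is a
hypothetical sketch of such a guard; the posted patch body is not shown
in this archive, so this is an assumption about its shape, not the
patch itself:

	/*
	 * Hypothetical sketch -- bail out early on a broken queue limit
	 * instead of going on to build a 0-byte discard bio.
	 */
	if (WARN_ON_ONCE(!q->limits.discard_granularity))
		return -EOPNOTSUPP;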


Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Ming Lei
On Tue, Aug 04, 2020 at 10:23:32PM +0800, Coly Li wrote:
> [full patch description and oops log snipped; see the original
> posting at the end of this thread]

Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Johannes Thumshirn
On 04/08/2020 16:45, Coly Li wrote:
> Yes, Ming just posted a patch with a very similar change to the loop
> device driver.

Ah ok. I'll go and have a look at Ming's patch then.


Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Coly Li
On 2020/8/4 22:39, Johannes Thumshirn wrote:
> On 04/08/2020 16:37, Johannes Thumshirn wrote:
>> On 04/08/2020 16:34, Coly Li wrote:
>>> On 2020/8/4 22:31, Johannes Thumshirn wrote:
>>>> On 04/08/2020 16:23, Coly Li wrote:
>>>>> This is the procedure to reproduce the panic,
>>>>>   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>>>>>   # losetup -f /dev/nvme0n1 --direct-io=on
>>>>>   # blkdiscard /dev/loop0 -o 0 -l 0x200
>>>>
>>>> losetup -f /dev/sdX isn't it?
>>>>
>>>
>>> In my case, I use an NVMe SSD as the backing device of the loop
>>> device, because I don't have a SCSI LUN.
>>>
>>> And loading the scsi_debug module seems necessary; otherwise the
>>> discard process just hangs and I never see the kernel panic (I don't
>>> know why yet).
>>
>> OK, now that's highly interesting. Does it also happen if you back loop with
>> a file? loop_config_discard() has different cases for the different backing 
>> devices/files. S
>>
> 
> Damn, I didn't want to hit send
> 
> Does this (untested) change make a difference:
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 475e1a738560..8a07a89d702e 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -895,6 +895,9 @@ static void loop_config_discard(struct loop_device *lo)
> blk_queue_max_write_zeroes_sectors(q,
> backingq->limits.max_write_zeroes_sectors);
>  
> +   q->limits.discard_granularity =
> +   backingq->limits.discard_granularity;
> +
> /*
>  * We use punch hole to reclaim the free space used by the
>  * image a.k.a. discard. However we do not support discard if
> 

Yes, Ming just posted a patch with a very similar change to the loop
device driver.

Coly Li


Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Coly Li
On 2020/8/4 22:37, Johannes Thumshirn wrote:
> On 04/08/2020 16:34, Coly Li wrote:
>> On 2020/8/4 22:31, Johannes Thumshirn wrote:
>>> On 04/08/2020 16:23, Coly Li wrote:
>>>> This is the procedure to reproduce the panic,
>>>>   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>>>>   # losetup -f /dev/nvme0n1 --direct-io=on
>>>>   # blkdiscard /dev/loop0 -o 0 -l 0x200
>>>
>>> losetup -f /dev/sdX isn't it?
>>>
>>
>> In my case, I use an NVMe SSD as the backing device of the loop
>> device, because I don't have a SCSI LUN.
>>
>> And loading the scsi_debug module seems necessary; otherwise the
>> discard process just hangs and I never see the kernel panic (I don't
>> know why yet).
> 
> OK, now that's highly interesting. Does it also happen if you back loop with
> a file? loop_config_discard() has different cases for the different backing 
> devices/files. S
> 
No. For a file-backed loop device, q->limits.discard_granularity is set
to inode->i_sb->s_blocksize, and an encrypted loop device does not
support discard at all.

This issue only happens when the loop device is backed by a block
device that announces discard support. Without Ming's fix to the loop
device driver, a discard at LBA 0 triggers the BUG() panic in my setup
(it may be even easier to trigger with a SCSI LUN).
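
For reference, a sketch of the file-backed branch of
loop_config_discard() in question (paraphrased from the 5.8-era
driver; the exact field names and ordering are best-effort assumptions,
not a verbatim quote):

	struct file *file = lo->lo_backing_file;
	struct inode *inode = file->f_mapping->host;
	struct request_queue *q = lo->lo_queue;

	/* File-backed: the granularity follows the backing filesystem's
	 * block size, so it is never 0 on this path. */
	q->limits.discard_granularity = inode->i_sb->s_blocksize;
	q->limits.discard_alignment = 0;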

Coly Li


Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Johannes Thumshirn
On 04/08/2020 16:23, Coly Li wrote:
> This is the procedure to reproduce the panic,
>   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>   # losetup -f /dev/nvme0n1 --direct-io=on
>   # blkdiscard /dev/loop0 -o 0 -l 0x200

losetup -f /dev/sdX isn't it?


Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Johannes Thumshirn
On 04/08/2020 16:37, Johannes Thumshirn wrote:
> On 04/08/2020 16:34, Coly Li wrote:
>> On 2020/8/4 22:31, Johannes Thumshirn wrote:
>>> On 04/08/2020 16:23, Coly Li wrote:
>>>> This is the procedure to reproduce the panic,
>>>>   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>>>>   # losetup -f /dev/nvme0n1 --direct-io=on
>>>>   # blkdiscard /dev/loop0 -o 0 -l 0x200
>>>
>>> losetup -f /dev/sdX isn't it?
>>>
>>
>> In my case, I use an NVMe SSD as the backing device of the loop
>> device, because I don't have a SCSI LUN.
>>
>> And loading the scsi_debug module seems necessary; otherwise the
>> discard process just hangs and I never see the kernel panic (I don't
>> know why yet).
> 
> OK, now that's highly interesting. Does it also happen if you back loop with
> a file? loop_config_discard() has different cases for the different backing 
> devices/files. S
> 

Damn, I didn't want to hit send

Does this (untested) change make a difference:

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 475e1a738560..8a07a89d702e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -895,6 +895,9 @@ static void loop_config_discard(struct loop_device *lo)
blk_queue_max_write_zeroes_sectors(q,
backingq->limits.max_write_zeroes_sectors);
 
+   q->limits.discard_granularity =
+   backingq->limits.discard_granularity;
+
/*
 * We use punch hole to reclaim the free space used by the
 * image a.k.a. discard. However we do not support discard if



Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Johannes Thumshirn
On 04/08/2020 16:34, Coly Li wrote:
> On 2020/8/4 22:31, Johannes Thumshirn wrote:
>> On 04/08/2020 16:23, Coly Li wrote:
>>> This is the procedure to reproduce the panic,
>>>   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>>>   # losetup -f /dev/nvme0n1 --direct-io=on
>>>   # blkdiscard /dev/loop0 -o 0 -l 0x200
>>
>> losetup -f /dev/sdX isn't it?
>>
> 
> In my case, I use an NVMe SSD as the backing device of the loop
> device, because I don't have a SCSI LUN.
>
> And loading the scsi_debug module seems necessary; otherwise the
> discard process just hangs and I never see the kernel panic (I don't
> know why yet).

OK, now that's highly interesting. Does it also happen if you back loop with
a file? loop_config_discard() has different cases for the different backing 
devices/files. S


Re: [PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Coly Li
On 2020/8/4 22:31, Johannes Thumshirn wrote:
> On 04/08/2020 16:23, Coly Li wrote:
>> This is the procedure to reproduce the panic,
>>   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>>   # losetup -f /dev/nvme0n1 --direct-io=on
>>   # blkdiscard /dev/loop0 -o 0 -l 0x200
> 
> losetup -f /dev/sdX isn't it?
> 

In my case, I use an NVMe SSD as the backing device of the loop device,
because I don't have a SCSI LUN.

And loading the scsi_debug module seems necessary; otherwise the
discard process just hangs and I never see the kernel panic (I don't
know why yet).

Coly Li


[PATCH] block: tolerate 0 byte discard_granularity in __blkdev_issue_discard()

2020-08-04 Thread Coly Li
When some buggy driver doesn't set its queue->limits.discard_granularity
(e.g. the current loop device driver), a discard at LBA 0 on such a
device will trigger a kernel BUG() panic from block/blk-mq.c:563.

[  955.565006][   C39] ------------[ cut here ]------------
[  955.559660][   C39] invalid opcode: 0000 [#1] SMP NOPTI
[  955.622171][   C39] CPU: 39 PID: 248 Comm: ksoftirqd/39 Tainted: G E 5.8.0-default+ #40
[  955.622171][   C39] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE160M-2.70]- 07/17/2020
[  955.622175][   C39] RIP: 0010:blk_mq_end_request+0x107/0x110
[  955.622177][   C39] Code: 48 8b 03 e9 59 ff ff ff 48 89 df 5b 5d 41 5c e9 9f ed ff ff 48 8b 35 98 3c f4 00 48 83 c7 10 48 83 c6 19 e8 cb 56 c9 ff eb cb <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 54
[  955.622179][   C39] RSP: 0018:b1288701fe28 EFLAGS: 00010202
[  955.749277][   C39] RAX: 0001 RBX: 956fffba5080 RCX: 4003
[  955.749278][   C39] RDX: 0003 RSI:  RDI: 
[  955.749279][   C39] RBP:  R08:  R09: 
[  955.749279][   C39] R10: b1288701fd28 R11: 0001 R12: a8e05160
[  955.749280][   C39] R13: 0004 R14: 0004 R15: a7ad3a1e
[  955.749281][   C39] FS:  () GS:95bfbda0() knlGS:
[  955.749282][   C39] CS:  0010 DS:  ES:  CR0: 80050033
[  955.749282][   C39] CR2: 7f6f0ef766a8 CR3: 005a37012002 CR4: 007606e0
[  955.749283][   C39] DR0:  DR1:  DR2: 
[  955.749284][   C39] DR3:  DR6: fffe0ff0 DR7: 0400
[  955.749284][   C39] PKRU: 5554
[  955.749285][   C39] Call Trace:
[  955.749290][   C39]  blk_done_softirq+0x99/0xc0
[  957.550669][   C39]  __do_softirq+0xd3/0x45f
[  957.550677][   C39]  ? smpboot_thread_fn+0x2f/0x1e0
[  957.550679][   C39]  ? smpboot_thread_fn+0x74/0x1e0
[  957.550680][   C39]  ? smpboot_thread_fn+0x14e/0x1e0
[  957.550684][   C39]  run_ksoftirqd+0x30/0x60
[  957.550687][   C39]  smpboot_thread_fn+0x149/0x1e0
[  957.886225][   C39]  ? sort_range+0x20/0x20
[  957.886226][   C39]  kthread+0x137/0x160
[  957.886228][   C39]  ? kthread_park+0x90/0x90
[  957.886231][   C39]  ret_from_fork+0x22/0x30
[  959.117120][   C39] ---[ end trace 3dacdac97e2ed164 ]---

This is the procedure to reproduce the panic,
  # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
  # losetup -f /dev/nvme0n1 --direct-io=on
  # blkdiscard /dev/loop0 -o 0 -l 0x200

This is how the BUG() panic is triggered by __blkdev_issue_discard():
- For a loop device backed by an NVMe SSD, the loop driver does not
  initialize its queue->limits.discard_granularity and leaves it as 0.
- When a discard is issued at LBA 0 of the loop device,
  __blkdev_issue_discard() runs before any loop device driver code.
- Inside __blkdev_issue_discard(), granularity_aligned_lba is
  calculated by
	granularity_aligned_lba = round_up(sector_mapped,
		q->limits.discard_granularity >> SECTOR_SHIFT);
  Because sector_mapped is 0 (LBA 0 and no partition offset) and
  q->limits.discard_granularity is 0 (from the buggy loop driver), the
  calculated granularity_aligned_lba is 0.
- The inline function bio_aligned_discard_max_sectors() is defined as
	return round_down(UINT_MAX, q->limits.discard_granularity) >>
			SECTOR_SHIFT;
  With q->limits.discard_granularity being 0 from the loop device
  driver, this calculation also returns 0.
- Now that granularity_aligned_lba and sector_mapped are both 0,
  req_sects is calculated by the following lines in
  __blkdev_issue_discard(),
	if (granularity_aligned_lba == sector_mapped)
		req_sects = min_t(sector_t, nr_sects,
				  bio_aligned_discard_max_sectors(q));
  Because bio_aligned_discard_max_sectors(q) returns 0, req_sects is
  calculated as 0 (the standalone sketch after this list walks through
  the same arithmetic).
- A discard bio is now mistakenly initialized as a 0-byte bio by
	bio->bi_iter.bi_size = req_sects << 9;
  and sent to the loop device driver.
- The loop device driver handles this discard request via the
  following code path,
	loop_handle_cmd => do_req_filebacked => lo_fallocate =>
	file->f_op->fallocate => blkdev_fallocate => blkdev_issue_zeroout =>
	__blkdev_issue_write_zeroes
- Inside __blkdev_issue_write_zeroes(), a 0-byte-length discard bio is
  composed and sent to the backing device of the loop device.
- In the I/O completion code path, which in my case is
	blk_done_softirq => rq->q->mq_ops->complete => nvme_pci_complete_rq
	=> nvme_complete_rq => blk_mq_end_request
  blk_mq_end_request() calls blk_update_request(); because req->bio is
  NULL after the previous step, blk_update_request() returns false and
  the BUG() panic in blk_mq_end_request() is triggered.
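
The zero-granularity arithmetic above can be reproduced in ordinary
userspace code. Below is a minimal standalone sketch: the round_up()
and round_down() macros mirror the kernel's power-of-two rounding
helpers, the variable names follow the description above, and the rest
of the scaffolding (types, main(), printf()) is illustrative only, not
kernel code:

#include <stdio.h>
#include <limits.h>

/* Same shape as the kernel's power-of-two rounding helpers. */
#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y)     ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y)   ((x) & ~__round_mask(x, y))

#define SECTOR_SHIFT 9
typedef unsigned long long sector_t;

int main(void)
{
	unsigned int discard_granularity = 0;	/* the buggy driver's value */
	sector_t sector_mapped = 0;		/* LBA 0, no partition offset */
	sector_t nr_sects = 1;			/* caller asked for one sector */

	/* round_up(0, 0): the all-ones mask wraps the result to 0 */
	sector_t granularity_aligned_lba = round_up(sector_mapped,
			(sector_t)(discard_granularity >> SECTOR_SHIFT));

	/* bio_aligned_discard_max_sectors(): round_down(UINT_MAX, 0) is 0 */
	sector_t max_sects = round_down((sector_t)UINT_MAX,
			discard_granularity) >> SECTOR_SHIFT;

	sector_t req_sects = nr_sects;
	if (granularity_aligned_lba == sector_mapped)
		req_sects = nr_sects < max_sects ? nr_sects : max_sects;

	/* prints: aligned_lba=0 max=0 req_sects=0 bi_size=0 */
	printf("aligned_lba=%llu max=%llu req_sects=%llu bi_size=%llu\n",
	       granularity_aligned_lba, max_sects, req_sects,
	       req_sects << SECTOR_SHIFT);
	return 0;
}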

Although