Re: [PATCH] nvme: fix use-after-free during booting

2020-09-23 Thread Tong Zhang
IMHO, WARN_ON_ONCE() looks ok to me. as long as it won't crash the
system or lead to any memory corruption issue.
We can talk about the reproducer offline if you are interested.
- Tong

On Wed, Sep 23, 2020 at 1:06 AM Christoph Hellwig  wrote:
>
> On Tue, Sep 22, 2020 at 04:34:45PM -0400, Tong Zhang wrote:
> > Hi Christoph,
> > I modified the patch a bit and now it works.
>
> So you're still hitting the WARN_ON_ONCE?  I think we need to fix that
> as well, but all the ideas I have will turn into a bigger project,
> so I think I'll submit this one to Jens, and then do things
> incrementally.
>
> Can you share your reproducer?


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-23 Thread Tong Zhang
here you go

[0.00] Linux version 5.9.0-rc4+ (tong@tong-desktop) (gcc
(Ubuntu 9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34)
#44 SMP Wed Sep 23 12:15:34 EDT 2020
[0.00] Command line: console=ttyS0 root=/dev/sda
earlyprintk=serial biosdevname=0 net.ifnames=0 loglevel=7
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x7ffd6fff] usable
[0.00] BIOS-e820: [mem 0x7ffd7000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xb000-0xbfff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00017fff] usable
[0.00] printk: bootconsole [earlyser0] enabled
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 3000.019 MHz processor
[0.018848] last_pfn = 0x18 max_arch_pfn = 0x4
[0.020004] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.020455] last_pfn = 0x7ffd7 max_arch_pfn = 0x4
[0.033611] found SMP MP-table at [mem 0x000f5a90-0x000f5a9f]
[0.035100] check: Scanning 1 areas for low memory corruption
[0.046212] ACPI: Early table checksum verification disabled
[0.046681] ACPI: RSDP 0x000F5850 14 (v00 BOCHS )
[0.047095] ACPI: RSDT 0x7FFE20C2 38 (v01 BOCHS
BXPCRSDT 0001 BXPC 0001)
[0.047953] ACPI: FACP 0x7FFE1EB2 F4 (v03 BOCHS
BXPCFACP 0001 BXPC 0001)
[0.048792] ACPI: DSDT 0x7FFE0040 001E72 (v01 BOCHS
BXPCDSDT 0001 BXPC 0001)
[0.049074] ACPI: FACS 0x7FFE 40
[0.049256] ACPI: APIC 0x7FFE1FA6 80 (v01 BOCHS
BXPCAPIC 0001 BXPC 0001)
[0.049501] ACPI: HPET 0x7FFE2026 38 (v01 BOCHS
BXPCHPET 0001 BXPC 0001)
[0.049724] ACPI: MCFG 0x7FFE205E 3C (v01 BOCHS
BXPCMCFG 0001 BXPC 0001)
[0.049892] ACPI: WAET 0x7FFE209A 28 (v01 BOCHS
BXPCWAET 0001 BXPC 0001)
[0.058109] No NUMA configuration found
[0.058193] Faking a node at [mem 0x-0x00017fff]
[0.059024] NODE_DATA(0) allocated [mem 0x17fffa000-0x17fffdfff]
[0.061178] Zone ranges:
[0.061271]   DMA  [mem 0x1000-0x00ff]
[0.061393]   DMA32[mem 0x0100-0x]
[0.061481]   Normal   [mem 0x0001-0x00017fff]
[0.061571] Movable zone start for each node
[0.061654] Early memory node ranges
[0.061735]   node   0: [mem 0x1000-0x0009efff]
[0.061939]   node   0: [mem 0x0010-0x7ffd6fff]
[0.062056]   node   0: [mem 0x0001-0x00017fff]
[0.062963] Zeroed struct page in unavailable ranges: 139 pages
[0.063146] Initmem setup node 0 [mem 0x1000-0x00017fff]
[0.701550] kasan: KernelAddressSanitizer initialized
[0.702094] ACPI: PM-Timer IO Port: 0x608
[0.702932] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[0.703636] IOAPIC[0]: apic_id 0, version 32, address 0xfec0, GSI 0-23
[0.704018] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.704487] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0.704656] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.704900] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0.705041] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0.705464] Using ACPI (MADT) for SMP configuration information
[0.705676] ACPI: HPET id: 0x8086a201 base: 0xfed0
[0.706244] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[0.707373] PM: hibernation: Registered nosave memory: [mem
0x-0x0fff]
[0.707547] PM: hibernation: Registered nosave memory: [mem
0x0009f000-0x0009]
[0.707729] PM: hibernation: Registered nosave memory: [mem
0x000a-0x000e]
[0.707837] PM: hibernation: Registered nosave memory: [mem
0x000f-0x000f]
[0.708039] PM: hibernation: Registered nosave memory: [mem
0x7ffd7000-0x7fff]
[0.708162] PM: hibernation: Registered nosave memory: [mem
0x8000-0xafff]
[0.708345] PM: hibernation: Registered nosave memory: [mem
0xb000-0xbfff]
[0.708439] PM: hibernation: Registered nosave memory: [mem
0xc00

Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Christoph Hellwig
I suspect the patch below might be better.  Can you send me a full dmesg
with this one applied?  Preferably on top of Jens' for-next branch?


diff --git a/block/genhd.c b/block/genhd.c
index 9d060e79eb31d8..ef2784c69d59ee 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -832,7 +832,9 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
 * Take an extra ref on queue which will be put on disk_release()
 * so that it sticks around as long as @disk is there.
 */
-   WARN_ON_ONCE(!blk_get_queue(disk->queue));
+   WARN_ON_ONCE(blk_queue_dying(disk->queue));
+   __blk_get_queue(disk->queue);
+   disk->flags |= GENHD_FL_QUEUE_REF;
 
disk_add_events(disk);
blk_integrity_add(disk);
@@ -1564,7 +1566,7 @@ static void disk_release(struct device *dev)
kfree(disk->random);
disk_replace_part_tbl(disk, NULL);
hd_free_part(&disk->part0);
-   if (disk->queue)
+   if (disk->flags & GENHD_FL_QUEUE_REF)
blk_put_queue(disk->queue);
kfree(disk);
 }
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 1c97cf84f011a7..822a619924e3b5 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -133,6 +133,7 @@ struct hd_struct {
 #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100
 #define GENHD_FL_NO_PART_SCAN  0x0200
 #define GENHD_FL_HIDDEN0x0400
+#define GENHD_FL_QUEUE_REF 0x0800
 
 enum {
DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */



Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Christoph Hellwig
On Tue, Sep 22, 2020 at 04:34:45PM -0400, Tong Zhang wrote:
> Hi Christoph,
> I modified the patch a bit and now it works.

So you're still hitting the WARN_ON_ONCE?  I think we need to fix that
as well, but all the ideas I have will turn into a bigger project,
so I think I'll submit this one to Jens, and then do things
incrementally.

Can you share your reproducer?


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Tong Zhang
Hi Christoph,
I modified the patch a bit and now it works.
Best,
- Tong

---
 block/genhd.c | 7 +--
 include/linux/genhd.h | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 99c64641c314..8c432e5f97ea 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -835,7 +835,10 @@ static void __device_add_disk(struct device *parent, 
struct gendisk *disk,
 * Take an extra ref on queue which will be put on disk_release()
 * so that it sticks around as long as @disk is there.
 */
-   WARN_ON_ONCE(!blk_get_queue(disk->queue));
+   if (blk_get_queue(disk->queue))
+   disk->flags |= GENHD_FL_QUEUE_REF;
+   else
+   WARN_ON_ONCE(1);
 
disk_add_events(disk);
blk_integrity_add(disk);
@@ -1567,7 +1570,7 @@ static void disk_release(struct device *dev)
kfree(disk->random);
disk_replace_part_tbl(disk, NULL);
hd_free_part(&disk->part0);
-   if (disk->queue)
+   if (disk->flags & GENHD_FL_QUEUE_REF)
blk_put_queue(disk->queue);
kfree(disk);
 }
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 4ab853461dff..9441077ee103 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -135,6 +135,7 @@ struct hd_struct {
 #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100
 #define GENHD_FL_NO_PART_SCAN  0x0200
 #define GENHD_FL_HIDDEN0x0400
+#define GENHD_FL_QUEUE_REF 0x0800
 
 enum {
DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
-- 
2.25.1


> On Sep 22, 2020, at 12:41 PM, Christoph Hellwig  wrote:
> 
> On Tue, Sep 22, 2020 at 12:19:06PM -0400, Tong Zhang wrote:
>> Thank you Christoph for providing the patch.
>> However, the test shows that the issue still persists even with the new 
>> patch.
>> The call trace is the same as in my first email.
> 
> Weird.  What baseline did you test?  You need to use the patch on top of
> the latest for-next or for-5.10/block tree from Jens for full effect.
> 
> If that still feels can you send me a longer dmesg that also shows what
> happens before the sniplet you sent?



Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Tong Zhang
Thank you Christoph.
I will do some testing with my setup and let you know.
- Tong

On Tue, Sep 22, 2020 at 9:59 AM Christoph Hellwig  wrote:
>
> Hi Tong,
>
> can you test this patch?
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 99c64641c3148c..6473ae703789e4 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -836,6 +836,7 @@ static void __device_add_disk(struct device *parent, 
> struct gendisk *disk,
>  * so that it sticks around as long as @disk is there.
>  */
> WARN_ON_ONCE(!blk_get_queue(disk->queue));
> +   disk->flags |= GENHD_FL_QUEUE_REF;
>
> disk_add_events(disk);
> blk_integrity_add(disk);
> @@ -1567,7 +1568,7 @@ static void disk_release(struct device *dev)
> kfree(disk->random);
> disk_replace_part_tbl(disk, NULL);
> hd_free_part(&disk->part0);
> -   if (disk->queue)
> +   if (disk->flags & GENHD_FL_QUEUE_REF)
> blk_put_queue(disk->queue);
> kfree(disk);
>  }
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 4ab853461dff25..9441077ee10329 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -135,6 +135,7 @@ struct hd_struct {
>  #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100
>  #define GENHD_FL_NO_PART_SCAN  0x0200
>  #define GENHD_FL_HIDDEN0x0400
> +#define GENHD_FL_QUEUE_REF 0x0800
>
>  enum {
> DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Christoph Hellwig
Hi Tong,

can you test this patch?

diff --git a/block/genhd.c b/block/genhd.c
index 99c64641c3148c..6473ae703789e4 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -836,6 +836,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
 * so that it sticks around as long as @disk is there.
 */
WARN_ON_ONCE(!blk_get_queue(disk->queue));
+   disk->flags |= GENHD_FL_QUEUE_REF;
 
disk_add_events(disk);
blk_integrity_add(disk);
@@ -1567,7 +1568,7 @@ static void disk_release(struct device *dev)
kfree(disk->random);
disk_replace_part_tbl(disk, NULL);
hd_free_part(&disk->part0);
-   if (disk->queue)
+   if (disk->flags & GENHD_FL_QUEUE_REF)
blk_put_queue(disk->queue);
kfree(disk);
 }
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 4ab853461dff25..9441077ee10329 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -135,6 +135,7 @@ struct hd_struct {
 #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100
 #define GENHD_FL_NO_PART_SCAN  0x0200
 #define GENHD_FL_HIDDEN0x0400
+#define GENHD_FL_QUEUE_REF 0x0800
 
 enum {
DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-17 Thread Christoph Hellwig
On Thu, Sep 17, 2020 at 11:24:59AM -0400, Tong Zhang wrote:
> Hmm.. OK. Any suggestions on how to fix this?

Yes, we'll need to make sure the block layer doesn't double put
for a not fully setup disk/queue.  It has been on my todo list for
a while, I'll plan to get back to in the next days.


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-17 Thread Tong Zhang
Hmm.. OK. Any suggestions on how to fix this?

On Thu, Sep 17, 2020 at 4:26 AM Christoph Hellwig  wrote:
>
> On Wed, Sep 16, 2020 at 11:36:05AM -0400, Tong Zhang wrote:
> > if a nvme controller is not responding during probing, a use-after-free
> > condition could happen
>
> This now leaks the queue for the regular teardown path.


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-17 Thread Christoph Hellwig
On Wed, Sep 16, 2020 at 11:36:05AM -0400, Tong Zhang wrote:
> if a nvme controller is not responding during probing, a use-after-free
> condition could happen

This now leaks the queue for the regular teardown path.


[PATCH] nvme: fix use-after-free during booting

2020-09-16 Thread Tong Zhang
if a nvme controller is not responding during probing, a use-after-free
condition could happen

[  215.396884] nvme nvme0: Identify Controller failed (-4)
[  215.397239] nvme nvme0: Removing after probe failure status: -5
[  215.409079] Buffer I/O error on dev nvme0n1, logical block 0, async page read
[  215.409526] Buffer I/O error on dev nvme0n1, logical block 1, async page read
[  215.409924] Buffer I/O error on dev nvme0n1, logical block 2, async page read
[  215.410317] Buffer I/O error on dev nvme0n1, logical block 3, async page read
[  215.410706] Buffer I/O error on dev nvme0n1, logical block 4, async page read
[  215.411103] Buffer I/O error on dev nvme0n1, logical block 5, async page read
[  215.411498] Buffer I/O error on dev nvme0n1, logical block 6, async page read
[  215.411893] Buffer I/O error on dev nvme0n1, logical block 7, async page read
[  215.412272]  nvme0n1: unable to read partition table
[  215.412581] nvme0n1: partition table beyond EOD, truncated
[  215.966867] [ cut here ]
[  215.967167] WARNING: CPU: 1 PID: 7 at block/genhd.c:838 
__device_add_disk+0x79c/0x7c0
[  215.967558] Modules linked in:
[  215.967717] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #59
[  215.968618] Workqueue: nvme-wq nvme_scan_work
[  215.968849] RIP: 0010:__device_add_disk+0x79c/0x7c0
[  215.969098] Code: 85 30 04 00 00 48 89 44 24 20 e9 41 fa ff ff 48 89 df e8 
07 e5 ca ff 80 a5 dc 00 00 00 ef e9 88 fc ff ff 0f 0b e9 81 fc ff ff <0f>
0b e9 a1 fc ff ff 0f 0b e9 b5 fa ff ff 0f 0b e9 51 ff ff ff e8
[  215.970032] RSP: :88806c06f9f0 EFLAGS: 00010246
[  215.970297] RAX:  RBX: 88806c233400 RCX: 9ce3e74e
[  215.970653] RDX: dc00 RSI: 0008 RDI: 888066135560
[  215.971015] RBP: 88806b2ce800 R08: 0001 R09: ed100cc26aad
[  215.971373] R10: 888066135567 R11: ed100cc26aac R12: 88806c40f940
[  215.971731] R13: 88806b2ce8a0 R14: 88806b2ce808 R15: 88806b2cebd8
[  215.972094] FS:  () GS:88806d10() 
knlGS:
[  215.972497] CS:  0010 DS:  ES:  CR0: 80050033
[  215.972789] CR2:  CR3: 24e0e000 CR4: 06e0
[  215.973153] DR0:  DR1:  DR2: 
[  215.973513] DR3:  DR6: fffe0ff0 DR7: 0400
[  215.973875] Call Trace:
[  215.974007]  ? blk_alloc_devt+0x140/0x140
[  215.974213]  ? __hrtimer_init+0xb6/0xf0
[  215.974411]  ? rwsem_down_read_slowpath+0x7d0/0x7d0
[  215.974659]  ? __nvme_revalidate_disk+0x1ba/0x420
[  215.974905]  nvme_validate_ns+0x54c/0xe70
[  215.975113]  ? nvme_dev_ioctl+0x190/0x190
[  215.975318]  ? __blk_mq_free_request+0xe3/0x130
[  215.975549]  ? __nvme_submit_sync_cmd+0x153/0x300
[  215.975789]  ? kasan_unpoison_shadow+0x33/0x40
[  215.976022]  nvme_scan_work+0x20f/0x35f
[  215.976220]  ? nvme_fw_act_work+0x210/0x210
[  215.976434]  ? free_object+0x50/0x50
[  215.976619]  ? try_to_wake_up+0x37c/0x900
[  215.976619]  ? try_to_wake_up+0x37c/0x900
[  215.976843]  ? read_word_at_a_time+0xe/0x20
[  215.977057]  ? strscpy+0xbf/0x190
[  215.977229]  process_one_work+0x4ad/0x7e0
[  215.977435]  worker_thread+0x73/0x690
[  215.977623]  ? process_one_work+0x7e0/0x7e0
[  215.977842]  kthread+0x199/0x1f0
[  215.978011]  ? kthread_create_on_node+0xd0/0xd0
[  215.978240]  ret_from_fork+0x22/0x30
[  215.978423] ---[ end trace e8b38966135fe74b ]---
[  215.987504] refcount_t: underflow; use-after-free.
[  215.987772] WARNING: CPU: 1 PID: 7 at lib/refcount.c:28 
refcount_warn_saturate+0xc5/0x110
[  215.988641] Modules linked in:
[  215.988827] CPU: 1 PID: 7 Comm: kworker/u4:0 Tainted: GW 
5.8.0+ #59
[  215.989784] Workqueue: nvme-wq nvme_scan_work
[  215.990013] RIP: 0010:refcount_warn_saturate+0xc5/0x110
0b e9 76 ff ff ff 80 3d cc 0f f6 01 00 0f 85 69 ff ff ff 48 c7
[  215.991220] RSP: :88806c06fb60 EFLAGS: 00010282
[  215.991485] RAX:  RBX: 0003 RCX: 
[  215.991850] RDX: 0001 RSI: 0004 RDI: ed100d80df5e
[  215.992209] RBP: 8880661355b0 R08: 0001 R09: ed100d80df32
[  215.992568] R10: 0003 R11: ed100d80df31 R12: 88806b2ce830
[  215.992937] R13: 88806b2ce800 R14: 88806bf47a78 R15: 88806bdb6a88
[  215.993294] FS:  () GS:88806d10() 
knlGS:
[  215.993700] CS:  0010 DS:  ES:  CR0: 80050033
[  215.993997] CR2:  CR3: 24e0e000 CR4: 06e0
[  215.994356] DR0:  DR1:  DR2: 
[  215.994714] DR3:  DR6: fffe0ff0 DR7: 0400
[  215.995077] Call Trace:
[  215.995208]  disk_release+0x114/0x130
[  215.995397]  device_release+0x3c/0xd0
[  215.995586]  kobject_put+0xa5/0x120
[  215.995767]  nvme_put_ns+0x40/0xa0
[  2