Re: [PATCH] nvme: fix use-after-free during booting
IMHO, WARN_ON_ONCE() looks ok to me. as long as it won't crash the system or lead to any memory corruption issue. We can talk about the reproducer offline if you are interested. - Tong On Wed, Sep 23, 2020 at 1:06 AM Christoph Hellwig wrote: > > On Tue, Sep 22, 2020 at 04:34:45PM -0400, Tong Zhang wrote: > > Hi Christoph, > > I modified the patch a bit and now it works. > > So you're still hitting the WARN_ON_ONCE? I think we need to fix that > as well, but all the ideas I have will turn into a bigger project, > so I think I'll submit this one to Jens, and then do things > incrementally. > > Can you share your reproducer?
Re: [PATCH] nvme: fix use-after-free during booting
here you go [0.00] Linux version 5.9.0-rc4+ (tong@tong-desktop) (gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #44 SMP Wed Sep 23 12:15:34 EDT 2020 [0.00] Command line: console=ttyS0 root=/dev/sda earlyprintk=serial biosdevname=0 net.ifnames=0 loglevel=7 [0.00] x86/fpu: x87 FPU will use FXSAVE [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x7ffd6fff] usable [0.00] BIOS-e820: [mem 0x7ffd7000-0x7fff] reserved [0.00] BIOS-e820: [mem 0xb000-0xbfff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved [0.00] BIOS-e820: [mem 0xfffc-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00017fff] usable [0.00] printk: bootconsole [earlyser0] enabled [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.8 present. [0.00] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 [0.00] tsc: Fast TSC calibration using PIT [0.00] tsc: Detected 3000.019 MHz processor [0.018848] last_pfn = 0x18 max_arch_pfn = 0x4 [0.020004] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [0.020455] last_pfn = 0x7ffd7 max_arch_pfn = 0x4 [0.033611] found SMP MP-table at [mem 0x000f5a90-0x000f5a9f] [0.035100] check: Scanning 1 areas for low memory corruption [0.046212] ACPI: Early table checksum verification disabled [0.046681] ACPI: RSDP 0x000F5850 14 (v00 BOCHS ) [0.047095] ACPI: RSDT 0x7FFE20C2 38 (v01 BOCHS BXPCRSDT 0001 BXPC 0001) [0.047953] ACPI: FACP 0x7FFE1EB2 F4 (v03 BOCHS BXPCFACP 0001 BXPC 0001) [0.048792] ACPI: DSDT 0x7FFE0040 001E72 (v01 BOCHS BXPCDSDT 0001 BXPC 0001) [0.049074] ACPI: FACS 0x7FFE 40 [0.049256] ACPI: APIC 0x7FFE1FA6 80 (v01 BOCHS BXPCAPIC 0001 BXPC 0001) [0.049501] ACPI: HPET 0x7FFE2026 38 (v01 BOCHS BXPCHPET 0001 BXPC 0001) [0.049724] ACPI: MCFG 0x7FFE205E 3C (v01 BOCHS BXPCMCFG 0001 BXPC 0001) [0.049892] ACPI: WAET 0x7FFE209A 28 (v01 BOCHS BXPCWAET 0001 BXPC 0001) [0.058109] No NUMA configuration found [0.058193] Faking a node at [mem 0x-0x00017fff] [0.059024] NODE_DATA(0) allocated [mem 0x17fffa000-0x17fffdfff] [0.061178] Zone ranges: [0.061271] DMA [mem 0x1000-0x00ff] [0.061393] DMA32[mem 0x0100-0x] [0.061481] Normal [mem 0x0001-0x00017fff] [0.061571] Movable zone start for each node [0.061654] Early memory node ranges [0.061735] node 0: [mem 0x1000-0x0009efff] [0.061939] node 0: [mem 0x0010-0x7ffd6fff] [0.062056] node 0: [mem 0x0001-0x00017fff] [0.062963] Zeroed struct page in unavailable ranges: 139 pages [0.063146] Initmem setup node 0 [mem 0x1000-0x00017fff] [0.701550] kasan: KernelAddressSanitizer initialized [0.702094] ACPI: PM-Timer IO Port: 0x608 [0.702932] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) [0.703636] IOAPIC[0]: apic_id 0, version 32, address 0xfec0, GSI 0-23 [0.704018] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.704487] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) [0.704656] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [0.704900] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) [0.705041] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) [0.705464] Using ACPI (MADT) for SMP configuration information [0.705676] ACPI: HPET id: 0x8086a201 base: 0xfed0 [0.706244] smpboot: Allowing 2 CPUs, 0 hotplug CPUs [0.707373] PM: hibernation: Registered nosave memory: [mem 0x-0x0fff] [0.707547] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009] [0.707729] PM: hibernation: Registered nosave memory: [mem 0x000a-0x000e] [0.707837] PM: hibernation: Registered nosave memory: [mem 0x000f-0x000f] [0.708039] PM: hibernation: Registered nosave memory: [mem 0x7ffd7000-0x7fff] [0.708162] PM: hibernation: Registered nosave memory: [mem 0x8000-0xafff] [0.708345] PM: hibernation: Registered nosave memory: [mem 0xb000-0xbfff] [0.708439] PM: hibernation: Registered nosave memory: [mem 0xc00
Re: [PATCH] nvme: fix use-after-free during booting
I suspect the patch below might be better. Can you send me a full dmesg with this one applied? Preferably on top of Jens' for-next branch? diff --git a/block/genhd.c b/block/genhd.c index 9d060e79eb31d8..ef2784c69d59ee 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -832,7 +832,9 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk, * Take an extra ref on queue which will be put on disk_release() * so that it sticks around as long as @disk is there. */ - WARN_ON_ONCE(!blk_get_queue(disk->queue)); + WARN_ON_ONCE(blk_queue_dying(disk->queue)); + __blk_get_queue(disk->queue); + disk->flags |= GENHD_FL_QUEUE_REF; disk_add_events(disk); blk_integrity_add(disk); @@ -1564,7 +1566,7 @@ static void disk_release(struct device *dev) kfree(disk->random); disk_replace_part_tbl(disk, NULL); hd_free_part(&disk->part0); - if (disk->queue) + if (disk->flags & GENHD_FL_QUEUE_REF) blk_put_queue(disk->queue); kfree(disk); } diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 1c97cf84f011a7..822a619924e3b5 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -133,6 +133,7 @@ struct hd_struct { #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100 #define GENHD_FL_NO_PART_SCAN 0x0200 #define GENHD_FL_HIDDEN0x0400 +#define GENHD_FL_QUEUE_REF 0x0800 enum { DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
Re: [PATCH] nvme: fix use-after-free during booting
On Tue, Sep 22, 2020 at 04:34:45PM -0400, Tong Zhang wrote: > Hi Christoph, > I modified the patch a bit and now it works. So you're still hitting the WARN_ON_ONCE? I think we need to fix that as well, but all the ideas I have will turn into a bigger project, so I think I'll submit this one to Jens, and then do things incrementally. Can you share your reproducer?
Re: [PATCH] nvme: fix use-after-free during booting
Hi Christoph, I modified the patch a bit and now it works. Best, - Tong --- block/genhd.c | 7 +-- include/linux/genhd.h | 1 + 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 99c64641c314..8c432e5f97ea 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -835,7 +835,10 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk, * Take an extra ref on queue which will be put on disk_release() * so that it sticks around as long as @disk is there. */ - WARN_ON_ONCE(!blk_get_queue(disk->queue)); + if (blk_get_queue(disk->queue)) + disk->flags |= GENHD_FL_QUEUE_REF; + else + WARN_ON_ONCE(1); disk_add_events(disk); blk_integrity_add(disk); @@ -1567,7 +1570,7 @@ static void disk_release(struct device *dev) kfree(disk->random); disk_replace_part_tbl(disk, NULL); hd_free_part(&disk->part0); - if (disk->queue) + if (disk->flags & GENHD_FL_QUEUE_REF) blk_put_queue(disk->queue); kfree(disk); } diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 4ab853461dff..9441077ee103 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -135,6 +135,7 @@ struct hd_struct { #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100 #define GENHD_FL_NO_PART_SCAN 0x0200 #define GENHD_FL_HIDDEN0x0400 +#define GENHD_FL_QUEUE_REF 0x0800 enum { DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */ -- 2.25.1 > On Sep 22, 2020, at 12:41 PM, Christoph Hellwig wrote: > > On Tue, Sep 22, 2020 at 12:19:06PM -0400, Tong Zhang wrote: >> Thank you Christoph for providing the patch. >> However, the test shows that the issue still persists even with the new >> patch. >> The call trace is the same as in my first email. > > Weird. What baseline did you test? You need to use the patch on top of > the latest for-next or for-5.10/block tree from Jens for full effect. > > If that still feels can you send me a longer dmesg that also shows what > happens before the sniplet you sent?
Re: [PATCH] nvme: fix use-after-free during booting
Thank you Christoph. I will do some testing with my setup and let you know. - Tong On Tue, Sep 22, 2020 at 9:59 AM Christoph Hellwig wrote: > > Hi Tong, > > can you test this patch? > > diff --git a/block/genhd.c b/block/genhd.c > index 99c64641c3148c..6473ae703789e4 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -836,6 +836,7 @@ static void __device_add_disk(struct device *parent, > struct gendisk *disk, > * so that it sticks around as long as @disk is there. > */ > WARN_ON_ONCE(!blk_get_queue(disk->queue)); > + disk->flags |= GENHD_FL_QUEUE_REF; > > disk_add_events(disk); > blk_integrity_add(disk); > @@ -1567,7 +1568,7 @@ static void disk_release(struct device *dev) > kfree(disk->random); > disk_replace_part_tbl(disk, NULL); > hd_free_part(&disk->part0); > - if (disk->queue) > + if (disk->flags & GENHD_FL_QUEUE_REF) > blk_put_queue(disk->queue); > kfree(disk); > } > diff --git a/include/linux/genhd.h b/include/linux/genhd.h > index 4ab853461dff25..9441077ee10329 100644 > --- a/include/linux/genhd.h > +++ b/include/linux/genhd.h > @@ -135,6 +135,7 @@ struct hd_struct { > #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100 > #define GENHD_FL_NO_PART_SCAN 0x0200 > #define GENHD_FL_HIDDEN0x0400 > +#define GENHD_FL_QUEUE_REF 0x0800 > > enum { > DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
Re: [PATCH] nvme: fix use-after-free during booting
Hi Tong, can you test this patch? diff --git a/block/genhd.c b/block/genhd.c index 99c64641c3148c..6473ae703789e4 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -836,6 +836,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk, * so that it sticks around as long as @disk is there. */ WARN_ON_ONCE(!blk_get_queue(disk->queue)); + disk->flags |= GENHD_FL_QUEUE_REF; disk_add_events(disk); blk_integrity_add(disk); @@ -1567,7 +1568,7 @@ static void disk_release(struct device *dev) kfree(disk->random); disk_replace_part_tbl(disk, NULL); hd_free_part(&disk->part0); - if (disk->queue) + if (disk->flags & GENHD_FL_QUEUE_REF) blk_put_queue(disk->queue); kfree(disk); } diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 4ab853461dff25..9441077ee10329 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -135,6 +135,7 @@ struct hd_struct { #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE0x0100 #define GENHD_FL_NO_PART_SCAN 0x0200 #define GENHD_FL_HIDDEN0x0400 +#define GENHD_FL_QUEUE_REF 0x0800 enum { DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
Re: [PATCH] nvme: fix use-after-free during booting
On Thu, Sep 17, 2020 at 11:24:59AM -0400, Tong Zhang wrote: > Hmm.. OK. Any suggestions on how to fix this? Yes, we'll need to make sure the block layer doesn't double put for a not fully setup disk/queue. It has been on my todo list for a while, I'll plan to get back to in the next days.
Re: [PATCH] nvme: fix use-after-free during booting
Hmm.. OK. Any suggestions on how to fix this? On Thu, Sep 17, 2020 at 4:26 AM Christoph Hellwig wrote: > > On Wed, Sep 16, 2020 at 11:36:05AM -0400, Tong Zhang wrote: > > if a nvme controller is not responding during probing, a use-after-free > > condition could happen > > This now leaks the queue for the regular teardown path.
Re: [PATCH] nvme: fix use-after-free during booting
On Wed, Sep 16, 2020 at 11:36:05AM -0400, Tong Zhang wrote: > if a nvme controller is not responding during probing, a use-after-free > condition could happen This now leaks the queue for the regular teardown path.
[PATCH] nvme: fix use-after-free during booting
if a nvme controller is not responding during probing, a use-after-free condition could happen [ 215.396884] nvme nvme0: Identify Controller failed (-4) [ 215.397239] nvme nvme0: Removing after probe failure status: -5 [ 215.409079] Buffer I/O error on dev nvme0n1, logical block 0, async page read [ 215.409526] Buffer I/O error on dev nvme0n1, logical block 1, async page read [ 215.409924] Buffer I/O error on dev nvme0n1, logical block 2, async page read [ 215.410317] Buffer I/O error on dev nvme0n1, logical block 3, async page read [ 215.410706] Buffer I/O error on dev nvme0n1, logical block 4, async page read [ 215.411103] Buffer I/O error on dev nvme0n1, logical block 5, async page read [ 215.411498] Buffer I/O error on dev nvme0n1, logical block 6, async page read [ 215.411893] Buffer I/O error on dev nvme0n1, logical block 7, async page read [ 215.412272] nvme0n1: unable to read partition table [ 215.412581] nvme0n1: partition table beyond EOD, truncated [ 215.966867] [ cut here ] [ 215.967167] WARNING: CPU: 1 PID: 7 at block/genhd.c:838 __device_add_disk+0x79c/0x7c0 [ 215.967558] Modules linked in: [ 215.967717] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #59 [ 215.968618] Workqueue: nvme-wq nvme_scan_work [ 215.968849] RIP: 0010:__device_add_disk+0x79c/0x7c0 [ 215.969098] Code: 85 30 04 00 00 48 89 44 24 20 e9 41 fa ff ff 48 89 df e8 07 e5 ca ff 80 a5 dc 00 00 00 ef e9 88 fc ff ff 0f 0b e9 81 fc ff ff <0f> 0b e9 a1 fc ff ff 0f 0b e9 b5 fa ff ff 0f 0b e9 51 ff ff ff e8 [ 215.970032] RSP: :88806c06f9f0 EFLAGS: 00010246 [ 215.970297] RAX: RBX: 88806c233400 RCX: 9ce3e74e [ 215.970653] RDX: dc00 RSI: 0008 RDI: 888066135560 [ 215.971015] RBP: 88806b2ce800 R08: 0001 R09: ed100cc26aad [ 215.971373] R10: 888066135567 R11: ed100cc26aac R12: 88806c40f940 [ 215.971731] R13: 88806b2ce8a0 R14: 88806b2ce808 R15: 88806b2cebd8 [ 215.972094] FS: () GS:88806d10() knlGS: [ 215.972497] CS: 0010 DS: ES: CR0: 80050033 [ 215.972789] CR2: CR3: 24e0e000 CR4: 06e0 [ 215.973153] DR0: DR1: DR2: [ 215.973513] DR3: DR6: fffe0ff0 DR7: 0400 [ 215.973875] Call Trace: [ 215.974007] ? blk_alloc_devt+0x140/0x140 [ 215.974213] ? __hrtimer_init+0xb6/0xf0 [ 215.974411] ? rwsem_down_read_slowpath+0x7d0/0x7d0 [ 215.974659] ? __nvme_revalidate_disk+0x1ba/0x420 [ 215.974905] nvme_validate_ns+0x54c/0xe70 [ 215.975113] ? nvme_dev_ioctl+0x190/0x190 [ 215.975318] ? __blk_mq_free_request+0xe3/0x130 [ 215.975549] ? __nvme_submit_sync_cmd+0x153/0x300 [ 215.975789] ? kasan_unpoison_shadow+0x33/0x40 [ 215.976022] nvme_scan_work+0x20f/0x35f [ 215.976220] ? nvme_fw_act_work+0x210/0x210 [ 215.976434] ? free_object+0x50/0x50 [ 215.976619] ? try_to_wake_up+0x37c/0x900 [ 215.976619] ? try_to_wake_up+0x37c/0x900 [ 215.976843] ? read_word_at_a_time+0xe/0x20 [ 215.977057] ? strscpy+0xbf/0x190 [ 215.977229] process_one_work+0x4ad/0x7e0 [ 215.977435] worker_thread+0x73/0x690 [ 215.977623] ? process_one_work+0x7e0/0x7e0 [ 215.977842] kthread+0x199/0x1f0 [ 215.978011] ? kthread_create_on_node+0xd0/0xd0 [ 215.978240] ret_from_fork+0x22/0x30 [ 215.978423] ---[ end trace e8b38966135fe74b ]--- [ 215.987504] refcount_t: underflow; use-after-free. [ 215.987772] WARNING: CPU: 1 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0xc5/0x110 [ 215.988641] Modules linked in: [ 215.988827] CPU: 1 PID: 7 Comm: kworker/u4:0 Tainted: GW 5.8.0+ #59 [ 215.989784] Workqueue: nvme-wq nvme_scan_work [ 215.990013] RIP: 0010:refcount_warn_saturate+0xc5/0x110 0b e9 76 ff ff ff 80 3d cc 0f f6 01 00 0f 85 69 ff ff ff 48 c7 [ 215.991220] RSP: :88806c06fb60 EFLAGS: 00010282 [ 215.991485] RAX: RBX: 0003 RCX: [ 215.991850] RDX: 0001 RSI: 0004 RDI: ed100d80df5e [ 215.992209] RBP: 8880661355b0 R08: 0001 R09: ed100d80df32 [ 215.992568] R10: 0003 R11: ed100d80df31 R12: 88806b2ce830 [ 215.992937] R13: 88806b2ce800 R14: 88806bf47a78 R15: 88806bdb6a88 [ 215.993294] FS: () GS:88806d10() knlGS: [ 215.993700] CS: 0010 DS: ES: CR0: 80050033 [ 215.993997] CR2: CR3: 24e0e000 CR4: 06e0 [ 215.994356] DR0: DR1: DR2: [ 215.994714] DR3: DR6: fffe0ff0 DR7: 0400 [ 215.995077] Call Trace: [ 215.995208] disk_release+0x114/0x130 [ 215.995397] device_release+0x3c/0xd0 [ 215.995586] kobject_put+0xa5/0x120 [ 215.995767] nvme_put_ns+0x40/0xa0 [ 2