Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
Hi Martin and Junichi, On 04/03/2017 11:10 PM, Junichi Nomura wrote: On 04/04/17 06:53, Mauricio Faria de Oliveira wrote: On 03/28/2017 11:29 PM, Junichi Nomura wrote: Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"), "rmmod lpfc" starting to cause panic or corruption due to double free. Thanks for the report. Can you please check whether the patch just sent ([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it? It works for me. Thank you! Excellent, thanks! Martin, can you review/consider it for 4.11-rc6, please? Considering future maintenance, it might be a bit fragile to just depend on the code comment about representing the relation between cq/wq and shared pring but it's maintainers' choice. I agree -- there should be a better way of identifying a bound WQ/CQ. Perhaps there is, but I couldn't find it currently. For now, as far as I could grep and examine the code (detailed in commit message), a WQ is always bound to a CQ, so to check for WQ and not free its ring pointer seems to be sufficient (as the CQ ring pointer is freed first). If that changes, probably some form of flagging and/or queue type determination would be better/necessary. cheers, -- Mauricio Faria de Oliveira IBM Linux Technology Center
Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
On 04/04/17 06:53, Mauricio Faria de Oliveira wrote: > Hi Junichi, > > On 03/28/2017 11:29 PM, Junichi Nomura wrote: >> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"), >> "rmmod lpfc" starting to cause panic or corruption due to double free. > > Thanks for the report. Can you please check whether the patch just sent > ([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it? It works for me. Thank you! Considering future maintenance, it might be a bit fragile to just depend on the code comment about representing the relation between cq/wq and shared pring but it's maintainers' choice. -- Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
Hi Junichi, On 03/28/2017 11:29 PM, Junichi Nomura wrote: Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"), "rmmod lpfc" starting to cause panic or corruption due to double free. Thanks for the report. Can you please check whether the patch just sent ([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it? I don't have a setup to test it handy right now. cheers, -- Mauricio Faria de Oliveira IBM Linux Technology Center
Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
On 03/29/17 20:17, Johannes Thumshirn wrote: > On Wed, Mar 29, 2017 at 02:29:45AM +, Junichi Nomura wrote: >> The double-free occurs as followings: >> - During initialization, lpfc_create_wq_cq() binds cq and wq to >> the same ring in the way that both cq->pring and wq->pring point >> to the same object. >> - Upon removal, lpfc_sli4_queue_destroy() ends up calling >> lpfc_sli4_queue_free() for both wqs and cqs >> and kfree(queue->pring) is done twice. >> >> The problem became more visible in v4.11-rc3 because commit 85e8a23936ab >> ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one() >> called during driver shutdown. > > Well the obvious band-aid would be setting the pointers to NULL after freeing > them. lpfc_sli4_queue_free() checks for queue's precense and doesn't use > queue->pring prior to freeing it, so the following _should_ to the trick: > > From befa936d8935a1bed01df65b376f515fa42c99da Mon Sep 17 00:00:00 2001 > From: Johannes Thumshirn> Date: Wed, 29 Mar 2017 13:08:55 +0200 > Subject: [PATCH] lpfc: prevent double free of lpfc queue ring pointer > > Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications") > rmoving the lpfc module causes a double free in lpfc_sli4_queue_free(). > > This can be prevented by setting the queue->pring and queue pointers to NULL, > so kfree() will simply ignore the pointers on a second call. No, it doesn't work. Even if lpfc_sli4_queue_free(wq) sets wq->pring to NULL, cq->pring still holds bogus pointer and lpfc_sli4_queue_free(cq) will call kfree(cq->pring) and cause double-free. -- Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
On Wed, Mar 29, 2017 at 02:29:45AM +, Junichi Nomura wrote: > Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"), > "rmmod lpfc" starting to cause panic or corruption due to double free. > > The double-free occurs as followings: > - During initialization, lpfc_create_wq_cq() binds cq and wq to > the same ring in the way that both cq->pring and wq->pring point > to the same object. > - Upon removal, lpfc_sli4_queue_destroy() ends up calling > lpfc_sli4_queue_free() for both wqs and cqs > and kfree(queue->pring) is done twice. > > The problem became more visible in v4.11-rc3 because commit 85e8a23936ab > ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one() > called during driver shutdown. Well the obvious band-aid would be setting the pointers to NULL after freeing them. lpfc_sli4_queue_free() checks for queue's precense and doesn't use queue->pring prior to freeing it, so the following _should_ to the trick: >From befa936d8935a1bed01df65b376f515fa42c99da Mon Sep 17 00:00:00 2001 From: Johannes ThumshirnDate: Wed, 29 Mar 2017 13:08:55 +0200 Subject: [PATCH] lpfc: prevent double free of lpfc queue ring pointer Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications") rmoving the lpfc module causes a double free in lpfc_sli4_queue_free(). This can be prevented by setting the queue->pring and queue pointers to NULL, so kfree() will simply ignore the pointers on a second call. Reported-by: Junichi Nomura Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications") Signed-off-by: Johannes Thumshirn --- drivers/scsi/lpfc/lpfc_sli.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 1c9fa45..86e1529 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -13759,7 +13759,9 @@ lpfc_sli4_queue_free(struct lpfc_queue *queue) kfree(queue->rqbp); } kfree(queue->pring); + queue->pring = NULL; kfree(queue); + queue = NULL; return; } -- 2.10.2 I'll have a look if we at the callers of lpfc_sli4_queue_free() as well and check if there's a better (a.k.a more correct) way to fix this. Byte, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"), "rmmod lpfc" starting to cause panic or corruption due to double free. The double-free occurs as followings: - During initialization, lpfc_create_wq_cq() binds cq and wq to the same ring in the way that both cq->pring and wq->pring point to the same object. - Upon removal, lpfc_sli4_queue_destroy() ends up calling lpfc_sli4_queue_free() for both wqs and cqs and kfree(queue->pring) is done twice. The problem became more visible in v4.11-rc3 because commit 85e8a23936ab ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one() called during driver shutdown. A sample of slub_debug output is attached below. = BUG kmalloc-512 (Not tainted): Object already free - Disabling lock debugging due to kernel taint INFO: Allocated in lpfc_wq_create+0x31c/0x4f0 [lpfc] age=259902 cpu=0 pid=314 ___slab_alloc+0x47f/0x4b0 __slab_alloc+0x40/0x5c kmem_cache_alloc_trace+0x16c/0x1b0 lpfc_wq_create+0x31c/0x4f0 [lpfc] lpfc_create_wq_cq+0xb6/0x370 [lpfc] lpfc_sli4_queue_setup+0x331/0xd70 [lpfc] lpfc_sli4_hba_setup+0x12ce/0x1e90 [lpfc] lpfc_pci_probe_one_s4.isra.43+0x7c2/0x8f0 [lpfc] lpfc_pci_probe_one+0xbd/0xc30 [lpfc] local_pci_probe+0x45/0xa0 work_for_cpu_fn+0x14/0x20 process_one_work+0x165/0x410 worker_thread+0x27f/0x4c0 kthread+0x101/0x140 ret_from_fork+0x2c/0x40 INFO: Freed in lpfc_sli4_queue_free+0x11b/0x160 [lpfc] age=100 cpu=3 pid=11802 __slab_free+0x1ba/0x2c0 kfree+0x122/0x170 lpfc_sli4_queue_free+0x11b/0x160 [lpfc] lpfc_sli4_queue_destroy+0xba/0x470 [lpfc] lpfc_pci_remove_one+0x6b4/0x880 [lpfc] pci_device_remove+0x39/0xc0 device_release_driver_internal+0x141/0x1f0 driver_detach+0x3f/0x80 bus_remove_driver+0x55/0xd0 driver_unregister+0x2c/0x50 pci_unregister_driver+0x2a/0xa0 lpfc_exit+0x1c/0xe84 [lpfc] SyS_delete_module+0x1ba/0x220 do_syscall_64+0x67/0x180 return_from_SYSCALL_64+0x0/0x6a INFO: Slab 0xea0040c9ce00 objects=38 used=34 fp=0x881032739a88 flags=0x17c0008101 INFO: Object 0x881032739098 @offset=4248 fp=0x (null) Redzone 881032739090: bb bb bb bb bb bb bb bb Object 881032739098: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327390a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327390b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327390c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327390d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327390e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327390f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739108: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739118: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739128: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739138: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739148: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739158: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739168: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739178: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739188: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739198: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327391a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327391b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327391c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327391d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327391e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 8810327391f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739208: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739218: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739228: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 881032739238: 6b 6b 6b 6b 6b 6b 6b