Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-04-04 Thread Mauricio Faria de Oliveira

Hi Martin and Junichi,

On 04/03/2017 11:10 PM, Junichi Nomura wrote:

On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:



On 03/28/2017 11:29 PM, Junichi Nomura wrote:

Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
"rmmod lpfc" starting to cause panic or corruption due to double free.



Thanks for the report. Can you please check whether the patch just sent
([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?



It works for me. Thank you!


Excellent, thanks!

Martin, can you review/consider it for 4.11-rc6, please?


Considering future maintenance, it might be a bit fragile to just depend
on the code comment about representing the relation between cq/wq and
shared pring but it's maintainers' choice.


I agree -- there should be a better way of identifying a bound WQ/CQ.
Perhaps there is, but I couldn't find it currently.

For now, as far as I could grep and examine the code (detailed in commit
message), a WQ is always bound to a CQ, so to check for WQ and not free
its ring pointer seems to be sufficient (as the CQ ring pointer is freed
first).

If that changes, probably some form of flagging and/or queue type
determination would be better/necessary.

cheers,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-04-03 Thread Junichi Nomura
On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:
> Hi Junichi,
> 
> On 03/28/2017 11:29 PM, Junichi Nomura wrote:
>> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
>> "rmmod lpfc" starting to cause panic or corruption due to double free.
> 
> Thanks for the report. Can you please check whether the patch just sent
> ([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?

It works for me. Thank you!

Considering future maintenance, it might be a bit fragile to just depend
on the code comment about representing the relation between cq/wq and
shared pring but it's maintainers' choice.

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.


Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-04-03 Thread Mauricio Faria de Oliveira

Hi Junichi,

On 03/28/2017 11:29 PM, Junichi Nomura wrote:

Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
"rmmod lpfc" starting to cause panic or corruption due to double free.


Thanks for the report. Can you please check whether the patch just sent
([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?

I don't have a setup to test it handy right now.

cheers,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-03-29 Thread Junichi Nomura
On 03/29/17 20:17, Johannes Thumshirn wrote:
> On Wed, Mar 29, 2017 at 02:29:45AM +, Junichi Nomura wrote:
>> The double-free occurs as followings:
>>   - During initialization, lpfc_create_wq_cq() binds cq and wq to
>> the same ring in the way that both cq->pring and wq->pring point
>> to the same object.
>>   - Upon removal, lpfc_sli4_queue_destroy() ends up calling
>> lpfc_sli4_queue_free() for both wqs and cqs
>> and kfree(queue->pring) is done twice.
>>
>> The problem became more visible in v4.11-rc3 because commit 85e8a23936ab
>> ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one()
>> called during driver shutdown.
> 
> Well the obvious band-aid would be setting the pointers to NULL after freeing
> them. lpfc_sli4_queue_free() checks for queue's precense and doesn't use
> queue->pring prior to freeing it, so the following _should_ to the trick:
> 
> From befa936d8935a1bed01df65b376f515fa42c99da Mon Sep 17 00:00:00 2001
> From: Johannes Thumshirn 
> Date: Wed, 29 Mar 2017 13:08:55 +0200
> Subject: [PATCH] lpfc: prevent double free of lpfc queue ring pointer
> 
> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
> rmoving the lpfc module causes a double free in lpfc_sli4_queue_free().
> 
> This can be prevented by setting the queue->pring and queue pointers to NULL,
> so kfree() will simply ignore the pointers on a second call.

No, it doesn't work.

Even if lpfc_sli4_queue_free(wq) sets wq->pring to NULL, cq->pring still
holds bogus pointer and lpfc_sli4_queue_free(cq) will call kfree(cq->pring)
and cause double-free.

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.



Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-03-29 Thread Johannes Thumshirn
On Wed, Mar 29, 2017 at 02:29:45AM +, Junichi Nomura wrote:
> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
> "rmmod lpfc" starting to cause panic or corruption due to double free.
> 
> The double-free occurs as followings:
>   - During initialization, lpfc_create_wq_cq() binds cq and wq to
> the same ring in the way that both cq->pring and wq->pring point
> to the same object.
>   - Upon removal, lpfc_sli4_queue_destroy() ends up calling
> lpfc_sli4_queue_free() for both wqs and cqs
> and kfree(queue->pring) is done twice.
> 
> The problem became more visible in v4.11-rc3 because commit 85e8a23936ab
> ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one()
> called during driver shutdown.

Well the obvious band-aid would be setting the pointers to NULL after freeing
them. lpfc_sli4_queue_free() checks for queue's precense and doesn't use
queue->pring prior to freeing it, so the following _should_ to the trick:

>From befa936d8935a1bed01df65b376f515fa42c99da Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn 
Date: Wed, 29 Mar 2017 13:08:55 +0200
Subject: [PATCH] lpfc: prevent double free of lpfc queue ring pointer

Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
rmoving the lpfc module causes a double free in lpfc_sli4_queue_free().

This can be prevented by setting the queue->pring and queue pointers to NULL,
so kfree() will simply ignore the pointers on a second call.

Reported-by: Junichi Nomura 
Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
Signed-off-by: Johannes Thumshirn 
---
 drivers/scsi/lpfc/lpfc_sli.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45..86e1529 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -13759,7 +13759,9 @@ lpfc_sli4_queue_free(struct lpfc_queue *queue)
kfree(queue->rqbp);
}
kfree(queue->pring);
+   queue->pring = NULL;
kfree(queue);
+   queue = NULL;
return;
 }
 
-- 
2.10.2

I'll have a look if we at the callers of lpfc_sli4_queue_free() as well
and check if there's a better (a.k.a more correct) way to fix this.

Byte,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-03-28 Thread Junichi Nomura
Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
"rmmod lpfc" starting to cause panic or corruption due to double free.

The double-free occurs as followings:
  - During initialization, lpfc_create_wq_cq() binds cq and wq to
the same ring in the way that both cq->pring and wq->pring point
to the same object.
  - Upon removal, lpfc_sli4_queue_destroy() ends up calling
lpfc_sli4_queue_free() for both wqs and cqs
and kfree(queue->pring) is done twice.

The problem became more visible in v4.11-rc3 because commit 85e8a23936ab
("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one()
called during driver shutdown.

A sample of slub_debug output is attached below.

=
BUG kmalloc-512 (Not tainted): Object already free
-

Disabling lock debugging due to kernel taint
INFO: Allocated in lpfc_wq_create+0x31c/0x4f0 [lpfc] age=259902 cpu=0 pid=314
___slab_alloc+0x47f/0x4b0
__slab_alloc+0x40/0x5c
kmem_cache_alloc_trace+0x16c/0x1b0
lpfc_wq_create+0x31c/0x4f0 [lpfc]
lpfc_create_wq_cq+0xb6/0x370 [lpfc]
lpfc_sli4_queue_setup+0x331/0xd70 [lpfc]
lpfc_sli4_hba_setup+0x12ce/0x1e90 [lpfc]
lpfc_pci_probe_one_s4.isra.43+0x7c2/0x8f0 [lpfc]
lpfc_pci_probe_one+0xbd/0xc30 [lpfc]
local_pci_probe+0x45/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x165/0x410
worker_thread+0x27f/0x4c0
kthread+0x101/0x140
ret_from_fork+0x2c/0x40
INFO: Freed in lpfc_sli4_queue_free+0x11b/0x160 [lpfc] age=100 cpu=3 pid=11802
__slab_free+0x1ba/0x2c0
kfree+0x122/0x170
lpfc_sli4_queue_free+0x11b/0x160 [lpfc]
lpfc_sli4_queue_destroy+0xba/0x470 [lpfc]
lpfc_pci_remove_one+0x6b4/0x880 [lpfc]
pci_device_remove+0x39/0xc0
device_release_driver_internal+0x141/0x1f0
driver_detach+0x3f/0x80
bus_remove_driver+0x55/0xd0
driver_unregister+0x2c/0x50
pci_unregister_driver+0x2a/0xa0
lpfc_exit+0x1c/0xe84 [lpfc]
SyS_delete_module+0x1ba/0x220
do_syscall_64+0x67/0x180
return_from_SYSCALL_64+0x0/0x6a
INFO: Slab 0xea0040c9ce00 objects=38 used=34 fp=0x881032739a88 
flags=0x17c0008101
INFO: Object 0x881032739098 @offset=4248 fp=0x  (null)

Redzone 881032739090: bb bb bb bb bb bb bb bb  

Object 881032739098: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327390a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327390b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327390c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327390d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327390e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327390f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739108: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739118: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739128: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739138: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739148: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739158: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739168: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739178: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739188: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739198: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327391a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327391b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327391c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327391d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327391e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 8810327391f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739208: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739218: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739228: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  

Object 881032739238: 6b 6b 6b 6b 6b 6b 6b