commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
binds the CQs and WQs ring pointer (sets it to same address on both).

    lpfc_create_wq_cq():
    ...
            rc = lpfc_cq_create(phba, cq, eq, <...>)
    ...
                    rc = lpfc_wq_create(phba, wq, cq, qtype);
    ...
                    /* Bind this CQ/WQ to the NVME ring */
                    pring = wq->pring;
    ...
                    cq->pring = pring;
    ...

The commit frees both CQ & WQ for FCP/NVME on lpfc_sli4_queue_destroy(),
which causes a double free (potential corruption or panic) on freeing
the ring pointer of the second entity (CQ is first, WQ is second):

    lpfc_pci_remove_one() # that is, .remove / .shutdown
    -> lpfc_pci_remove_one_s4()
       -> lpfc_sli4_hba_unset()
          -> lpfc_sli4_queue_destroy()

             -> lpfc_sli4_release_queues()  # Release FCP/NVME cqs
                -> __lpfc_sli4_release_queue()
                   -> lpfc_sli4_queue_free()
                      -> kfree(queue->pring)  # first free

             -> lpfc_sli4_release_queues()  # Release FCP/NVME wqs
                -> __lpfc_sli4_release_queue()
                   -> lpfc_sli4_queue_free()
                      -> kfree(queue->pring)  # second free

So, check for WQs in lpfc_sli4_queue_free() and do not free the pring,
as it is freed before in the bound CQ.  [the WQs are created only via
lpfc_wq_create(), which sets struct lpfc_queue::type == LPFC_WQ.  And
that happens in 2 sites (lpfc_create_wq_cq() & lpfc_fof_queue_setup()),
both of which bind the CQs & WQs. Thus, checking for the LPFC_WQ type
correlates to whether the WQ is bound to a CQ, which is freed first.]

Additional details:

For reference, that binding also occurs on one other function:

    lpfc_fof_queue_setup():
    ...
                    rc = lpfc_cq_create(phba, phba->sli4_hba.oas_cq, <...>)
    ...
                    rc = lpfc_wq_create(phba, phba->sli4_hba.oas_wq, <...>)
    ...
                    /* Bind this CQ/WQ to the NVME ring */
                    pring = phba->sli4_hba.oas_wq->pring;
    ...
                    phba->sli4_hba.oas_cq->pring = pring;

And used to occur similarly on lpfc_sli4_queue_setup(), but was changed
by that commit; although the problem is more related to the new freeing
pattern introduced in lpfc_sli4_queue_destroy() plus the bound CQs/WQs.

    -               /* Bind this WQ to the next FCP ring */
    -               pring = &psli->ring[MAX_SLI3_CONFIGURED_RINGS + fcp_wqidx];
    ...
    -               phba->sli4_hba.fcp_cq[fcp_wqidx]->pring = pring;

commit 85e8a23936ab ("scsi: lpfc: Add shutdown method for kexec") made
this more likely as lpfc_pci_remove_one() is called on driver shutdown
(e.g., modprobe -r / rmmod).

(this patch is partially based on a different patch suggested by Johannes,
 thus adding a Suggested-by tag for due credit.)

Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Reported-by: Junichi Nomura <j-nom...@ce.jp.nec.com>
Suggested-by: Johannes Thumshirn <jthumsh...@suse.de>
---
 drivers/scsi/lpfc/lpfc_sli.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45df7eb..8befe841adaa 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -13758,7 +13758,14 @@ void lpfc_sli4_els_xri_abort_event_proc(struct 
lpfc_hba *phba)
                lpfc_free_rq_buffer(queue->phba, queue);
                kfree(queue->rqbp);
        }
-       kfree(queue->pring);
+
+       /*
+        * The WQs/CQs' pring is bound (same pointer).
+        * So free it only once, and not again on WQ.
+        */
+       if (queue->type != LPFC_WQ)
+               kfree(queue->pring);
+
        kfree(queue);
        return;
 }
-- 
1.8.3.1

Reply via email to