Hi Mario,

I posted a fix for this: https://lore.kernel.org/dri-devel/[email protected]/

I am not sure whether it is still a good time to merge this into drm-misc-next-fixes for the 6.20 kernel, and I plan to merge it into drm-misc-fixes around 6.20-rc1.

Thanks,

Lizhi

On 2/10/26 08:42, Mario Limonciello wrote:
aie2_destroy_context() is called during various cleanup paths, including
when context creation fails partially. If xdna_mailbox_create_channel()
fails during aie2_create_context(), the hwctx->priv->mbox_chann pointer
remains NULL. When cleanup occurs (e.g., during process termination via
amdxdna_hwctx_remove_all), aie2_destroy_context() is invoked and attempts
to stop and destroy the NULL mailbox channel, leading to a NULL pointer
dereference.

The issue was observed in the following call path:
   amdxdna_drm_close
     amdxdna_hwctx_remove_all
       aie2_hwctx_fini
         aie2_release_resource
           aie2_destroy_context
             xdna_mailbox_stop_channel <- NULL dereference

Add NULL checks in aie2_destroy_context() before calling mailbox channel
operations. Also add defensive NULL checks in aie2_hw_stop() for both
mgmt_chann and mbox to prevent similar issues during device shutdown.

Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup")
Signed-off-by: Mario Limonciello <[email protected]>
---
  drivers/accel/amdxdna/aie2_message.c | 14 +++++++++-----
  drivers/accel/amdxdna/aie2_pci.c     | 14 +++++++++-----
  2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c
index 7d7dcfeaf7942..77e3cdf18658b 100644
--- a/drivers/accel/amdxdna/aie2_message.c
+++ b/drivers/accel/amdxdna/aie2_message.c
@@ -318,11 +318,15 @@ int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwc
        struct amdxdna_dev *xdna = ndev->xdna;
        int ret;
-       xdna_mailbox_stop_channel(hwctx->priv->mbox_chann);
-       ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id);
-       xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann);
-       XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id);
-       hwctx->priv->mbox_chann = NULL;
+       if (hwctx->priv->mbox_chann) {
+               xdna_mailbox_stop_channel(hwctx->priv->mbox_chann);
+               ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id);
+               xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann);
+               XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id);
+               hwctx->priv->mbox_chann = NULL;
+       } else {
+               ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id);
+       }
        hwctx->fw_ctx_id = -1;
        ndev->hwctx_num--;
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index f70ccf0f3c019..9c2572706bf53 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -324,11 +324,15 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna)
        }
        aie2_mgmt_fw_fini(ndev);
-       xdna_mailbox_stop_channel(ndev->mgmt_chann);
-       xdna_mailbox_destroy_channel(ndev->mgmt_chann);
-       ndev->mgmt_chann = NULL;
-       drmm_kfree(&xdna->ddev, ndev->mbox);
-       ndev->mbox = NULL;
+       if (ndev->mgmt_chann) {
+               xdna_mailbox_stop_channel(ndev->mgmt_chann);
+               xdna_mailbox_destroy_channel(ndev->mgmt_chann);
+               ndev->mgmt_chann = NULL;
+       }
+       if (ndev->mbox) {
+               drmm_kfree(&xdna->ddev, ndev->mbox);
+               ndev->mbox = NULL;
+       }
        aie2_psp_stop(ndev->psp_hdl);
        aie2_smu_fini(ndev);
        aie2_error_async_events_free(ndev);
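
For readers following the thread, the guard pattern the patch applies can be sketched outside the kernel. The following is a minimal userspace illustration, not driver code: `struct channel`, `struct ctx`, `channel_destroy()`, `ctx_destroy()` and `channels_destroyed` are hypothetical stand-ins invented for this example. It assumes, as the report describes for the mailbox helpers, that the destroy helper dereferences its argument and so must never be handed NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the amdxdna types. */
struct channel {
	int id;
};

struct ctx {
	struct channel *chann;	/* stays NULL if creation failed part-way */
	int fw_ctx_id;
};

/* Counts real teardowns so callers can observe what happened. */
static int channels_destroyed;

/* Dereferences ch, so passing NULL here is a caller bug. */
static void channel_destroy(struct channel *ch)
{
	assert(ch != NULL);
	printf("destroying channel %d\n", ch->id);
	free(ch);
	channels_destroyed++;
}

/*
 * Safe to call on a partially created context and safe to call twice:
 * the channel is torn down only if it exists, and the pointer is
 * cleared afterwards so a later call sees NULL and skips it.
 */
static void ctx_destroy(struct ctx *c)
{
	if (c->chann) {
		channel_destroy(c->chann);
		c->chann = NULL;
	}
	c->fw_ctx_id = -1;
}
```

Calling `ctx_destroy()` on a context whose `chann` was never set only resets `fw_ctx_id`. Calling it on a fully created context frees the channel once, and any repeated call is then a no-op for the channel.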
