[PATCH] drm/amd: use drm_kms_helper_poll_fini in amdgpu_device_suspend to avoid warning

2023-02-16 Thread Bert Karwatzki
When drm_kms_helper_poll_disable is used in amdgpu_device_suspend
without drm_kms_helper_poll_init having been called it causes a warning
in __flush_work:
https://gitlab.freedesktop.org/drm/amd/-/issues/2411
To avoid this one can use drm_kms_helper_poll_fini instead:


>From 51cba3ae1e9f557cca8e37eb43b9b9310d0d504d Mon Sep 17 00:00:00 2001
From: Bert Karwatzki 
Date: Thu, 16 Feb 2023 10:34:11 +0100
Subject: [PATCH] Use drm_kms_helper_poll_fini instead of
 drm_kms_helper_poll_disable in amdgpu_device.c to avoid a warning from
 __flush_work.

Signed-off-by: Bert Karwatzki 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/drm_probe_helper.c | 7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b325f7039e0e..dc9e9868a84b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4145,7 +4145,7 @@ int amdgpu_device_suspend(struct drm_device *dev,
bool fbcon)
if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
DRM_WARN("smart shift update failed\n");
 
-   drm_kms_helper_poll_disable(dev);
+   drm_kms_helper_poll_fini(dev);
 
if (fbcon)
drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)-
>fb_helper, true);
diff --git a/drivers/gpu/drm/drm_probe_helper.c
b/drivers/gpu/drm/drm_probe_helper.c
index 8127be134c39..105d00d5ebf3 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -842,9 +842,10 @@ EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
  *
  * This function disables the output polling work.
  *
- * Drivers can call this helper from their device suspend
implementation. It is
- * not an error to call this even when output polling isn't enabled or
already
- * disabled. Polling is re-enabled by calling
drm_kms_helper_poll_enable().
+ * Drivers can call this helper from their device suspend
implementation. If it
+ * is not known if drm_kms_helper_poll_init has been called before the
driver
+ * should use drm_kms_helper_poll_fini_instead.
+ * Polling is re-enabled by calling drm_kms_helper_poll_enable().
  *
  * Note that calls to enable and disable polling must be strictly
ordered, which
  * is automatically the case when they're only call from
suspend/resume
-- 
2.39.1



Bert Karwatzki



Re: [PATCH v2 3/9] drm/amdgpu: add new IOCTL for usermode queue

2023-02-16 Thread Shashank Sharma

Hey Christian,

On 16/02/2023 08:23, Christian König wrote:

Am 15.02.23 um 19:43 schrieb Shashank Sharma:

From: Shashank Sharma 

This patch adds:
- A new IOCTL function to create and destroy
- A new structure to keep all the user queue data in one place.
- A function to generate unique index for the queue.

V1: Worked on review comments from RFC patch series:
   - Alex: Keep a list of queues, instead of single queue per process.
   - Christian: Use the queue manager instead of global ptrs,
    Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
  - Christian:
    - Formatting of text
    - There is no need for queuing of userqueues, with idr in place
  - Alex:
    - Remove use_doorbell, its unnecessary
    - Reuse amdgpu_mqd_props for saving mqd fields

  - Code formatting and re-arrangement

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 114 ++
  .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
  3 files changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

index 2d6bcfd727c8..229976a2d0e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2749,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] 
= {
  DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),

  };
    static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c

index 13e1eebc1cb6..ecf31d86f3de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -22,6 +22,120 @@
   */
    #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static inline int
+amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct 
amdgpu_usermode_queue *queue)

+{
+    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, 
GFP_KERNEL);

+}
+
+static inline void
+amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int 
queue_id)

+{
+    idr_remove(&uq_mgr->userq_idr, queue_id);
+}
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+    return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)

+{
+    struct amdgpu_usermode_queue *queue;
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
+    int r;
+
+    /* Do we have support userqueues for this IP ? */
+    if (!uq_mgr->userq_funcs[mqd_in->ip_type]) {
+    DRM_ERROR("GFX User queues not supported for this IP: %d\n", 
mqd_in->ip_type);

+    return -EINVAL;
+    }
+
+    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+    if (!queue) {
+    DRM_ERROR("Failed to allocate memory for queue\n");
+    return -ENOMEM;
+    }
+
+    mutex_lock(&uq_mgr->userq_mutex);
+    queue->userq_prop.wptr_gpu_addr = mqd_in->wptr_va;
+    queue->userq_prop.rptr_gpu_addr = mqd_in->rptr_va;
+    queue->userq_prop.queue_size = mqd_in->queue_size;
+    queue->userq_prop.hqd_base_gpu_addr = mqd_in->queue_va;
+    queue->userq_prop.queue_size = mqd_in->queue_size;
+
+    queue->doorbell_handle = mqd_in->doorbell_handle;
+    queue->queue_type = mqd_in->ip_type;
+    queue->flags = mqd_in->flags;
+    queue->vm = &fpriv->vm;
+    queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;



+    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
+    if (queue->queue_id < 0) {
+    DRM_ERROR("Failed to allocate a queue id\n");
+    r = queue->queue_id;
+    goto free_queue;
+    }


Don't keep the assigned id inside the queue structure. This is only 
used as handle between userspace and kernel and not useful inside the 
kernel otherwise.


This prevents people from using it in hw communication.


Noted,

- Shashank



Apart from that this looks good to me,
Christian.


+
+    args->out.queue_id = queue->queue_id;
+    args->out.flags = 0;
+    mutex_unlock(&uq_mgr->userq_mutex);
+    return 0;
+
+free_queue:
+    mutex_unlock(&uq_mgr->userq_mutex);
+    kfree(queue);
+    return r;
+}
+
+static void amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)

+{
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct amdgpu_usermode_queue *queue;
+
+    queu

Re: [PATCH v2 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue

2023-02-16 Thread Shashank Sharma



On 16/02/2023 08:31, Christian König wrote:

Am 15.02.23 um 19:43 schrieb Shashank Sharma:

From: Shashank Sharma 

A Memory queue descriptor (MQD) of a userqueue defines it in the 
harware's
context. As MQD format can vary between different graphics IPs, we 
need gfx

GEN specific handlers to create MQDs.

This patch:
- Introduces MQD hander functions for the usermode queues.
- Adds new functions to create and destroy MQD for GFX-GEN-11-IP

V1: Worked on review comments from Alex:
 - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
 - Reuse the existing adev->mqd[ip] for MQD creation
 - Formatting and arrangement of code

Cc: Alex Deucher 
Cc: Christian Koenig 

Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 21 +
  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 84 +++
  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
  4 files changed, 113 insertions(+)
  create mode 100644 
drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c


diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 764801cc8203..0c825bdf12d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
    # add usermode queue
  amdgpu-y += amdgpu_userqueue.o
+amdgpu-y += amdgpu_userqueue_gfx_v11.o
    ifneq ($(CONFIG_HSA_AMD),)
  AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c

index ecf31d86f3de..963e38f654a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -82,6 +82,12 @@ static int amdgpu_userqueue_create(struct drm_file 
*filp, union drm_amdgpu_userq

  goto free_queue;
  }
  +    r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, 
queue);

+    if (r) {
+    DRM_ERROR("Failed to create/map userqueue MQD\n");
+    goto free_queue;
+    }
+
  args->out.queue_id = queue->queue_id;
  args->out.flags = 0;
  mutex_unlock(&uq_mgr->userq_mutex);
@@ -106,6 +112,7 @@ static void amdgpu_userqueue_destroy(struct 
drm_file *filp, int queue_id)

  }
    mutex_lock(&uq_mgr->userq_mutex);
+ uq_mgr->userq_funcs[queue->queue_type]->mqd_destroy(uq_mgr, queue);
  amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
  mutex_unlock(&uq_mgr->userq_mutex);
  kfree(queue);
@@ -136,6 +143,19 @@ int amdgpu_userq_ioctl(struct drm_device *dev, 
void *data,

  return r;
  }
  +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
+
+static void
+amdgpu_userqueue_setup_ip_funcs(struct amdgpu_userq_mgr *uq_mgr)
+{
+    int maj;
+    struct amdgpu_device *adev = uq_mgr->adev;
+    uint32_t version = adev->ip_versions[GC_HWIP][0];
+
+    maj = IP_VERSION_MAJ(version);
+    if (maj == 11)
+    uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
+}
    int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
struct amdgpu_device *adev)

  {
@@ -143,6 +163,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi

  idr_init_base(&userq_mgr->userq_idr, 1);
  userq_mgr->adev = adev;
  +    amdgpu_userqueue_setup_ip_funcs(userq_mgr);
  return 0;
  }
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c

new file mode 100644
index ..12e1a785b65a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
the

+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_userqueue.h"
+
+static int
+amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_us

RE: [PATCH] drm/amdgpu: don't increase UMC RAS UE count if no new bad page

2023-02-16 Thread Yang, Stanley
[AMD Official Use Only - General]



> -Original Message-
> From: Zhou1, Tao 
> Sent: Thursday, February 16, 2023 3:58 PM
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Yang, Stanley ; Chai,
> Thomas ; Li, Candice ; Lazar,
> Lijo 
> Cc: Zhou1, Tao 
> Subject: [PATCH] drm/amdgpu: don't increase UMC RAS UE count if no new
> bad page
> 
> If a UMC bad page is reserved but not freed by an application, the application
> may trigger uncorrectable error repeatly by accessing the page.
> 
> v2: add specific function to do the check.
> 
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24
> 
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |  2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c |  4 
>  3 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 6e543558386d..5214034e1b16 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2115,6 +2115,30 @@ int amdgpu_ras_save_bad_pages(struct
> amdgpu_device *adev)
>   return 0;
>  }
> 
> +/* Return false if all pages have been reserved before, no new bad page
> + * is found, otherwise return true.
> + * Note: the function should be called between
> amdgpu_ras_add_bad_pages
> + * and amdgpu_ras_save_bad_pages.
> + */
> +bool amdgpu_ras_new_bad_page_is_added(struct amdgpu_device *adev)
> {
> + struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> + struct ras_err_handler_data *data;
> + struct amdgpu_ras_eeprom_control *control;
> + int save_count;
> +
> + if (!con || !con->eh_data)
> + return false;
> +
> + mutex_lock(&con->recovery_lock);
> + control = &con->eeprom_control;
> + data = con->eh_data;
> + save_count = data->count - control->ras_num_recs;
> + mutex_unlock(&con->recovery_lock);
> +
> + return (save_count ? true : false);
> +}
> +
>  /*
>   * read error record array in eeprom and reserve enough space for
>   * storing new bad pages
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> index f2ad93f6..606b75c36848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> @@ -549,6 +549,8 @@ int amdgpu_ras_add_bad_pages(struct
> amdgpu_device *adev,
> 
>  int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev);
> 
> +bool amdgpu_ras_new_bad_page_is_added(struct amdgpu_device *adev);
> +
>  static inline enum ta_ras_block
>  amdgpu_ras_block_to_ta(enum amdgpu_ras_block block) {
>   switch (block) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 1c7fcb4f2380..1146e65c22be 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -147,6 +147,10 @@ static int amdgpu_umc_do_page_retirement(struct
> amdgpu_device *adev,
>   err_data->err_addr_cnt) {
>   amdgpu_ras_add_bad_pages(adev, err_data-
> >err_addr,
>   err_data->err_addr_cnt);
> + /* if no new bad page is found, no need to increase
> ue count */
> + if (!amdgpu_ras_new_bad_page_is_added(adev))
> + err_data->ue_count = 0;

[Stanley]: There is a scenario, a UMC bad page is reserved but not freed by an 
application, the application accesses the above reserved page and it also
accesses a new bad page, driver read 2 ue count but save one new bad page, the 
err_data->ue_count should be set to 1.

> +
>   amdgpu_ras_save_bad_pages(adev);
> 
>   amdgpu_dpm_send_hbm_bad_pages_num(adev,
> con->eeprom_control.ras_num_recs);
> --
> 2.35.1


Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-02-16 Thread Daniel Vetter
On Tue, Jan 17, 2023 at 04:06:05AM +0300, Dmitry Osipenko wrote:
> 16.01.2023 18:11, Christian König пишет:
> > 
> >
> >> mmapping the memory with that new offset should still work. The
> >> imported BO is created with ttm_bo_type_sg, and AFAICT ttm_bo_vm.c
> >> supports mapping of SG BOs.
> >
> > Actually it shouldn't. This can go boom really easily.
> 
>  OK. I don't think we're doing this, but after Xiaogang raised the
>  question I went looking through the code whether it's theoretically
>  possible. I didn't find anything in the code that says that mmapping
>  imported dmabufs would be prohibited or even dangerous. On the
>  contrary, I found that ttm_bo_vm explicitly supports mmapping SG BOs.
> 
> 
> >
> > When you have imported a BO the only correct way of to mmap() it is
> > to do so on the original exporter.
> 
>  That seems sensible, and this is what we do today. That said, if
>  mmapping an imported BO is dangerous, I'm missing a mechanism to
>  protect against this. It could be as simple as setting
>  AMDGPU_GEM_CREATE_NO_CPU_ACCESS in amdgpu_dma_buf_create_obj.
> >>>
> >>> At least for the GEM mmap() handler this is double checked very early
> >>> by looking at obj->import_attach and then either rejecting it or
> >>> redirecting the request to the DMA-buf file instead.
> >>
> >> Can you point me at where this check is? I see a check for
> >> obj->import_attach in drm_gem_dumb_map_offset. But I can't see how
> >> this function is called in amdgpu. I don't think it is used at all.
> > 
> > Uff, good question! @Thomas and @Dmitry: I clearly remember that one of
> > you guys was involved in the DRM/GEM mmap cleanup and DMA-buf with
> > workarounds for the KFD and DMA-buf.
> > 
> > What was the final solution to this? I can't find it of hand any more.
> 
> I was looking at it. The AMDGPU indeed allows to map imported GEMs, but
> then touching the mapped area by CPU results in a bus fault. You,
> Christian, suggested that this an AMDGPU bug that should be fixed by
> prohibiting the mapping in the first place and I was going to fix it,
> but then the plan changed from prohibiting the mapping into fixing it.
> 
> The first proposal was to make DRM core to handle the dma-buf mappings
> for all drivers universally [1]. Then we decided that will be better to
> prohibit mapping of imported GEMs [2]. In the end, Rob Clark argued that
> better to implement the [1], otherwise current userspace (Android) will
> be broken if mapping will be prohibited.
> 
> The last question was about the cache syncing of imported dma-bufs, how
> to ensure that drivers will do the cache maintenance/syncing properly.
> Rob suggested that it should be a problem for drivers and not for DRM core.
> 
> I was going to re-send the [1], but other things were getting priority.
> It's good that you reminded me about it :) I may re-send it sometime
> soon if there are no new objections.
> 
> [1] https://patchwork.freedesktop.org/patch/487481/
> 
> [2]
> https://lore.kernel.org/all/20220701090240.1896131-1-dmitry.osipe...@collabora.com/

Hm I still don't like allowing this in general, because in general it just
doesn't work.

I think more like a per-driver opt-in or something might be needed, so
that drivers which "know" that it's ok to just mmap without coherency can
allow that. Allowing this in general essentially gives up on the entire
idea of dma-buf cache flushing completely.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 7/8] drm/amdgpu: create doorbell kernel object

2023-02-16 Thread Shashank Sharma



On 14/02/2023 20:26, Shashank Sharma wrote:


On 14/02/2023 19:35, Christian König wrote:

Am 14.02.23 um 17:15 schrieb Shashank Sharma:

From: Shashank Sharma 

This patch does the following:
- Initializes TTM range management for domain DOORBELL.
- Introduces a kernel bo for doorbell management in form of 
mman.doorbell_kernel_bo.

   This bo holds the kernel doorbell space now.
- Removes ioremapping of doorbell-kernel memory, as its not required 
now.


V2:
- Addressed review comments from Christian:
 - do not use kernel_create_at(0), use kernel_create() instead.
 - do not use ttm_resource_manager, use range_manager instead.
 - do not ioremap doorbell, TTM will do that.
- Split one big patch into 2

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  7 +++
  2 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index e9dc24191fc8..086e83c17c0f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1879,12 +1879,32 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
  return r;
  }
  +    r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_DOORBELL, 
adev->doorbell.doorbell_aper_size);

+    if (r) {
+    DRM_ERROR("Failed initializing oa heap.\n");
+    return r;
+    }
+
  if (amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
  AMDGPU_GEM_DOMAIN_GTT,
  &adev->mman.sdma_access_bo, NULL,
  &adev->mman.sdma_access_ptr))
  DRM_WARN("Debug VRAM access will use slowpath MM access\n");
  +    /* Create a doorbell BO for kernel usages */
+    r = amdgpu_bo_create_kernel(adev,
+    adev->mman.doorbell_kernel_bo_size,
+    PAGE_SIZE,
+    AMDGPU_GEM_DOMAIN_DOORBELL,
+    &adev->mman.doorbell_kernel_bo,
+    &adev->mman.doorbell_gpu_addr,
+    (void **)&adev->mman.doorbell_cpu_addr);
+
+    if (r) {
+    DRM_ERROR("Failed to create doorbell BO, err=%d\n", r);
+    return r;
+    }
+


I would even move this before the SDMA VRAM buffer since the later is 
only nice to have while the doorbell is mandatory to have.

Agree,



  return 0;
  }
  @@ -1908,6 +1928,8 @@ void amdgpu_ttm_fini(struct amdgpu_device 
*adev)

    NULL, NULL);
  amdgpu_bo_free_kernel(&adev->mman.sdma_access_bo, NULL,
  &adev->mman.sdma_access_ptr);
+ amdgpu_bo_free_kernel(&adev->mman.doorbell_kernel_bo,
+  NULL, (void **)&adev->mman.doorbell_cpu_addr);
  amdgpu_ttm_fw_reserve_vram_fini(adev);
  amdgpu_ttm_drv_reserve_vram_fini(adev);
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h

index 9cf5d8419965..50748ff1dd3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -97,6 +97,13 @@ struct amdgpu_mman {
  /* PAGE_SIZE'd BO for process memory r/w over SDMA. */
  struct amdgpu_bo    *sdma_access_bo;
  void    *sdma_access_ptr;
+
+    /* doorbells reserved for the kernel driver */
+    u32    num_kernel_doorbells;    /* Number of doorbells 
actually reserved for kernel */

+    uint64_t    doorbell_kernel_bo_size;


That looks like duplicated information. We should only keep either 
the number of kernel doorbells or the kernel doorbell bo size around, 
not both.

Yeah, agree. I can remove one of these two.


On a second thought, while doing some experiments with doorbells I 
realized that we might want to keep both of these, as:


num_kernel_doorbell = actual doorbells reserved for kernel,

doorbell_kernel_bo_size = max (PAGE_SIZE,  num_kernel_doorbell* sizeof(u32))

I will have to update the code to reflect that as well.


- Shashank



And BTW please no comment after structure members.


Noted,

- Shashank


Christian.


+    uint64_t    doorbell_gpu_addr;
+    struct amdgpu_bo    *doorbell_kernel_bo;
+    u32    *doorbell_cpu_addr;
  };
    struct amdgpu_copy_mem {




[PATCHv2] drm/amdgpu: register a vga_switcheroo client for MacBooks with apple-gmux

2023-02-16 Thread Orlando Chamberlain
Commit 3840c5bcc245 ("drm/amdgpu: disentangle runtime pm and
vga_switcheroo") made amdgpu only register a vga_switcheroo client for
GPU's with PX, however AMD GPUs in dual gpu Apple Macbooks do need to
register, but don't have PX. Instead of AMD's PX, they use apple-gmux.

Use apple_gmux_detect() to identify these gpus, and
pci_is_thunderbolt_attached() to ensure eGPUs connected to Dual GPU
Macbooks don't register with vga_switcheroo.

Fixes: 3840c5bcc245 ("drm/amdgpu: disentangle runtime pm and vga_switcheroo")
Link: 
https://lore.kernel.org/amd-gfx/20230210044826.9834-10-orlandoch@gmail.com/
Signed-off-by: Orlando Chamberlain 
---
v1->v2: Use apple_gmux_detect()
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2f28a8c02f64..ef8b996f0622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -3919,12 +3920,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
vga_client_register(adev->pdev, amdgpu_device_vga_set_decode);
 
-   if (amdgpu_device_supports_px(ddev)) {
-   px = true;
+   px = amdgpu_device_supports_px(ddev);
+
+   if (px || (!pci_is_thunderbolt_attached(adev->pdev) &&
+   apple_gmux_detect(NULL, NULL)))
vga_switcheroo_register_client(adev->pdev,
   &amdgpu_switcheroo_ops, px);
+
+   if (px)
vga_switcheroo_init_domain_pm_ops(adev->dev, 
&adev->vga_pm_domain);
-   }
 
if (adev->gmc.xgmi.pending_reset)
queue_delayed_work(system_wq, &mgpu_info.delayed_reset_work,
@@ -4029,6 +4033,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
 {
int idx;
+   bool px;
 
amdgpu_fence_driver_sw_fini(adev);
amdgpu_device_ip_fini(adev);
@@ -4048,10 +4053,16 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
 
kfree(adev->bios);
adev->bios = NULL;
-   if (amdgpu_device_supports_px(adev_to_drm(adev))) {
+
+   px = amdgpu_device_supports_px(adev_to_drm(adev));
+
+   if (px || (!pci_is_thunderbolt_attached(adev->pdev) &&
+   apple_gmux_detect(NULL, NULL)))
vga_switcheroo_unregister_client(adev->pdev);
+
+   if (px)
vga_switcheroo_fini_domain_pm_ops(adev->dev);
-   }
+
if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
vga_client_unregister(adev->pdev);
 
-- 
2.39.1



[PATCH] drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()

2023-02-16 Thread Thomas Zimmermann
Move drm_fb_helper_unprepare() from drm_fb_helper_fini() into the
calling fbdev implementation. Avoids a possible stale mutex with
generic fbdev code.

As indicated by its name, drm_fb_helper_prepare() prepares struct
drm_fb_helper before setting up the fbdev support with a call to
drm_fb_helper_init(). In legacy fbdev emulation, this happens next
to each other. If successful, drm_fb_helper_fini() later tear down
the fbdev device and also unprepare via drm_fb_helper_unprepare().

Generic fbdev emulation prepares struct drm_fb_helper immediately
after allocating the instance. It only calls drm_fb_helper_init()
as part of processing a hotplug event. If the hotplug-handling fails,
it runs drm_fb_helper_fini(). This unprepares the fb-helper instance
and the next hotplug event runs on stale data.

Solve this by moving drm_fb_helper_unprepare() from drm_fb_helper_fini()
into the fbdev implementations. Call it right before freeing the
fb-helper instance.

Fixes: 4825797c36da ("drm/fb-helper: Introduce drm_fb_helper_unprepare()")
Cc: Thomas Zimmermann 
Cc: Javier Martinez Canillas 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/armada/armada_fbdev.c  | 3 +++
 drivers/gpu/drm/drm_fb_helper.c| 2 --
 drivers/gpu/drm/drm_fbdev_generic.c| 2 ++
 drivers/gpu/drm/exynos/exynos_drm_fbdev.c  | 3 ++-
 drivers/gpu/drm/gma500/framebuffer.c   | 2 ++
 drivers/gpu/drm/i915/display/intel_fbdev.c | 1 +
 drivers/gpu/drm/msm/msm_fbdev.c| 2 ++
 drivers/gpu/drm/omapdrm/omap_fbdev.c   | 2 ++
 drivers/gpu/drm/radeon/radeon_fb.c | 2 ++
 drivers/gpu/drm/tegra/fb.c | 1 +
 10 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_fbdev.c 
b/drivers/gpu/drm/armada/armada_fbdev.c
index 07e410c62b7a..0e44f53e9fa4 100644
--- a/drivers/gpu/drm/armada/armada_fbdev.c
+++ b/drivers/gpu/drm/armada/armada_fbdev.c
@@ -147,6 +147,7 @@ int armada_fbdev_init(struct drm_device *dev)
  err_fb_setup:
drm_fb_helper_fini(fbh);
  err_fb_helper:
+   drm_fb_helper_unprepare(fbh);
priv->fbdev = NULL;
return ret;
 }
@@ -164,6 +165,8 @@ void armada_fbdev_fini(struct drm_device *dev)
if (fbh->fb)
fbh->fb->funcs->destroy(fbh->fb);
 
+   drm_fb_helper_unprepare(fbh);
+
priv->fbdev = NULL;
}
 }
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 28c428e9c530..a39998047f8a 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -590,8 +590,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
}
mutex_unlock(&kernel_fb_helper_lock);
 
-   drm_fb_helper_unprepare(fb_helper);
-
if (!fb_helper->client.funcs)
drm_client_release(&fb_helper->client);
 }
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 365f80717fa1..4d6325e91565 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -65,6 +65,8 @@ static void drm_fbdev_fb_destroy(struct fb_info *info)
 
drm_client_framebuffer_delete(fb_helper->buffer);
drm_client_release(&fb_helper->client);
+
+   drm_fb_helper_unprepare(fb_helper);
kfree(fb_helper);
 }
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c 
b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
index b89e33af8da8..4929ffe5a09a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
@@ -183,8 +183,8 @@ int exynos_drm_fbdev_init(struct drm_device *dev)
 
 err_setup:
drm_fb_helper_fini(helper);
-
 err_init:
+   drm_fb_helper_unprepare(helper);
private->fb_helper = NULL;
kfree(fbdev);
 
@@ -219,6 +219,7 @@ void exynos_drm_fbdev_fini(struct drm_device *dev)
fbdev = to_exynos_fbdev(private->fb_helper);
 
exynos_drm_fbdev_destroy(dev, private->fb_helper);
+   drm_fb_helper_unprepare(private->fb_helper);
kfree(fbdev);
private->fb_helper = NULL;
 }
diff --git a/drivers/gpu/drm/gma500/framebuffer.c 
b/drivers/gpu/drm/gma500/framebuffer.c
index 1f04c07ee180..f471e0cb7298 100644
--- a/drivers/gpu/drm/gma500/framebuffer.c
+++ b/drivers/gpu/drm/gma500/framebuffer.c
@@ -427,6 +427,7 @@ int psb_fbdev_init(struct drm_device *dev)
 fini:
drm_fb_helper_fini(fb_helper);
 free:
+   drm_fb_helper_unprepare(fb_helper);
kfree(fb_helper);
return ret;
 }
@@ -439,6 +440,7 @@ static void psb_fbdev_fini(struct drm_device *dev)
return;
 
psb_fbdev_destroy(dev, dev_priv->fb_helper);
+   drm_fb_helper_unprepare(dev_priv->fb_helper);
kfree(dev_priv->fb_helper);
dev_priv->fb_helper = NULL;
 }
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/dis

Re: [RFC] drm/amd/display: Pass proper parent for DM backlight device registration

2023-02-16 Thread Alex Deucher
On Wed, Feb 15, 2023 at 6:38 AM Hans de Goede  wrote:
>
> The parent for the backlight device should be the drm-connector object,
> not the PCI device.
>
> Userspace relies on this to be able to detect which backlight class device
> to use on hybrid gfx devices where there may be multiple native (raw)
> backlight devices registered.
>
> Specifically gnome-settings-daemon expects the parent device to have
> an "enabled" sysfs attribute (as drm_connector devices do) and tests
> that this returns "enabled" when read.
>
> This aligns the parent of the backlight device with i915, nouveau, radeon.
> Note that drivers/gpu/drm/amd/amdgpu/atombios_encoders.c also already
> uses the drm_connector as parent, only amdgpu_dm.c used the PCI device
> as parent before this change.
>
> Note this is marked as a RFC because I don't have hw to test, so this
> has only been compile tested! If someone can test this on actual
> hw which hits the changed code path that would be great.
>
> Link: https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/issues/730
> Signed-off-by: Hans de Goede 

Patch looks fine to me.  I'll apply it and we can run it through our CI system.

Alex

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 31bce529f685..33b0e1de2770 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4065,7 +4065,8 @@ static const struct backlight_ops 
> amdgpu_dm_backlight_ops = {
>  };
>
>  static void
> -amdgpu_dm_register_backlight_device(struct amdgpu_display_manager *dm)
> +amdgpu_dm_register_backlight_device(struct amdgpu_display_manager *dm,
> +   struct amdgpu_dm_connector *aconnector)
>  {
> char bl_name[16];
> struct backlight_properties props = { 0 };
> @@ -4088,7 +4089,7 @@ amdgpu_dm_register_backlight_device(struct 
> amdgpu_display_manager *dm)
>  adev_to_drm(dm->adev)->primary->index + dm->num_of_edps);
>
> dm->backlight_dev[dm->num_of_edps] = 
> backlight_device_register(bl_name,
> -  
> adev_to_drm(dm->adev)->dev,
> +  
> aconnector->base.kdev,
>dm,
>
> &amdgpu_dm_backlight_ops,
>
> &props);
> @@ -4141,6 +4142,7 @@ static int initialize_plane(struct 
> amdgpu_display_manager *dm,
>
>
>  static void register_backlight_device(struct amdgpu_display_manager *dm,
> + struct amdgpu_dm_connector *aconnector,
>   struct dc_link *link)
>  {
> if ((link->connector_signal & (SIGNAL_TYPE_EDP | SIGNAL_TYPE_LVDS)) &&
> @@ -4151,7 +4153,7 @@ static void register_backlight_device(struct 
> amdgpu_display_manager *dm,
>  * is better then a black screen.
>  */
> if (!dm->backlight_dev[dm->num_of_edps])
> -   amdgpu_dm_register_backlight_device(dm);
> +   amdgpu_dm_register_backlight_device(dm, aconnector);
>
> if (dm->backlight_dev[dm->num_of_edps]) {
> dm->backlight_link[dm->num_of_edps] = link;
> @@ -4337,7 +4339,7 @@ static int amdgpu_dm_initialize_drm_device(struct 
> amdgpu_device *adev)
>
> if (ret) {
> 
> amdgpu_dm_update_connector_after_detect(aconnector);
> -   register_backlight_device(dm, link);
> +   register_backlight_device(dm, aconnector, 
> link);
>
> if (dm->num_of_edps)
> update_connector_ext_caps(aconnector);
> --
> 2.39.1
>


Re: [PATCHv2] drm/amdgpu: register a vga_switcheroo client for MacBooks with apple-gmux

2023-02-16 Thread Alex Deucher
On Thu, Feb 16, 2023 at 8:45 AM Orlando Chamberlain
 wrote:
>
> Commit 3840c5bcc245 ("drm/amdgpu: disentangle runtime pm and
> vga_switcheroo") made amdgpu only register a vga_switcheroo client for
> GPU's with PX, however AMD GPUs in dual gpu Apple Macbooks do need to
> register, but don't have PX. Instead of AMD's PX, they use apple-gmux.
>
> Use apple_gmux_detect() to identify these gpus, and
> pci_is_thunderbolt_attached() to ensure eGPUs connected to Dual GPU
> Macbooks don't register with vga_switcheroo.
>
> Fixes: 3840c5bcc245 ("drm/amdgpu: disentangle runtime pm and vga_switcheroo")
> Link: 
> https://lore.kernel.org/amd-gfx/20230210044826.9834-10-orlandoch@gmail.com/
> Signed-off-by: Orlando Chamberlain 

This needs ifdefs around the apple_gmux stuff so that it will build
without the gmux support.

Alex

> ---
> v1->v2: Use apple_gmux_detect()
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2f28a8c02f64..ef8b996f0622 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -3919,12 +3920,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
> vga_client_register(adev->pdev, amdgpu_device_vga_set_decode);
>
> -   if (amdgpu_device_supports_px(ddev)) {
> -   px = true;
> +   px = amdgpu_device_supports_px(ddev);
> +
> +   if (px || (!pci_is_thunderbolt_attached(adev->pdev) &&
> +   apple_gmux_detect(NULL, NULL)))
> vga_switcheroo_register_client(adev->pdev,
>&amdgpu_switcheroo_ops, px);
> +
> +   if (px)
> vga_switcheroo_init_domain_pm_ops(adev->dev, 
> &adev->vga_pm_domain);
> -   }
>
> if (adev->gmc.xgmi.pending_reset)
> queue_delayed_work(system_wq, &mgpu_info.delayed_reset_work,
> @@ -4029,6 +4033,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>  {
> int idx;
> +   bool px;
>
> amdgpu_fence_driver_sw_fini(adev);
> amdgpu_device_ip_fini(adev);
> @@ -4048,10 +4053,16 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>
> kfree(adev->bios);
> adev->bios = NULL;
> -   if (amdgpu_device_supports_px(adev_to_drm(adev))) {
> +
> +   px = amdgpu_device_supports_px(adev_to_drm(adev));
> +
> +   if (px || (!pci_is_thunderbolt_attached(adev->pdev) &&
> +   apple_gmux_detect(NULL, NULL)))
> vga_switcheroo_unregister_client(adev->pdev);
> +
> +   if (px)
> vga_switcheroo_fini_domain_pm_ops(adev->dev);
> -   }
> +
> if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
> vga_client_unregister(adev->pdev);
>
> --
> 2.39.1
>


Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-02-16 Thread Felix Kuehling

[+Suravee]

Am 2023-02-15 um 19:44 schrieb Jason Gunthorpe:

On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:

If I understand this correctly, the HW or the BIOS is doing something wrong
about reporting ACS. I don't know what the GPU driver can do other than add
some quirk to stop using AMD IOMMUv2 on this HW/BIOS.

How about this:

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 864e4ffb6aa94e..cc027ce9a6e86f 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
  
  int amd_iommu_init_device(struct pci_dev *pdev, int pasids)

  {
+   struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
struct device_state *dev_state;
struct iommu_group *group;
unsigned long flags;
@@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
  
  	might_sleep();
  
+	if (!dev_data->ats.enabled)

+   return -EINVAL;
+
/*
 * When memory encryption is active the device is likely not in a
 * direct-mapped domain. Forbid using IOMMUv2 functionality for now.


Hi Suravee,

What to you think about this proposed change?

Regards,
  Felix



Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-02-16 Thread Felix Kuehling

[+Suravee]

Am 2023-02-16 um 00:37 schrieb Vasant Hegde:

Hi Jason,


On 2/16/2023 6:14 AM, Jason Gunthorpe wrote:

On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:

If I understand this correctly, the HW or the BIOS is doing something wrong
about reporting ACS. I don't know what the GPU driver can do other than add
some quirk to stop using AMD IOMMUv2 on this HW/BIOS.

How about this:

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 864e4ffb6aa94e..cc027ce9a6e86f 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
  
  int amd_iommu_init_device(struct pci_dev *pdev, int pasids)

  {
+   struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
struct device_state *dev_state;
struct iommu_group *group;
unsigned long flags;
@@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
  
  	might_sleep();
  
+	if (!dev_data->ats.enabled)

+   return -EINVAL;
+

Thanks for the proposed fix. But aactually this will not solve the issue because
current flow is :
   - in this function it tries to allocate new domain
   - Calls iommu_attach_group() which will call attach_device. In that path
 it will try to enable ATS/PASID and hitting error.

As I mentioned in other reply I think even current code returns error from
amd_iommu_init_device() to GPU. But the issue is, in __iommu_attach_group() path
it detached device from current domain, failed to attach to new domain and
returned error. We didn't put the device back to old domain thats causing the
issue. Below series should fix this issue.

https://lore.kernel.org/linux-iommu/20230215052642.6016-1-vasant.he...@amd.com/

-Vasant



[bug report] drm/amd/display: break down dc_link.c

2023-02-16 Thread Dan Carpenter
Hello Wenjing Liu,

This is a semi-automatic email about new static checker warnings.

The patch 5461d1ea: "drm/amd/display: break down dc_link.c" from 
Jan 18, 2023, leads to the following Smatch complaint:

drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:365 
dc_link_construct_phy()
warn: variable dereferenced before check 'link->link_enc' (see line 362)

drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c
   356  
   357  enc_init_data.transmitter =
   358  translate_encoder_to_transmitter(enc_init_data.encoder);
   359  link->link_enc =
   360  link->dc->res_pool->funcs->link_enc_create(dc_ctx, 
&enc_init_data);
   361  
   362  DC_LOG_DC("BIOS object table - DP_IS_USB_C: %d", 
link->link_enc->features.flags.bits.DP_IS_USB_C);
 

Dereference

   363  DC_LOG_DC("BIOS object table - IS_DP2_CAPABLE: %d", 
link->link_enc->features.flags.bits.IS_DP2_CAPABLE);



   364  
   365  if (!link->link_enc) {
 ^^
Checked too late

   366  DC_ERROR("Failed to create link encoder!\n");
   367  goto link_enc_create_fail;
   368  }
   369  

regards,
dan carpenter


[linux-next:master] BUILD REGRESSION 509583475828c4fd86897113f78315c1431edcc3

2023-02-16 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 509583475828c4fd86897113f78315c1431edcc3  Add linux-next specific 
files for 20230216

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202302061911.c7xvhx9v-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302062224.byzetxh1-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302111601.jty4lkra-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302112104.g75cghzd-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302142145.in5wznpf-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302151041.0sws1rhk-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302161117.pnuysgwi-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302162019.2whirksa-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302162331.zgyxciuh-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

Documentation/admin-guide/pm/amd-pstate.rst:343: WARNING: duplicate label 
admin-guide/pm/amd-pstate:user space interface in ``sysfs``, other instance in 
Documentation/admin-guide/pm/amd-pstate.rst
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] 
undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] 
undefined!
ERROR: modpost: "walk_hmem_resources" [drivers/dax/hmem/dax_hmem.ko] undefined!
arch/mips/include/asm/page.h:255:55: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
arch/mips/kernel/vpe.c:643:35: error: no member named 'mod_mem' in 'struct 
module'
arch/mips/kernel/vpe.c:643:41: error: 'struct module' has no member named 
'mod_mem'
cxl.c:(.exit.text+0x32): undefined reference to `cxl_driver_unregister'
cxl.c:(.init.text+0x3c): undefined reference to `__cxl_driver_register'
cxl.c:(.text+0x92): undefined reference to `to_cxl_dax_region'
drivers/cxl/core/region.c:2628:6: warning: variable 'rc' is used uninitialized 
whenever 'if' condition is false [-Wsometimes-uninitialized]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn30/dcn30_optc.c:294:6: warning: no 
previous prototype for 'optc3_wait_drr_doublebuffer_pending_clear' 
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_hubbub.c:1011:6: warning: 
no previous prototype for 'hubbub31_init' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_hubbub.c:948:6: warning: 
no previous prototype for 'hubbub32_init' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_hubp.c:158:6: warning: no 
previous prototype for 'hubp32_init' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:62:18: 
warning: variable 'cursor_bpp' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c:1199: warning: 
expecting prototype for dc_link_detect_connection_type(). Prototype was for 
link_detect_connection_type() instead
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1292:32:
 warning: variable 'result_write_min_hblank' set but not used 
[-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training.c:1586:38:
 warning: variable 'result' set but not used [-Wunused-but-set-variable]

Unverified Error/Warning (likely false positive, please contact us if 
interested):

drivers/clk/ingenic/jz4760-cgu.c:80 jz4760_cgu_calc_m_n_od() error: 
uninitialized symbol 'od'.
drivers/media/i2c/max9286.c:802 max9286_s_stream() error: buffer overflow 
'priv->fmt' 4 <= 32
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c:415 bnxt_rdma_aux_device_init() 
warn: possible memory leak of 'edev'
drivers/net/phy/phy-c45.c:296 genphy_c45_an_config_aneg() error: uninitialized 
symbol 'changed'.
drivers/net/phy/phy-c45.c:716 genphy_c45_write_eee_adv() error: uninitialized 
symbol 'changed'.
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8188e.c:1702 
rtl8188e_handle_ra_tx_report2() warn: ignoring unreachable code.
drivers/usb/gadget/composite.c:2082:33: sparse: sparse: restricted __le16 
degrades to integer
drivers/virtio/virtio_ring.c:692 virtqueue_add_split_vring() error: 
uninitialized symbol 'prev'.
fs/ntfs3/super.c:1351 ntfs_fill_super() warn: passing a valid pointer to 
'PTR_ERR'
net/mac80211/rx.c:2947 __ieee80211_rx_h_amsdu() error: we previously assumed 
'rx->sta' could be null (see line 2933)
pahole: .tmp_vmlinux.btf: No such file or directory

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connec

Re: [PATCH v2 20/20] jump_label: RFC - tolerate toggled state

2023-02-16 Thread Jason Baron




On 1/17/23 6:57 AM, Peter Zijlstra wrote:

On Fri, Jan 13, 2023 at 12:30:16PM -0700, Jim Cromie wrote:

__jump_label_patch currently will "crash the box" if it finds a
jump_entry not as expected.  ISTM this overly harsh; it doesn't
distinguish between "alternate/opposite" state, and truly
"insane/corrupted".

The "opposite" (but well-formed) state is a milder mis-initialization
problem, and some less severe mitigation seems practical.  ATM this
just warns about it; a range/enum of outcomes: warn, crash, silence,
ok, fixup-continue, etc, are possible on a case-by-case basis.

Ive managed to create this mis-initialization condition with
test_dynamic_debug.ko & _submod.ko.  These replicate DRM's regression
on DRM_USE_DYNAMIC_DEBUG=y; drm.debug callsites in drivers/helpers
(dependent modules) are not enabled along with those in drm.ko itself.




Ive hit this case a few times, but havent been able to isolate the
when and why.

warn-only is something of a punt, and I'm still left with remaining
bugs which are likely related; I'm able to toggle the p-flag on
callsites in the submod, but their enablement still doesn't yield
logging activity.


Right; having been in this is state is bad since it will generate
inconsistent code-flow. Full on panic *might* not be warranted (as it
does for corrupted text) but it is still a fairly bad situation -- so
I'm not convinced we want to warn and carry on.

It would be really good to figure out why the site was skipped over and
got out of skew.

Given it's all module stuff, the 'obvious' case would be something like
a race between adding the new sites and flipping it, but I'm not seeing
how -- things are rather crudely serialized by jump_label_mutex.


Indeed, looks like dynamic debug introduces a new path in this series to 
that tries to toggle the jump label sites before they have been 
initialized, which ends up with the jump label key enabled but only some 
of the sites in the correct state. Then when the key is subsequently 
disabled it finds some in the wrong state. I just posted a patch for 
dynamic debug to use the module callback notifiers so it's ordered 
properly against jump label.


Note this isn't an issue in the current tree b/c there is no path to 
toggle the sites currently before they are initialized, but Jim's series 
here adds such a path.


Thanks,

-Jason




The only other option I can come up with is that somehow the update
condition in jump_label_add_module() is somehow wrong.



Re: [PATCH v2 1/9] drm/amdgpu: UAPI for user queue management

2023-02-16 Thread Alex Deucher
On Wed, Feb 15, 2023 at 1:44 PM Shashank Sharma  wrote:
>
> From: Alex Deucher 
>
> This patch intorduces new UAPI/IOCTL for usermode graphics
> queue. The userspace app will fill this structure and request
> the graphics driver to add a graphics work queue for it. The
> output of this UAPI is a queue id.
>
> This UAPI maps the queue into GPU, so the graphics app can start
> submitting work to the queue as soon as the call returns.
>
> V2: Addressed review comments from Alex and Christian
> - Make the doorbell offset's comment clearer
> - Change the output parameter name to queue_id
>
> Cc: Alex Deucher 
> Cc: Christian Koenig 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Shashank Sharma 
> ---
>  include/uapi/drm/amdgpu_drm.h | 55 +++
>  1 file changed, 55 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 4038abe8505a..2bc4869ee279 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -54,6 +54,7 @@ extern "C" {
>  #define DRM_AMDGPU_VM  0x13
>  #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
>  #define DRM_AMDGPU_SCHED   0x15
> +#define DRM_AMDGPU_USERQ   0x16
>
>  #define DRM_IOCTL_AMDGPU_GEM_CREATEDRM_IOWR(DRM_COMMAND_BASE + 
> DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>  #define DRM_IOCTL_AMDGPU_GEM_MMAP  DRM_IOWR(DRM_COMMAND_BASE + 
> DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -71,6 +72,7 @@ extern "C" {
>  #define DRM_IOCTL_AMDGPU_VMDRM_IOWR(DRM_COMMAND_BASE + 
> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>  #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + 
> DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>  #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE + 
> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>
>  /**
>   * DOC: memory domains
> @@ -302,6 +304,59 @@ union drm_amdgpu_ctx {
> union drm_amdgpu_ctx_out out;
>  };
>
> +/* user queue IOCTL */
> +#define AMDGPU_USERQ_OP_CREATE 1
> +#define AMDGPU_USERQ_OP_FREE   2
> +
> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> +#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
> +
> +struct drm_amdgpu_userq_mqd {
> +   /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> +   __u32   flags;
> +   /** IP type: AMDGPU_HW_IP_* */
> +   __u32   ip_type;
> +   /** GEM object handle */
> +   __u32   doorbell_handle;
> +   /** Doorbell's offset in the doorbell bo */
> +   __u32   doorbell_offset;
> +   /** GPU virtual address of the queue */
> +   __u64   queue_va;
> +   /** Size of the queue in bytes */
> +   __u64   queue_size;
> +   /** GPU virtual address of the rptr */
> +   __u64   rptr_va;
> +   /** GPU virtual address of the wptr */
> +   __u64   wptr_va;
> +   /** GPU virtual address of the shadow context space */
> +   __u64   shadow_va;
> +};

We should test this with SDMA user queues as well to make sure we
haven't missed anything that it needs.  If sdma or vcn have major
differences, we might want to consider making drm_amdgpu_userq_mqd a
union.

Alex


> +
> +struct drm_amdgpu_userq_in {
> +   /** AMDGPU_USERQ_OP_* */
> +   __u32   op;
> +   /** Flags */
> +   __u32   flags;
> +   /** Queue handle to associate the queue free call with,
> +* unused for queue create calls */
> +   __u32   queue_id;
> +   __u32   pad;
> +   /** Queue descriptor */
> +   struct drm_amdgpu_userq_mqd mqd;
> +};
> +
> +struct drm_amdgpu_userq_out {
> +   /** Queue handle */
> +   __u32   queue_id;
> +   /** Flags */
> +   __u32   flags;
> +};
> +
> +union drm_amdgpu_userq {
> +   struct drm_amdgpu_userq_in in;
> +   struct drm_amdgpu_userq_out out;
> +};
> +
>  /* vm ioctl */
>  #define AMDGPU_VM_OP_RESERVE_VMID  1
>  #define AMDGPU_VM_OP_UNRESERVE_VMID2
> --
> 2.34.1
>


Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-02-16 Thread Felix Kuehling
Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for 
device 1002:9874
Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added 
due to errors 
This look like IOMMU device initialization still fails (but more 
gracefully now). Vasant, is that expected?


This would lead to KFD not being available on Carrizo with this kernel, 
which is probably not a big limitation in practice. It would only affect 
compute applications using the ROCm user mode stack. I don't think 
anyone does that these days on these old APUs.


The SMU errors seem unrelated to this unless there is some subtle 
interaction I'm missing.


Regards,
  Felix


Am 2023-02-16 um 13:59 schrieb Matt Fagnani:

Vasant,

I applied your four patches to 6.2-rc8 and built that. The black 
screen, null pointer dereference, and warnings didn't happen when 
booting 6.2-rc8 with your patches. There were errors that the IOMMU 
wasn't restarted when amdgpu and amdkfd was starting though at kernel: 
kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874. I don't 
know if those IOMMU errors were expected or not, but I did see those 
types of messages when I used amd_iommu=off to work around the black 
screen before. I didn't use amd_iommu=off when testing 6.2-rc8 with 
your patches. There were also a different amdgpu warning at 
drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 
smu8_send_msg_to_smc_with_parameter+0x103/0x140 and errors about 
amdgpu timeouts on 1/3 boots. Plasma took over a minute to start and 
shut down on that boot which was unusually long. I've seen those sorts 
of amdgpu warnings and errors infrequently before so they might be 
unrelated to the IOMMU problem. The part of the journal where those 
errors started was the following.


Feb 16 13:22:31 kernel: [drm] amdgpu kernel modesetting enabled.
Feb 16 13:22:31 kernel: amdgpu: Topology: Add APU node [0x0:0x0]
Feb 16 13:22:31 kernel: [drm] initializing kernel modesetting (CARRIZO 
0x1002:0x9874 0x103C:0x8332 0xCA).

Feb 16 13:22:31 kernel: [drm] register mmio base: 0xF040
Feb 16 13:22:31 kernel: [drm] register mmio size: 262144
Feb 16 13:22:31 kernel: [drm] add ip block number 0 
Feb 16 13:22:31 kernel: [drm] add ip block number 1 
Feb 16 13:22:31 kernel: [drm] add ip block number 2 
Feb 16 13:22:31 kernel: [drm] add ip block number 3 
Feb 16 13:22:31 kernel: [drm] add ip block number 4 
Feb 16 13:22:31 kernel: [drm] add ip block number 5 
Feb 16 13:22:31 kernel: [drm] add ip block number 6 
Feb 16 13:22:31 kernel: [drm] add ip block number 7 
Feb 16 13:22:31 kernel: [drm] add ip block number 8 
Feb 16 13:22:31 kernel: [drm] add ip block number 9 
Feb 16 13:22:31 kernel: amdgpu :00:01.0: amdgpu: Fetched VBIOS 
from VFCT

Feb 16 13:22:31 kernel: amdgpu: ATOM BIOS: 113-C75100-031
Feb 16 13:22:31 kernel: [drm] UVD is enabled in physical mode
Feb 16 13:22:31 kernel: [drm] VCE enabled in physical mode
Feb 16 13:22:31 kernel: Console: switching to colour dummy device 80x25
Feb 16 13:22:31 kernel: amdgpu :00:01.0: vgaarb: deactivate vga 
console
Feb 16 13:22:31 kernel: amdgpu :00:01.0: amdgpu: Trusted Memory 
Zone (TMZ) feature not supported
Feb 16 13:22:31 kernel: [drm] vm size is 64 GB, 2 levels, block size 
is 10-bit, fragment size is 9-bit
Feb 16 13:22:31 kernel: amdgpu :00:01.0: amdgpu: VRAM: 512M 
0x00F4 - 0x00F41FFF (512M used)
Feb 16 13:22:31 kernel: amdgpu :00:01.0: amdgpu: GART: 1024M 
0x00FF - 0x00FF3FFF

Feb 16 13:22:31 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Feb 16 13:22:31 kernel: [drm] RAM width 64bits UNKNOWN
Feb 16 13:22:31 kernel: [drm] amdgpu: 512M of VRAM memory ready
Feb 16 13:22:31 kernel: [drm] amdgpu: 3704M of GTT memory ready.
Feb 16 13:22:31 kernel: [drm] GART: num cpu pages 262144, num gpu 
pages 262144
Feb 16 13:22:31 kernel: [drm] PCIE GART of 1024M enabled (table at 
0x00F40060).

Feb 16 13:22:31 kernel: amdgpu: hwmgr_sw_init smu backed is smu8_smu
Feb 16 13:22:31 kernel: [drm] Found UVD firmware Version: 1.91 Family 
ID: 11

Feb 16 13:22:31 kernel: [drm] UVD ENC is disabled
Feb 16 13:22:31 kernel: [drm] Found VCE firmware Version: 52.4 Binary 
ID: 3

Feb 16 13:22:31 kernel: amdgpu: smu version 27.18.00
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Engine clock
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 30
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 48
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 533340
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 576000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 626090
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 685720
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 72
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 757900
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
Feb 16 13:22:31 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Feb 16 13:22:31 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Feb 16 13:22:31 kernel: [drm] DM_PPLIB:    level

Re: [PATCH] drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()

2023-02-16 Thread Daniel Vetter
On Thu, Feb 16, 2023 at 03:06:20PM +0100, Thomas Zimmermann wrote:
> Move drm_fb_helper_unprepare() from drm_fb_helper_fini() into the
> calling fbdev implementation. Avoids a possible stale mutex with
> generic fbdev code.
> 
> As indicated by its name, drm_fb_helper_prepare() prepares struct
> drm_fb_helper before setting up the fbdev support with a call to
> drm_fb_helper_init(). In legacy fbdev emulation, this happens next
> to each other. If successful, drm_fb_helper_fini() later tear down
> the fbdev device and also unprepare via drm_fb_helper_unprepare().
> 
> Generic fbdev emulation prepares struct drm_fb_helper immediately
> after allocating the instance. It only calls drm_fb_helper_init()
> as part of processing a hotplug event. If the hotplug-handling fails,
> it runs drm_fb_helper_fini(). This unprepares the fb-helper instance
> and the next hotplug event runs on stale data.
> 
> Solve this by moving drm_fb_helper_unprepare() from drm_fb_helper_fini()
> into the fbdev implementations. Call it right before freeing the
> fb-helper instance.
> 
> Fixes: 4825797c36da ("drm/fb-helper: Introduce drm_fb_helper_unprepare()")
> Cc: Thomas Zimmermann 
> Cc: Javier Martinez Canillas 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: dri-de...@lists.freedesktop.org
> 
> Signed-off-by: Thomas Zimmermann 

This reminds me of an old patch I just recently stumbled over again:

https://lore.kernel.org/dri-devel/Y3St2VHJ7jEmcNFw@phenom.ffwll.local/

Should I resurrect that one maybe and send it out? I think that also ties
a bit into your story here.

> ---
>  drivers/gpu/drm/armada/armada_fbdev.c  | 3 +++
>  drivers/gpu/drm/drm_fb_helper.c| 2 --
>  drivers/gpu/drm/drm_fbdev_generic.c| 2 ++
>  drivers/gpu/drm/exynos/exynos_drm_fbdev.c  | 3 ++-
>  drivers/gpu/drm/gma500/framebuffer.c   | 2 ++
>  drivers/gpu/drm/i915/display/intel_fbdev.c | 1 +
>  drivers/gpu/drm/msm/msm_fbdev.c| 2 ++
>  drivers/gpu/drm/omapdrm/omap_fbdev.c   | 2 ++
>  drivers/gpu/drm/radeon/radeon_fb.c | 2 ++
>  drivers/gpu/drm/tegra/fb.c | 1 +
>  10 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/armada/armada_fbdev.c 
> b/drivers/gpu/drm/armada/armada_fbdev.c
> index 07e410c62b7a..0e44f53e9fa4 100644
> --- a/drivers/gpu/drm/armada/armada_fbdev.c
> +++ b/drivers/gpu/drm/armada/armada_fbdev.c
> @@ -147,6 +147,7 @@ int armada_fbdev_init(struct drm_device *dev)
>   err_fb_setup:
>   drm_fb_helper_fini(fbh);
>   err_fb_helper:
> + drm_fb_helper_unprepare(fbh);
>   priv->fbdev = NULL;
>   return ret;
>  }
> @@ -164,6 +165,8 @@ void armada_fbdev_fini(struct drm_device *dev)
>   if (fbh->fb)
>   fbh->fb->funcs->destroy(fbh->fb);
>  
> + drm_fb_helper_unprepare(fbh);
> +
>   priv->fbdev = NULL;
>   }
>  }
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index 28c428e9c530..a39998047f8a 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -590,8 +590,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)

I think it would be good to update the kerneldoc of _init() and _fini()
here to mention each another like we usually do with these pairs. Same
with prepare/unprepare() although the latter rerfences _prepare() already.

>   }
>   mutex_unlock(&kernel_fb_helper_lock);
>  
> - drm_fb_helper_unprepare(fb_helper);
> -
>   if (!fb_helper->client.funcs)
>   drm_client_release(&fb_helper->client);
>  }
> diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
> b/drivers/gpu/drm/drm_fbdev_generic.c
> index 365f80717fa1..4d6325e91565 100644
> --- a/drivers/gpu/drm/drm_fbdev_generic.c
> +++ b/drivers/gpu/drm/drm_fbdev_generic.c
> @@ -65,6 +65,8 @@ static void drm_fbdev_fb_destroy(struct fb_info *info)
>  
>   drm_client_framebuffer_delete(fb_helper->buffer);
>   drm_client_release(&fb_helper->client);
> +
> + drm_fb_helper_unprepare(fb_helper);
>   kfree(fb_helper);
>  }
>  
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c 
> b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
> index b89e33af8da8..4929ffe5a09a 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
> @@ -183,8 +183,8 @@ int exynos_drm_fbdev_init(struct drm_device *dev)
>  
>  err_setup:
>   drm_fb_helper_fini(helper);
> -
>  err_init:
> + drm_fb_helper_unprepare(helper);
>   private->fb_helper = NULL;
>   kfree(fbdev);
>  
> @@ -219,6 +219,7 @@ void exynos_drm_fbdev_fini(struct drm_device *dev)
>   fbdev = to_exynos_fbdev(private->fb_helper);
>  
>   exynos_drm_fbdev_destroy(dev, private->fb_helper);
> + drm_fb_helper_unprepare(private->fb_helper);
>   kfree(fbdev);
>   private->fb_helper = NULL;
>  }
> diff --git a/drivers/gpu/drm/gma500/framebuffer.c 
> b/drivers/gpu/drm/gma50

Re: [PATCH 2/3] drm/connector: Add enum documentation to drm_colorspace

2023-02-16 Thread Harry Wentland



On 2/8/23 03:56, Pekka Paalanen wrote:
> On Fri,  3 Feb 2023 02:07:43 +
> Joshua Ashton  wrote:
> 
>> To match the other enums, and add more information about these values.
>>
>> Signed-off-by: Joshua Ashton 
>>
>> Cc: Pekka Paalanen 
>> Cc: Sebastian Wick 
>> Cc: vitaly.pros...@amd.com
>> Cc: Uma Shankar 
>> Cc: Ville Syrjälä 
>> Cc: Joshua Ashton 
>> Cc: dri-de...@lists.freedesktop.org
>> Cc: amd-gfx@lists.freedesktop.org
>> ---
>>  include/drm/drm_connector.h | 41 +++--
>>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> Hi Joshua,
> 
> sorry for pushing you into a rabbit hole a bit. :-)
> 
>>
>> diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
>> index edef65388c29..eb4cc9076e16 100644
>> --- a/include/drm/drm_connector.h
>> +++ b/include/drm/drm_connector.h
>> @@ -363,13 +363,50 @@ enum drm_privacy_screen_status {
>>  PRIVACY_SCREEN_ENABLED_LOCKED,
>>  };
>>  
>> -/*
>> - * This is a consolidated colorimetry list supported by HDMI and
>> +/**
>> + * enum drm_colorspace - color space
> 
> Documenting this enum is really nice. What would be even better if
> there was similar documentation in the UAPI doc of "Colorspace" under
> https://www.kernel.org/doc/html/latest/gpu/drm-kms.html#standard-connector-properties
> listing the strings that userspace must use/expect and what they refer
> to.
> 
> 
>> + *
>> + * This enum is a consolidated colorimetry list supported by HDMI and
>>   * DP protocol standard. The respective connectors will register
>>   * a property with the subset of this list (supported by that
>>   * respective protocol). Userspace will set the colorspace through
>>   * a colorspace property which will be created and exposed to
> 
> Could this refer to "Colorspace" property explicitly instead of some
> unmentioned property?
> 
>>   * userspace.
>> + *
>> + * @DRM_MODE_COLORIMETRY_DEFAULT:
>> + *   sRGB (IEC 61966-2-1) or
>> + *   ITU-R BT.601 colorimetry format
> 
> Is this what the "driver will set the colorspace" comment actually
> means? If so, I think the comment "driver will set the colorspace"
> could be better or replaced with "not from any standard" or "undefined".
> 

It's sRGB when RGB encoding is used and BT.601 for YCbCr.

> sRGB and BT.601 have different primaries. There are actually two
> different cases of BT.601 primaries: the 525 line and 625 line. How
> does that work? Are the drivers really choosing anything, or will they
> just send "undefined" to the sink, and then the sink does whatever it
> does?
> 
> Or is this *only* about the RGB-to-YCbCr conversion matrix and not
> about colorimetry at all?
> 
> If it's only about the conversion matrix (MatrixCoefficients in CICP
> (H.273) terms), then which ones of the below also define only the
> MatrixCoefficients but no colorimetry?
> 
>> + * @DRM_MODE_COLORIMETRY_SMPTE_170M_YCC:
>> + *   SMPTE ST 170M colorimetry format
>> + * @DRM_MODE_COLORIMETRY_BT709_YCC:
>> + *   ITU-R BT.709 colorimetry format
>> + * @DRM_MODE_COLORIMETRY_XVYCC_601:
>> + *   xvYCC601 colorimetry format
>> + * @DRM_MODE_COLORIMETRY_XVYCC_709:
>> + *   xvYCC709 colorimetry format
>> + * @DRM_MODE_COLORIMETRY_SYCC_601:
>> + *   sYCC601 colorimetry format
>> + * @DRM_MODE_COLORIMETRY_OPYCC_601:
>> + *   opYCC601 colorimetry format
>> + * @DRM_MODE_COLORIMETRY_OPRGB:
>> + *   opRGB colorimetry format
>> + * @DRM_MODE_COLORIMETRY_BT2020_CYCC:
>> + *   ITU-R BT.2020 Y'c C'bc C'rc (linear) colorimetry format
> 
> Is this one known as the constant luminance variant which requires
> KMS/driver/hardware knowing also the transfer characteristic function?
> 

Constant luminance as defined in BT.2020. Table 4 and 5
of the spec talk about it.

> Is there perhaps an assumed TF here, since there is no KMS property to
> set a TF? Oh, maybe all of these imply the respective TF from the spec?
> 
> I suspect the "linear" should read as "constant luminance".
> 
>> + * @DRM_MODE_COLORIMETRY_BT2020_RGB:
>> + *   ITU-R BT.2020 R' G' B' colorimetry format
>> + * @DRM_MODE_COLORIMETRY_BT2020_YCC:
>> + *   ITU-R BT.2020 Y' C'b C'r colorimetry format
> 
> ...compared to this one known as the non-constant luminance variant,
> i.e. "the simple RGB-to-YCbCr conversion"?
> 

Agreed.

The CTA-681 spec actually treats these as the same value
and for DP the R'G'G' (for RGB encoded signals) and the
Y'C'bC'r version (for YCbCr encoded signals) don't share
an actual value but you can't use the R'G'B' for YCbCr
and vice versa.

Rolling both into one seems to be the most reasonable thing
to do and doesn't clash with the spec. We can even combine
it later with a new "encoding" property where userspace can
provide a suggested encoding (or use the default, driver-selected
one).

>> + * @DRM_MODE_COLORIMETRY_DCI_P3_RGB_D65:
>> + *   DCI-P3 (SMPTE RP 431-2) colorimetry format
>> + * @DRM_MODE_COLORIMETRY_DCI_P3_RGB_THEATER:
>> + *   DCI-P3 (SMPTE RP 431-2) colorimetry format
> 
> These two can't both be the same, right? That is

Re: [PATCH 01/32] drm/amdkfd: add debug and runtime enable interface

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

Introduce the GPU debug operations interface.

For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU
instruction set, provide the necessary interface to allow the debugger
to HW debug-mode set and query exceptions per HSA queue, process or
device.

The runtime_enable interface coordinates exception handling with the
HSA runtime.

Usage is available in the kern docs at uapi/linux/kfd_ioctl.h.

v2: was previously reviewed but removed deprecrated wave launch modes
(kill and disable).
Also remove non-needed dbg flag option.
Add revision and subvendor info to debug device snapshot entry.
Add trap on wave start and end override option.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  48 ++
  include/uapi/linux/kfd_ioctl.h   | 663 ++-
  2 files changed, 710 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f79b8e964140..d3b019e64093 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2645,6 +2645,48 @@ static int kfd_ioctl_criu(struct file *filep, struct 
kfd_process *p, void *data)
return ret;
  }
  
+static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process *p, void *data)

+{
+   return 0;
+}
+
+static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, 
void *data)
+{
+   struct kfd_ioctl_dbg_trap_args *args = data;
+   int r = 0;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("Debugging does not support sched_policy %i", 
sched_policy);
+   return -EINVAL;
+   }
+
+   switch (args->op) {
+   case KFD_IOC_DBG_TRAP_ENABLE:
+   case KFD_IOC_DBG_TRAP_DISABLE:
+   case KFD_IOC_DBG_TRAP_SEND_RUNTIME_EVENT:
+   case KFD_IOC_DBG_TRAP_SET_EXCEPTIONS_ENABLED:
+   case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_OVERRIDE:
+   case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_MODE:
+   case KFD_IOC_DBG_TRAP_SUSPEND_QUEUES:
+   case KFD_IOC_DBG_TRAP_RESUME_QUEUES:
+   case KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH:
+   case KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH:
+   case KFD_IOC_DBG_TRAP_SET_FLAGS:
+   case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
+   case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
+   case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
+   case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
+   pr_warn("Debugging not supported yet\n");
+   r = -EACCES;
+   break;
+   default:
+   pr_err("Invalid option: %i\n", args->op);
+   r = -EINVAL;
+   }
+
+   return r;
+}
+
  #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -2754,6 +2796,12 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
  
  	AMDKFD_IOCTL_DEF(AMDKFD_IOC_AVAILABLE_MEMORY,

kfd_ioctl_get_available_memory, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_RUNTIME_ENABLE,
+   kfd_ioctl_runtime_enable, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
+   kfd_ioctl_set_debug_trap, 0),
  };
  
  #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 42b60198b6c5..9ef4eed45c19 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -109,6 +109,32 @@ struct kfd_ioctl_get_available_memory_args {
__u32 pad;
  };
  
+struct kfd_dbg_device_info_entry {

+   __u64 exception_status;
+   __u64 lds_base;
+   __u64 lds_limit;
+   __u64 scratch_base;
+   __u64 scratch_limit;
+   __u64 gpuvm_base;
+   __u64 gpuvm_limit;
+   __u32 gpu_id;
+   __u32 location_id;
+   __u32 vendor_id;
+   __u32 device_id;
+   __u32 revision_id;
+   __u32 subsystem_vendor_id;
+   __u32 subsystem_device_id;
+   __u32 fw_version;
+   __u32 gfx_target_version;
+   __u32 simd_count;
+   __u32 max_waves_per_simd;
+   __u32 array_count;
+   __u32 simd_arrays_per_engine;
+   __u32 capability;
+   __u32 debug_prop;
+   __u32 pad;
+};
+
  /* For kfd_ioctl_set_memory_policy_args.default_policy and alternate_policy */
  #define KFD_IOC_CACHE_POLICY_COHERENT 0
  #define KFD_IOC_CACHE_POLICY_NONCOHERENT 1
@@ -766,6 +792,635 @@ struct kfd_ioctl_set_xnack_mode_args {
__s32 xnack_enabled;
  };
  
+/* Wave launch override modes */

+enum kfd_dbg_trap_override_mode {
+   KFD_DBG_TRAP_OVERRIDE_OR = 0,
+   KFD_DBG_TRAP_OVERRIDE_REPLACE = 1
+};
+
+/* Wave launch overrides */
+enum kfd_dbg_trap_mask {
+   KFD_DBG_TRAP_MASK_FP_INVALID = 1,
+   KFD_DBG_TRAP_MASK_FP_INPUT_DENORMAL = 2,
+   KFD_DBG_TRAP_MASK_FP_DIVIDE_BY_ZER

Re: [PATCH 02/32] drm/amdkfd: display debug capabilities

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

Expose debug capabilities in the KFD topology node's HSA capabilities and
debug properties flags.

Ensure correct capabilities are exposed based on firmware support.

Flag definitions can be referenced in uapi/linux/kfd_sysfs.h.

v2: v1 was reviewed but re-requesting review for the following.
- remove asic family code name comments in firmware support checking
- add gfx11 requirements in fw support checks and debug props and caps

Signed-off-by: Jonathan Kim 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 --
  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 ++
  include/uapi/linux/kfd_sysfs.h|  15 
  3 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3fdaba56be6f..647a14142da9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -551,6 +551,8 @@ static ssize_t node_show(struct kobject *kobj, struct 
attribute *attr,
  dev->gpu->mec_fw_version);
sysfs_show_32bit_prop(buffer, offs, "capability",
  dev->node_props.capability);
+   sysfs_show_64bit_prop(buffer, offs, "debug_prop",
+ dev->node_props.debug_prop);
sysfs_show_32bit_prop(buffer, offs, "sdma_fw_version",
  dev->gpu->sdma_fw_version);
sysfs_show_64bit_prop(buffer, offs, "unique_id",
@@ -1865,6 +1867,97 @@ static int kfd_topology_add_device_locked(struct kfd_dev 
*gpu, uint32_t gpu_id,
return res;
  }
  
+static void kfd_topology_set_dbg_firmware_support(struct kfd_topology_device *dev)

+{
+   bool firmware_supported = true;
+
+   if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(11, 0, 0) &&
+   KFD_GC_VERSION(dev->gpu) < IP_VERSION(12, 0, 0)) {
+   firmware_supported =
+   (dev->gpu->adev->mes.sched_version & 
AMDGPU_MES_VERSION_MASK) >= 9;
+   goto out;
+   }
+
+   /*
+* Note: Any unlisted devices here are assumed to support exception 
handling.
+* Add additional checks here as needed.
+*/
+   switch (KFD_GC_VERSION(dev->gpu)) {
+   case IP_VERSION(9, 0, 1):
+   firmware_supported = dev->gpu->mec_fw_version >= 459 + 32768;
+   break;
+   case IP_VERSION(9, 1, 0):
+   case IP_VERSION(9, 2, 1):
+   case IP_VERSION(9, 2, 2):
+   case IP_VERSION(9, 3, 0):
+   case IP_VERSION(9, 4, 0):
+   firmware_supported = dev->gpu->mec_fw_version >= 459;
+   break;
+   case IP_VERSION(9, 4, 1):
+   firmware_supported = dev->gpu->mec_fw_version >= 60;
+   break;
+   case IP_VERSION(9, 4, 2):
+   firmware_supported = dev->gpu->mec_fw_version >= 51;
+   break;
+   case IP_VERSION(10, 1, 10):
+   case IP_VERSION(10, 1, 2):
+   case IP_VERSION(10, 1, 1):
+   firmware_supported = dev->gpu->mec_fw_version >= 144;
+   break;
+   case IP_VERSION(10, 3, 0):
+   case IP_VERSION(10, 3, 2):
+   case IP_VERSION(10, 3, 1):
+   case IP_VERSION(10, 3, 4):
+   case IP_VERSION(10, 3, 5):
+   firmware_supported = dev->gpu->mec_fw_version >= 89;
+   break;
+   case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 3, 3):
+   firmware_supported = false;
+   break;
+   default:
+   break;
+   }
+
+out:
+   if (firmware_supported)
+   dev->node_props.capability |= 
HSA_CAP_TRAP_DEBUG_FIRMWARE_SUPPORTED;
+}
+
+static void kfd_topology_set_capabilities(struct kfd_topology_device *dev)
+{
+   dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 <<
+   HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+   HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+
+   dev->node_props.capability |= HSA_CAP_TRAP_DEBUG_SUPPORT |
+   HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_TRAP_OVERRIDE_SUPPORTED |
+   HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
+
+   if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
+   dev->node_props.debug_prop |= 
HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+
+   if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
+   dev->node_props.debug_prop |=
+   HSA_DBG_DISPATCH_INFO_ALWAYS_VALID;
+   else
+   dev->node_props.capability |=
+   
HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED;
+   } else {
+   dev->node_props.deb

Re: [PATCH 06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

Implement the per-device calls to enable or disable HW debug mode for
GFX9 prior to GFX9.4.1.

GFX9.4.1 and onward will require their own enable/disable sequence as
follow on patches.

When hardware debug mode setting is requested, waves will inherit
these settings in the Shader Processor Input's (SPI) Sequencer Global
Block (SQG). This means that the KGD must drain all waves from the SPI
into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
to ensure that the order of operations that the debugger expects with
regards to debug mode setting transaction requests and wave inheritence
of that mode is upheld.

Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.

v2: remove unnecessary static srbm lock renaming.
add comments to explain ignored arguments for debug trap enable and
disable.

Signed-off-by: Jonathan Kim 
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 93 +++
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 9 ++
drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 3 +
3 files changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c

index e92b93557c13..94a9fd9bd984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -646,6 +646,97 @@ int kgd_gfx_v9_wave_control_execute(struct 
amdgpu_device *adev,

return 0;
}
+/*
+ * GFX9 helper for wave launch stall requirements on debug trap setting.
+ *
+ * vmid:
+ * Target VMID to stall/unstall.
+ *
+ * stall:
+ * 0-unstall wave launch (enable), 1-stall wave launch (disable).
+ * After wavefront launch has been stalled, allocated waves must 
drain from

+ * SPI in order for debug trap settings to take effect on those waves.
+ * This is roughly a ~96 clock cycle wait on SPI where a read on
+ * SPI_GDBG_WAVE_CNTL translates to ~32 clock cycles.
+ * KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY indicates the number of 
reads required.

+ *
+ * NOTE: We can afford to clear the entire STALL_VMID field on unstall
+ * because GFX9.4.1 cannot support multi-process debugging due to trap
+ * configuration and masking being limited to global scope. Always assume
+ * single process conditions.
+
+ */
+#define KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY 3
+void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
+ uint32_t vmid,
+ bool stall)
+{
+ int i;
+ uint32_t data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+
+ if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1))
+ data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_VMID,
+ stall ? 1 << vmid : 0);
+ else
+ data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_RA,
+ stall ? 1 : 0);
+
+ WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL), data);
+
+ if (!stall)
+ return;
+
+ for (i = 0; i < KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY; i++)
+ RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+}
+
+/**


This was flagged by the kernel test robot. Should just be /* because 
it's not a formal doc comment.



+ * restore_dbg_reisters is ignored here but is a general interface 
requirement


Typo: reisters -> registers



+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to 
manually
+ * initialize the debug mode registers after it has disabled GFX off 
during the

+ * debug session.
+ */
+uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
+ bool restore_dbg_registers,
+ uint32_t vmid)
+{
+ mutex_lock(&adev->grbm_idx_mutex);
+
+ kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
+
+ WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+ kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
+
+ mutex_unlock(&adev->grbm_idx_mutex);
+
+ return 0;
+}
+
+/**


Same as above. With those fixed, the patch is

Reviewed-by: Felix Kuehling 


+ * keep_trap_enabled is ignored here but is a general interface 
requirement

+ * for devices that support multi-process debugging where the performance
+ * overhead from trap temporary setup needs to be bypassed when the debug
+ * session has ended.
+ */
+uint32_t kgd_gfx_v9_disable_debug_trap(struct amdgpu_device *adev,
+ bool keep_trap_enabled,
+ uint32_t vmid)
+{
+ mutex_lock(&adev->grbm_idx_mutex);
+
+ kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
+
+ WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+ kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
+
+ mutex_unlock(&adev->grbm_idx_mutex);
+
+ return 0;
+}
+
void kgd_gfx_v9_set_vm_context_page_table_base(struct amdgpu_device *adev,
uint32_t vmid, uint64_t page_table_base)
{
@@ -871,6 +962,8 @@ const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
.get_atc_vmid_pasid_mapping_info =
kgd_gfx_v9_get_atc_vmid_pasid_mapping_info,
.set_vm_context_page_table_base = 
kgd_gfx_v9_set_vm_context_page_table_base,

+ .enable_debug_trap =

Re: [PATCH 05/32] drm/amdgpu: setup hw debug registers on driver initialization

2023-02-16 Thread Felix Kuehling

On 2023-01-25 14:53, Jonathan Kim wrote:

Add missing debug trap registers references and initialize all debug
registers on boot by clearing the hardware exception overrides and the
wave allocation ID index.

The debugger requires that TTMPs 6 & 7 save the dispatch ID to map
waves onto dispatch during compute context inspection.
In order to correctly set this up, set the special reserved CP bit by
default whenever the MQD is initailized.

v2: leave TRAP_EN set for multi-process debugging as per process disable
will be taken care of in later patches.
fixup typo in description.
enable ttmp setup for dispatch boundary in mqd init for gfx11.
add trap on wave start and end registers for gfx11.

Signed-off-by: Jonathan Kim 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 30 
  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  5 ++
  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  |  5 ++
  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  5 ++
  .../include/asic_reg/gc/gc_10_1_0_offset.h| 14 
  .../include/asic_reg/gc/gc_10_1_0_sh_mask.h   | 69 +++
  .../include/asic_reg/gc/gc_10_3_0_offset.h| 10 +++
  .../include/asic_reg/gc/gc_10_3_0_sh_mask.h   |  4 ++
  .../include/asic_reg/gc/gc_11_0_0_sh_mask.h   |  4 ++
  11 files changed, 173 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 6983acc456b2..a5faf23805b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4823,6 +4823,29 @@ static u32 
gfx_v10_0_init_pa_sc_tile_steering_override(struct amdgpu_device *ade
  
  #define DEFAULT_SH_MEM_BASES	(0x6000)
  
+static void gfx_v10_0_debug_trap_config_init(struct amdgpu_device *adev,

+   uint32_t first_vmid,
+   uint32_t last_vmid)
+{
+   uint32_t data;
+   uint32_t trap_config_vmid_mask = 0;
+   int i;
+
+   /* Calculate trap config vmid mask */
+   for (i = first_vmid; i < last_vmid; i++)
+   trap_config_vmid_mask |= (1 << i);
+
+   data = REG_SET_FIELD(0, SPI_GDBG_TRAP_CONFIG,
+   VMID_SEL, trap_config_vmid_mask);
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   TRAP_EN, 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_CONFIG), data);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA0), 0);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA1), 0);
+}
+
  static void gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
  {
int i;
@@ -4854,6 +4877,9 @@ static void gfx_v10_0_init_compute_vmid(struct 
amdgpu_device *adev)
WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
}
+
+   gfx_v10_0_debug_trap_config_init(adev, adev->vm_manager.first_kfd_vmid,
+   AMDGPU_NUM_VMID);
  }
  
  static void gfx_v10_0_init_gds_vmid(struct amdgpu_device *adev)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index c621b2ad7ba3..3ca7a31fb770 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1572,6 +1572,7 @@ static void gfx_v11_0_init_compute_vmid(struct 
amdgpu_device *adev)
/* Enable trap for each kfd vmid. */
data = RREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   WREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL, data);
}
soc21_grbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8ad5c03506f2..222fe87161b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2289,6 +2289,29 @@ static void gfx_v9_0_setup_rb(struct amdgpu_device *adev)
adev->gfx.config.num_rbs = hweight32(active_rbs);
  }
  
+static void gfx_v9_0_debug_trap_config_init(struct amdgpu_device *adev,

+   uint32_t first_vmid,
+   uint32_t last_vmid)
+{
+   uint32_t data;
+   uint32_t trap_config_vmid_mask = 0;
+   int i;
+
+   /* Calculate trap config vmid mask */
+   for (i = first_vmid; i < last_vmid; i++)
+   trap_config_vmid_mask |= (1 << i);
+
+   data = REG_SET_FIELD(0, SPI_GDBG_TRAP_CONFIG,
+   VMID_SEL, trap_config_vmid_mask);
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   TRAP_EN, 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_CONFIG), d

Re: [PATCH 09/32] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

GFX9.4.2 now supports per-VMID debug mode controls registers
(SPI_GDBG_PER_VMID_CNTL).

Because the KFD lets the HWS handle PASID-VMID mapping, the KFD will
forward all debug mode setting register writes to the HWS scheduler
using a new MAP_PROCESS API, so instead of writing to registers, return
the required register values that the HWS needs to write on debug enable
and disable.

v2: add commentary on unused restore_dbg_registers for debug enable.

Signed-off-by: Jonathan Kim 
---
  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  | 43 ++-
  1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 4485bb29bec9..89868f9927ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,44 @@
  #include "amdgpu_amdkfd.h"
  #include "amdgpu_amdkfd_arcturus.h"
  #include "amdgpu_amdkfd_gfx_v9.h"
+#include "gc/gc_9_4_2_offset.h"
+#include "gc/gc_9_4_2_sh_mask.h"
+
+/**


Use /* here.



+ * Returns TRAP_EN, EXCP_EN and EXCP_REPLACE.
+ *
+ * restore_dbg_reisters is ignored here but is a general interface requirement


Typo: registers



+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to manually
+ * initialize the debug mode registers after it has disabled GFX off during the
+ * debug session.
+ */
+static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
+
+/* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
+static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 
keep_trap_enabled);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
  
  const struct kfd2kgd_calls aldebaran_kfd2kgd = {

.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
@@ -41,6 +79,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_atc_vmid_pasid_mapping_info =
kgd_gfx_v9_get_atc_vmid_pasid_mapping_info,
.set_vm_context_page_table_base = 
kgd_gfx_v9_set_vm_context_page_table_base,
-   .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,


I think you're removing get_cu_occupancy accidentally here?

With those issues fixed, the patch is

Reviewed-by: Felix Kuehling 



-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .enable_debug_trap = kgd_aldebaran_enable_debug_trap,
+   .disable_debug_trap = kgd_aldebaran_disable_debug_trap,
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
  };


Re: [PATCH 07/32] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

On GFX9.4.1, the implicit wait count instruction on s_barrier is
disabled by default in the driver during normal operation for
performance requirements.

There is a hardware bug in GFX9.4.1 where if the implicit wait count
instruction after an s_barrier instruction is disabled, any wave that
hits an exception may step over the s_barrier when returning from the
trap handler with the barrier logic having no ability to be
aware of this, thereby causing other waves to wait at the barrier
indefinitely resulting in a shader hang.  This bug has been corrected
for GFX9.4.2 and onward.

Since the debugger subscribes to hardware exceptions, in order to avoid
this bug, the debugger must enable implicit wait count on s_barrier
for a debug session and disable it on detach.

In order to change this setting in the in the device global SQ_CONFIG
register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
will either dispatch work through the compute ring buffers used for
image post processing or through the hardware scheduler by the KFD.

Have the KGD suspend and drain the compute ring buffer, then suspend the
hardware scheduler and block any future KFD process job requests before
changing the implicit wait count setting.  Once set, resume all work.

v2: remove flush on kfd suspend as that will be a general fix required
outside of this patch series.
comment on trap enable/disable ignored variables.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   | 118 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |   4 +-
  3 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 872450a3a164..3c03e34c194c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1041,6 +1041,9 @@ struct amdgpu_device {
struct pci_saved_state  *pci_state;
pci_channel_state_t pci_channel_state;
  
+	/* Track auto wait count on s_barrier settings */

+   boolbarrier_has_auto_waitcnt;
+
struct amdgpu_reset_control *reset_cntl;
uint32_t
ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c

index 4191af5a3f13..d5bb86ccd617 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -26,6 +26,7 @@
  #include "amdgpu.h"
  #include "amdgpu_amdkfd.h"
  #include "amdgpu_amdkfd_arcturus.h"
+#include "amdgpu_reset.h"
  #include "sdma0/sdma0_4_2_2_offset.h"
  #include "sdma0/sdma0_4_2_2_sh_mask.h"
  #include "sdma1/sdma1_4_2_2_offset.h"
@@ -48,6 +49,8 @@
  #include "amdgpu_amdkfd_gfx_v9.h"
  #include "gfxhub_v1_0.h"
  #include "mmhub_v9_4.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
  
  #define HQD_N_REGS 56

  #define DUMP_REG(addr) do {   \
@@ -276,6 +279,117 @@ int kgd_arcturus_hqd_sdma_destroy(struct amdgpu_device 
*adev, void *mqd,
return 0;
  }
  
+/*

+ * Helper used to suspend/resume gfx pipe for image post process work to set
+ * barrier behaviour.
+ */
+static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool 
suspend)
+{
+   int i, r = 0;
+
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+   struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
+
+   if (!(ring && ring->sched.thread))
+   continue;
+
+   /* stop secheduler and drain ring. */
+   if (suspend) {
+   drm_sched_stop(&ring->sched, NULL);
+   r = amdgpu_fence_wait_empty(ring);
+   if (r)
+   goto out;
+   } else {
+   drm_sched_start(&ring->sched, false);
+   }
+   }
+
+out:
+   /* return on resume or failure to drain rings. */
+   if (!suspend || r)
+   return r;
+
+   return amdgpu_device_ip_wait_for_idle(adev, GC_HWIP);
+}
+
+static void set_barrier_auto_waitcnt(struct amdgpu_device *adev, bool 
enable_waitcnt)
+{
+   uint32_t data;
+
+   WRITE_ONCE(adev->barrier_has_auto_waitcnt, enable_waitcnt);
+
+   if (!down_read_trylock(&adev->reset_domain->sem))
+   return;
+
+   amdgpu_amdkfd_suspend(adev, false);
+
+   if (suspend_resume_compute_scheduler(adev, true))
+   goto out;
+
+   data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CONFIG));
+   data = REG_SET_FIELD(data, SQ_CONFIG, DISABLE_BARRIER_WAITCNT,
+   enable_waitcnt ? 0 : 1);


This could be ..., !enable_waitcnt);



+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CONFI

Re: [PATCH 08/32] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

Similar to GFX9 debug devices, set the hardware debug mode by draining
the SPI appropriately prior the mode setting request.

Because GFX10 has waves allocated by the work group boundaray and each


Typo: boundary?



SE's SPI instances do not communicate, the SPI drain time is much longer.
This long drain time will be fixed for GFX11 onwards.

Also remove a bunch of deprecated misplaced references for GFX10.3.

Signed-off-by: Jonathan Kim 
---
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  95 +++
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  28 
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  | 147 +-
  3 files changed, 126 insertions(+), 144 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 9378fc79e9ea..c09b45de02d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -708,6 +708,99 @@ static void set_vm_context_page_table_base(struct 
amdgpu_device *adev,
adev->gfxhub.funcs->setup_vm_pt_regs(adev, vmid, page_table_base);
  }
  
+/*

+ * GFX10 helper for wave launch stall requirements on debug trap setting.
+ *
+ * vmid:
+ *   Target VMID to stall/unstall.
+ *
+ * stall:
+ *   0-unstall wave launch (enable), 1-stall wave launch (disable).
+ *   After wavefront launch has been stalled, allocated waves must drain from
+ *   SPI in order for debug trap settings to take effect on those waves.
+ *   This is roughly a ~3500 clock cycle wait on SPI where a read on
+ *   SPI_GDBG_WAVE_CNTL translates to ~32 clock cycles.
+ *   KGD_GFX_V10_WAVE_LAUNCH_SPI_DRAIN_LATENCY indicates the number of reads 
required.
+ *
+ *   NOTE: We can afford to clear the entire STALL_VMID field on unstall
+ *   because current GFX10 chips cannot support multi-process debugging due to
+ *   trap configuration and masking being limited to global scope.  Always
+ *   assume single process conditions.
+ *
+ */
+
+#define KGD_GFX_V10_WAVE_LAUNCH_SPI_DRAIN_LATENCY  110
+static void kgd_gfx_v10_set_wave_launch_stall(struct amdgpu_device *adev, 
uint32_t vmid, bool stall)
+{
+   uint32_t data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+   int i;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_VMID,
+   stall ? 1 << vmid : 0);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL), data);
+
+   if (!stall)
+   return;
+
+   for (i = 0; i < KGD_GFX_V10_WAVE_LAUNCH_SPI_DRAIN_LATENCY; i++)
+   RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+}
+
+uint32_t kgd_gfx_v10_enable_debug_trap(struct amdgpu_device *adev,


The kernel test robot flagged a missing prototype or this function. You 
probably need to #include amdgpu_amdkfd_gfx_v10.h to fix this.




+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+
+   mutex_lock(&adev->grbm_idx_mutex);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true);
+
+   /* assume gfx off is disabled for the debug session if rlc restore not 
supported. */
+   if (restore_dbg_registers) {
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   VMID_SEL, 1 << vmid);
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   TRAP_EN, 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_CONFIG), data);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA0), 0);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA1), 0);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+   }
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+}
+
+uint32_t kgd_gfx_v10_disable_debug_trap(struct amdgpu_device *adev,


Same as above.

With that fixed, the patch is

Reviewed-by: Felix Kuehling 



+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   mutex_lock(&adev->grbm_idx_mutex);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+}
+
  static void program_trap_handler_settings(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr)
  {
@@ -750,5 +843,7 @@ const struct kfd

Re: [PATCH 10/32] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

Implement the per-device calls to enable or disable HW debug mode
for GFX11.

Signed-off-by: Jonathan Kim 
---
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 39 +++
  1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
index 7e80caa05060..34aeff692eba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
@@ -30,6 +30,7 @@
  #include "soc15d.h"
  #include "v11_structs.h"
  #include "soc21.h"
+#include 


What is this needed for? Maybe for a later patch?


  
  enum hqd_dequeue_request_type {

NO_ACTION = 0,
@@ -606,6 +607,42 @@ static void set_vm_context_page_table_base_v11(struct 
amdgpu_device *adev,
adev->gfxhub.funcs->setup_vm_pt_regs(adev, vmid, page_table_base);
  }
  
+/**

Use /* here.

+ * Returns TRAP_EN, EXCP_EN and EXCP_REPLACE.
+ *
+ * restore_dbg_reisters is ignored here but is a general interface requirement


Typo: registers

With those fixed, the patch is

Reviewed-by: Felix Kuehling 



+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to manually
+ * initialize the debug mode registers after it has disabled GFX off during the
+ * debug session.
+ */
+static uint32_t kgd_gfx_v11_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
+
+/* Returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
+static uint32_t kgd_gfx_v11_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 
keep_trap_enabled);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
+
  const struct kfd2kgd_calls gfx_v11_kfd2kgd = {
.program_sh_mem_settings = program_sh_mem_settings_v11,
.set_pasid_vmid_mapping = set_pasid_vmid_mapping_v11,
@@ -622,4 +659,6 @@ const struct kfd2kgd_calls gfx_v11_kfd2kgd = {
.wave_control_execute = wave_control_execute_v11,
.get_atc_vmid_pasid_mapping_info = NULL,
.set_vm_context_page_table_base = set_vm_context_page_table_base_v11,
+   .enable_debug_trap = kgd_gfx_v11_enable_debug_trap,
+   .disable_debug_trap = kgd_gfx_v11_disable_debug_trap
  };


Re: [PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

The ROCm debugger will attach to a process to debug by PTRACE and will
expect the KFD to prepare a process for the target PID, whether the
target PID has opened the KFD device or not.

This patch is to explicity handle this requirement.  Further HW mode
setting and runtime coordination requirements will be handled in
following patches.

In the case where the target process has not opened the KFD device,
a new KFD process must be created for the target PID.
The debugger as well as the target process for this case will have not
acquired any VMs so handle process restoration to correctly account for
this.

To coordinate with HSA runtime, the debugger must be aware of the target
process' runtime enablement status and will copy the runtime status
information into the debugged KFD process for later query.

On enablement, the debugger will subscribe to a set of exceptions where
each exception events will notify the debugger through a pollable FIFO
file descriptor that the debugger provides to the KFD to manage.
Some events will be synchronously raised while other are scheduled,
which is why a debug_event_workarea worker is initialized.

Finally on process termination of either the debugger or the target,
debugging must be disabled if it has not been done so.

v3: fix typo on debug trap disable and PTRACE ATTACH relax check.
remove unnecessary queue eviction counter reset when there's nothing
to evict.
change err code to EALREADY if attaching to an already attached process.
move debug disable to release worker to avoid race with disable from
ioctl call.

v2: relax debug trap disable and PTRACE ATTACH requirement.

Signed-off-by: Jonathan Kim
---
  drivers/gpu/drm/amd/amdkfd/Makefile   |  3 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 88 -
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 94 +++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 33 +++
  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 -
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 34 ++-
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 63 +
  7 files changed, 308 insertions(+), 29 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.c
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index e758c2a24cd0..747754428073 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -55,7 +55,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v9.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
-   $(AMDKFD_PATH)/kfd_crat.o
+   $(AMDKFD_PATH)/kfd_crat.o \
+   $(AMDKFD_PATH)/kfd_debug.o
  
  ifneq ($(CONFIG_AMD_IOMMU_V2),)

  AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d3b019e64093..ee05c2e54ef6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -44,6 +44,7 @@
  #include "amdgpu_amdkfd.h"
  #include "kfd_smi_events.h"
  #include "amdgpu_dma_buf.h"
+#include "kfd_debug.h"
  
  static long kfd_ioctl(struct file *, unsigned int, unsigned long);

  static int kfd_open(struct inode *, struct file *);
@@ -142,10 +143,15 @@ static int kfd_open(struct inode *inode, struct file 
*filep)
return -EPERM;
}
  
-	process = kfd_create_process(filep);

+   process = kfd_create_process(current);
if (IS_ERR(process))
return PTR_ERR(process);
  
+	if (kfd_process_init_cwsr_apu(process, filep)) {

+   kfd_unref_process(process);
+   return -EFAULT;
+   }
+
if (kfd_is_locked()) {
dev_dbg(kfd_device, "kfd is locked!\n"
"process %d unreferenced", process->pasid);
@@ -2653,6 +2659,9 @@ static int kfd_ioctl_runtime_enable(struct file *filep, 
struct kfd_process *p, v
  static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process 
*p, void *data)
  {
struct kfd_ioctl_dbg_trap_args *args = data;
+   struct task_struct *thread = NULL;
+   struct pid *pid = NULL;
+   struct kfd_process *target = NULL;
int r = 0;
  
  	if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {

@@ -2660,9 +2669,71 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
return -EINVAL;
}
  
+	pid = find_get_pid(args->pid);

+   if (!pid) {
+   pr_debug("Cannot find pid info for %i\n", args->pid);
+   r = -ESRCH;
+   goto out;
+   }
+
+   thread = get_pid_task(pid, PIDTYPE_PID);
+
+   if (args->op == KFD_IOC_DBG_TRAP_ENABLE) {
+   bool create_process;
+
+ 

[PATCH]: add tmz support for GC 10.3.6

2023-02-16 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - General]


drm/amdgpu: add tmz support for GC 10.3.6



this patch to add tmz support for GC 10.3.6

Signed-off-by: Jesse Zhang jesse.zh...@amd.com



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

index adfc7512c61b..6830f671cde7 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

@@ -551,6 +551,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)

case IP_VERSION(10, 3, 2):

case IP_VERSION(10, 3, 4):

case IP_VERSION(10, 3, 5):

+   case IP_VERSION(10, 3, 6):

/* VANGOGH */

case IP_VERSION(10, 3, 1):



RE: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation when using drm buddy

2023-02-16 Thread Xiao, Shane


> -Original Message-
> From: Paneer Selvam, Arunpravin 
> Sent: Thursday, February 16, 2023 3:24 PM
> To: Koenig, Christian ; Xiao, Shane
> ; Kuehling, Felix ;
> Christian König 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation when using
> drm buddy
> 
> This patch seems to pass the rocm memory stress test case.
> Reviewed-by: Arunpravin Paneer Selvam
> 

Hi Christian,

Do you think we should upstream this patch?

Best Regards,
Shane


> 
> On 2/16/2023 12:39 PM, Christian König wrote:
> > Am 16.02.23 um 07:48 schrieb Xiao, Shane:
> >>> -Original Message-
> >>> From: Kuehling, Felix 
> >>> Sent: Thursday, February 16, 2023 6:19 AM
> >>> To: Christian König ; Xiao, Shane
> >>> ; Koenig, Christian
> ;
> >>> Paneer Selvam, Arunpravin 
> >>> Cc: amd-gfx@lists.freedesktop.org
> >>> Subject: Re: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation when
> >>> using drm buddy
> >>>
> >>>
> >>> Am 2023-02-15 um 05:44 schrieb Christian König:
>  Am 15.02.23 um 03:51 schrieb Xiao, Shane:
> > For public review
> >> -Original Message-
> >> From: Koenig, Christian 
> >> Sent: Wednesday, February 15, 2023 3:02 AM
> >> To: Xiao, Shane ; Paneer Selvam,
> Arunpravin
> >> 
> >> Subject: Re: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation
> >> when using drm buddy
> >>
> >> Am 14.02.23 um 15:53 schrieb Xiao, Shane:
>  -Original Message-
>  From: Koenig, Christian 
>  Sent: Tuesday, February 14, 2023 8:41 PM
>  To: Xiao, Shane ; brahma_sw_dev
>  
>  Cc: Paneer Selvam, Arunpravin
> >>> 
>  Subject: Re: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation
> >>> when
>  using drm buddy
> 
>  Am 14.02.23 um 12:18 schrieb Shane Xiao:
> > Since the VRAM manager changed from drm mm to drm buddy.
> It's
> >>> not
> > necessary to allocate 2MB aligned VRAM for more than 2MB
> > unaligned size, and then do trim. This method improves the
> > allocation efficiency and reduces memory fragmentation.
>  Well that is a trade off.
> 
>  Allocating the BO as one contiguous chunk and then trimming is
>  beneficial because if we then later need it contiguous we don't
>  need to re-allocate and copy. This can be needed to display
>  something for
> >> example.
> > Hi Christian,
> >
> > This case means that you allocate BO that is unnecessary to be
> > continuous at first time, and latter the BO should be continuous.
> > I'm not familiar with display. Could you give me a few more
> > specific examples ?
>  On most generations DCE/DCN hardware needs the buffer contiguous
> to
> >>> be
>  able to scanout from it.
> 
>  Only newer APUs can use S/G to scanout from system memory pages.
> 
> >>> Yes, I agree that one contiguous chunk may get beneficial
> >>> sometimes.
> >>> But as far as I know, you cannot guarantee that
> >>> amdgpu_vram_mgr_new
> >> can get one contiguous chunk  if you don't set
> >> TTM_PL_FLAG_CONTIGUOUS flags.
> >>> For example, if you want to allocate 4M+4K BO, it will allocate
> >>> one 4M block
> >> + one 2M block which is unnecessary to be continuous, then 2M
> >> + block
> >> will be
> >> trimmed.
> >>
> >> Oh, that's indeed not something which should happen. Sounds more
> >> like a bug fix then.
> > Yes, I think this case should not be happened.
> > Actually, I'm not sure that why the allocated BO should be aligned
> > with pages_per_block, which is set to 2MB by default.
> > Does this help improve performance when allocating 2M or above BO?
> >   From my point of view, the TLB may be one of reason of this. But
> > I'm not sure about this.
>  Yes, we try to use allocations which are as contiguous as much as
>  possible for better TLB usage.
> 
>  Especially for some compute use cases this can make a >20%
>  performance difference.
> >>> We actually found that >2MB virtual address alignment was hurting
> >>> performance due to cache line aliasing. So we can't take advantage
> >>> of  >2MB pages in our page tables.
> >>>
> >>> Regards,
> >>>     Felix
> >> Yes, if we want to take advantage of 2M TLB usage, we should keep
> >> virtual address aligned.
> >>
> >> As you have mentioned that cache line aliasing issue, I'm confused
> >> about this.
> >> If 2MB aligned VA get the right PA from TLB or page table and the
> >> cache line addressing mode is not changed, the cache line aliasing
> >> issue should not happen here.
> >> Is there something wrong with my understanding? Or maybe there are
> >> some backgrounds that I didn't know.
> >
> > The problem is with virtual address alignments > 2MiB (or whatever the
> > big cache line size is).
> >
> > Let's assume an example where you have a lot of buffer each 66M

[PATCH 1/2] drm/amdgpu: add umc retire unit element

2023-02-16 Thread Tao Zhou
It records how many bad pages are retired in one uncorrectable error.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 3 +++
 4 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
index a6951160f13a..f2bf979af588 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
@@ -74,6 +74,8 @@ struct amdgpu_umc {
 
/* UMC regiser per channel offset */
uint32_t channel_offs;
+   /* how many pages are retired in one UE */
+   uint32_t retire_unit;
/* channel index table of interleaved memory */
const uint32_t *channel_idx_tbl;
struct ras_common_if *ras_if;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index fe2c15f598b8..c59c2332d191 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -696,6 +696,7 @@ static void gmc_v10_0_set_umc_funcs(struct amdgpu_device 
*adev)
adev->umc.channel_inst_num = UMC_V8_7_CHANNEL_INSTANCE_NUM;
adev->umc.umc_inst_num = UMC_V8_7_UMC_INSTANCE_NUM;
adev->umc.channel_offs = UMC_V8_7_PER_CHANNEL_OFFSET_SIENNA;
+   adev->umc.retire_unit = 1;
adev->umc.channel_idx_tbl = &umc_v8_7_channel_idx_tbl[0][0];
adev->umc.ras = &umc_v8_7_ras;
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 0a31a341aa43..85e0afc3d4f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -570,6 +570,7 @@ static void gmc_v11_0_set_umc_funcs(struct amdgpu_device 
*adev)
adev->umc.node_inst_num = adev->gmc.num_umc;
adev->umc.max_ras_err_cnt_per_query = 
UMC_V8_10_TOTAL_CHANNEL_NUM(adev);
adev->umc.channel_offs = UMC_V8_10_PER_CHANNEL_OFFSET;
+   adev->umc.retire_unit = UMC_V8_10_NA_COL_2BITS_POWER_OF_2_NUM;
if (adev->umc.node_inst_num == 4)
adev->umc.channel_idx_tbl = 
&umc_v8_10_channel_idx_tbl_ext0[0][0][0];
else
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index d65c6cea3445..b06170c00dfc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1288,6 +1288,7 @@ static void gmc_v9_0_set_umc_funcs(struct amdgpu_device 
*adev)
adev->umc.channel_inst_num = UMC_V6_1_CHANNEL_INSTANCE_NUM;
adev->umc.umc_inst_num = UMC_V6_1_UMC_INSTANCE_NUM;
adev->umc.channel_offs = UMC_V6_1_PER_CHANNEL_OFFSET_VG20;
+   adev->umc.retire_unit = 1;
adev->umc.channel_idx_tbl = &umc_v6_1_channel_idx_tbl[0][0];
adev->umc.ras = &umc_v6_1_ras;
break;
@@ -1296,6 +1297,7 @@ static void gmc_v9_0_set_umc_funcs(struct amdgpu_device 
*adev)
adev->umc.channel_inst_num = UMC_V6_1_CHANNEL_INSTANCE_NUM;
adev->umc.umc_inst_num = UMC_V6_1_UMC_INSTANCE_NUM;
adev->umc.channel_offs = UMC_V6_1_PER_CHANNEL_OFFSET_ARCT;
+   adev->umc.retire_unit = 1;
adev->umc.channel_idx_tbl = &umc_v6_1_channel_idx_tbl[0][0];
adev->umc.ras = &umc_v6_1_ras;
break;
@@ -1305,6 +1307,7 @@ static void gmc_v9_0_set_umc_funcs(struct amdgpu_device 
*adev)
adev->umc.channel_inst_num = UMC_V6_7_CHANNEL_INSTANCE_NUM;
adev->umc.umc_inst_num = UMC_V6_7_UMC_INSTANCE_NUM;
adev->umc.channel_offs = UMC_V6_7_PER_CHANNEL_OFFSET;
+   adev->umc.retire_unit = (UMC_V6_7_NA_MAP_PA_NUM * 2);
if (!adev->gmc.xgmi.connected_to_cpu)
adev->umc.ras = &umc_v6_7_ras;
if (1 & adev->smuio.funcs->get_die_id(adev))
-- 
2.35.1



[PATCH 2/2] drm/amdgpu: exclude duplicate pages from UMC RAS UE count

2023-02-16 Thread Tao Zhou
If a UMC bad page is reserved but not freed by an application, the
application may trigger uncorrectable error repeatly by accessing the page.

v2: add specific function to do the check.
v3: remove duplicate pages, calculate new added bad page number.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 23 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c |  2 ++
 3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 6e543558386d..777f85f3e5eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2115,6 +2115,29 @@ int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev)
return 0;
 }
 
+/* Remove duplicate pages, calculate new added bad page number.
+ * Note: the function should be called between amdgpu_ras_add_bad_pages
+ * and amdgpu_ras_save_bad_pages.
+ */
+int amdgpu_ras_umc_new_ue_count(struct amdgpu_device *adev)
+{
+   struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+   struct ras_err_handler_data *data;
+   struct amdgpu_ras_eeprom_control *control;
+   int save_count;
+
+   if (!con || !con->eh_data)
+   return 0;
+
+   mutex_lock(&con->recovery_lock);
+   control = &con->eeprom_control;
+   data = con->eh_data;
+   save_count = data->count - control->ras_num_recs;
+   mutex_unlock(&con->recovery_lock);
+
+   return (save_count / adev->umc.retire_unit);
+}
+
 /*
  * read error record array in eeprom and reserve enough space for
  * storing new bad pages
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index f2ad93f6..e89c95438a88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -549,6 +549,8 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
 
 int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev);
 
+int amdgpu_ras_umc_new_ue_count(struct amdgpu_device *adev);
+
 static inline enum ta_ras_block
 amdgpu_ras_block_to_ta(enum amdgpu_ras_block block) {
switch (block) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index 1c7fcb4f2380..45b6be7277dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -147,6 +147,8 @@ static int amdgpu_umc_do_page_retirement(struct 
amdgpu_device *adev,
err_data->err_addr_cnt) {
amdgpu_ras_add_bad_pages(adev, err_data->err_addr,
err_data->err_addr_cnt);
+   err_data->ue_count = amdgpu_ras_umc_new_ue_count(adev);
+
amdgpu_ras_save_bad_pages(adev);
 
amdgpu_dpm_send_hbm_bad_pages_num(adev, 
con->eeprom_control.ras_num_recs);
-- 
2.35.1



Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-02-16 Thread Vasant Hegde
Matt,

Thanks a lot for testing and the dmesg log.

On 2/17/2023 12:29 AM, Matt Fagnani wrote:
> Vasant,
> 
> I applied your four patches to 6.2-rc8 and built that. The black screen, null
> pointer dereference, and warnings didn't happen when booting 6.2-rc8 with your
> patches. There were errors that the IOMMU wasn't restarted when amdgpu and
> amdkfd was starting though at kernel: kfd kfd: amdgpu: Failed to resume IOMMU
> for device 1002:9874. I don't know if those IOMMU errors were expected or not,

This patch is not for fixing PASID enablement issue. Its more of gracefully
handling the error path.

This means patch worked in expected way. i. e. It failed to enable PASID because
of original patch (commit 201007ef70), it didn't attach devices to new domain
and attach devices back to default domain.
It returned error to GPU saying we couldn't enable PASID/PRI. Hence we saw above
error message.

-Vasant


Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-02-16 Thread Vasant Hegde
Hi Felix,


On 2/17/2023 1:29 AM, Felix Kuehling wrote:
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device
>> 1002:9874
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added due to 
>> errors 
> This look like IOMMU device initialization still fails (but more gracefully
> now). Vasant, is that expected?

My fix is to gracefully handle failure paths in IOMMU. So above logs are
expected. Basically it means IOMMU couldn't attach devices to new domain
(because it couldn't enable PASID on AMD GPU as ACS RR/UF flags are missing, see
commit 201007ef707 ) and we did fall back to old domain properly.

It also means that GPU will not be able to use PASID/PRI. If you need these
feauteres then you have to look into commit 201007ef707 and see how we can
enable PASID for GPU (without ACS UF/RR flag?).


> 
> This would lead to KFD not being available on Carrizo with this kernel, which 
> is
> probably not a big limitation in practice. It would only affect compute
> applications using the ROCm user mode stack. I don't think anyone does that
> these days on these old APUs.
> 
> The SMU errors seem unrelated to this unless there is some subtle interaction
> I'm missing.

I have no idea about GPU warning. All I can say is IOMMU side looks good but
PASID/PRI is not enabled for GPU.

-Vasant




Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.

2023-02-16 Thread Mikhail Gavrilov
On Fri, Dec 9, 2022 at 7:37 PM Leo Liu  wrote:
>
> Please try the latest AMDGPU driver:
>
> https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/
>

Sorry Leo, I miss your message.
This issue is still actual for 6.2-rc8.

In my first message I was mistaken.

> Before kernel 5.16 this only led to an artifact in the form of
> a green bar at the top of the screen, then starting from 5.17
> the GPU began to freeze.

The real behaviour before 5.18:
- vlc could plays video with small artifacts in the form of a green
bar on top of the video
- after playing video process vlc correctly exiting

On 5.18 this behaviour changed:
- vlc show black screen instead of playing video
- after playing the process not exiting
- if I tries kill vlc process with 'kill -9' vlc became zombi process
and many other processes start hangs (in kernel log appears follow
lines after 2 minutes)

INFO: task vlc:sh8:5248 blocked for more than 122 seconds.
  Tainted: GWL     ---  5.18.0-60.fc37.x86_64+debug #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:vlc:sh8 state:D stack:13616 pid: 5248 ppid:  1934 flags:0x4006
Call Trace:
 
 __schedule+0x492/0x1650
 ? _raw_spin_unlock_irqrestore+0x40/0x60
 ? debug_check_no_obj_freed+0x12d/0x250
 schedule+0x4e/0xb0
 schedule_timeout+0xe1/0x120
 ? lock_release+0x215/0x460
 ? trace_hardirqs_on+0x1a/0xf0
 ? _raw_spin_unlock_irqrestore+0x40/0x60
 dma_fence_default_wait+0x197/0x240
 ? __bpf_trace_dma_fence+0x10/0x10
 dma_fence_wait_timeout+0x229/0x260
 drm_sched_entity_fini+0x101/0x270 [gpu_sched]
 amdgpu_vm_fini+0x2b5/0x460 [amdgpu]
 ? idr_destroy+0x70/0xb0
 ? mutex_destroy+0x1e/0x50
 amdgpu_driver_postclose_kms+0x1ec/0x2c0 [amdgpu]
 drm_file_free.part.0+0x20d/0x260
 drm_release+0x6a/0x120
 __fput+0xab/0x270
 task_work_run+0x5c/0xa0
 do_exit+0x394/0xc40
 ? rcu_read_lock_sched_held+0x10/0x70
 do_group_exit+0x33/0xb0
 get_signal+0xbbc/0xbc0
 arch_do_signal_or_restart+0x30/0x770
 ? do_futex+0xfd/0x190
 ? __x64_sys_futex+0x63/0x190
 exit_to_user_mode_prepare+0x172/0x270
 syscall_exit_to_user_mode+0x16/0x50
 do_syscall_64+0x67/0x80
 ? do_syscall_64+0x67/0x80
 ? rcu_read_lock_sched_held+0x10/0x70
 ? trace_hardirqs_on_prepare+0x5e/0x110
 ? do_syscall_64+0x67/0x80
 ? rcu_read_lock_sched_held+0x10/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82c2364529
RSP: 002b:7f8210ff8c00 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: fe00 RBX:  RCX: 7f82c2364529
RDX:  RSI: 0189 RDI: 7f823022542c
RBP: 7f8210ff8c30 R08:  R09: 
R10:  R11: 0246 R12: 
R13:  R14: 0001 R15: 7f823022542c
 
INFO: lockdep is turned off.

I bisected this issue and problematic commit is

❯ git bisect bad
5f3854f1f4e211f494018160b348a1c16e58013f is the first bad commit
commit 5f3854f1f4e211f494018160b348a1c16e58013f
Author: Alex Deucher 
Date:   Thu Mar 24 18:04:00 2022 -0400

drm/amdgpu: add more cases to noretry=1

Port current list from amd-staging-drm-next.

Signed-off-by: Alex Deucher 

 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 +++
 1 file changed, 3 insertions(+)

Unfortunately I couldn't simply revert this commit on 6.2-rc8 for
checking, because it leads to conflicts.

Alex, you as author of this commit could help me with it?


-- 
Best Regards,
Mike Gavrilov.


RE: [PATCH 2/2] drm/amdgpu: exclude duplicate pages from UMC RAS UE count

2023-02-16 Thread Yang, Stanley
[AMD Official Use Only - General]



> -Original Message-
> From: Zhou1, Tao 
> Sent: Friday, February 17, 2023 11:53 AM
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Yang, Stanley ; Chai,
> Thomas ; Li, Candice ; Lazar,
> Lijo 
> Cc: Zhou1, Tao 
> Subject: [PATCH 2/2] drm/amdgpu: exclude duplicate pages from UMC RAS
> UE count
> 
> If a UMC bad page is reserved but not freed by an application, the application
> may trigger uncorrectable error repeatly by accessing the page.
> 
> v2: add specific function to do the check.
> v3: remove duplicate pages, calculate new added bad page number.
> 
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 23
> +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |  2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c |  2 ++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 6e543558386d..777f85f3e5eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2115,6 +2115,29 @@ int amdgpu_ras_save_bad_pages(struct
> amdgpu_device *adev)
>   return 0;
>  }
> 
> +/* Remove duplicate pages, calculate new added bad page number.
> + * Note: the function should be called between
> amdgpu_ras_add_bad_pages
> + * and amdgpu_ras_save_bad_pages.
> + */
> +int amdgpu_ras_umc_new_ue_count(struct amdgpu_device *adev) {
> + struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> + struct ras_err_handler_data *data;
> + struct amdgpu_ras_eeprom_control *control;
> + int save_count;
> +
> + if (!con || !con->eh_data)
> + return 0;
> +
> + mutex_lock(&con->recovery_lock);
> + control = &con->eeprom_control;
> + data = con->eh_data;
> + save_count = data->count - control->ras_num_recs;
> + mutex_unlock(&con->recovery_lock);
> +
> + return (save_count / adev->umc.retire_unit); }

Stanley: It's better add comments about the return value.
Without above concern the patch is Reviewed-by: Stanley.Yang 


> +
>  /*
>   * read error record array in eeprom and reserve enough space for
>   * storing new bad pages
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> index f2ad93f6..e89c95438a88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> @@ -549,6 +549,8 @@ int amdgpu_ras_add_bad_pages(struct
> amdgpu_device *adev,
> 
>  int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev);
> 
> +int amdgpu_ras_umc_new_ue_count(struct amdgpu_device *adev);
> +
>  static inline enum ta_ras_block
>  amdgpu_ras_block_to_ta(enum amdgpu_ras_block block) {
>   switch (block) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 1c7fcb4f2380..45b6be7277dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -147,6 +147,8 @@ static int amdgpu_umc_do_page_retirement(struct
> amdgpu_device *adev,
>   err_data->err_addr_cnt) {
>   amdgpu_ras_add_bad_pages(adev, err_data-
> >err_addr,
>   err_data->err_addr_cnt);
> + err_data->ue_count =
> amdgpu_ras_umc_new_ue_count(adev);
> +
>   amdgpu_ras_save_bad_pages(adev);
> 
>   amdgpu_dpm_send_hbm_bad_pages_num(adev,
> con->eeprom_control.ras_num_recs);
> --
> 2.35.1