Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-23 Thread Christian König

Am 23.11.23 um 20:36 schrieb Felix Kuehling:

[+Alex]

On 2023-11-17 16:44, Felix Kuehling wrote:


This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.

These helper functions are needed for KFD to export and import DMABufs
the right way without duplicating the tracking of DMABufs associated with
GEM objects while ensuring that move notifier callbacks are working as
intended.

CC: Christian König 
CC: Thomas Zimmermann 
Signed-off-by: Felix Kuehling 


Re: our discussion about v2 of this patch: If this version is 
acceptable, can I get an R-b or A-b?


From my side feel free to add my Acked-by, I don't see how else you 
want to cleanly export DMA-bufs.


Regards,
Christian.



I would like to get this patch into drm-next as a prerequisite for 
patches 2 and 3. I cannot submit it to the current 
amd-staging-drm-next because the patch I'm reverting doesn't exist 
there yet.


Patch 2 and 3 could go into drm-next as well, or go through Alex's 
amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do 
you prefer to coordinate this?


Regards,
  Felix



---
  drivers/gpu/drm/drm_prime.c | 33 ++---
  include/drm/drm_prime.h |  7 +++
  2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 63b709a67471..834a5e28abbe 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf)
  }
  EXPORT_SYMBOL(drm_gem_dmabuf_release);
-/*
+/**
   * drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers
   * @dev: drm_device to import into
   * @file_priv: drm file-private structure
@@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release);
   *
   * Returns 0 on success or a negative error code on failure.
   */
-static int drm_gem_prime_fd_to_handle(struct drm_device *dev,
-  struct drm_file *file_priv, int prime_fd,
-  uint32_t *handle)
+int drm_gem_prime_fd_to_handle(struct drm_device *dev,
+   struct drm_file *file_priv, int prime_fd,
+   uint32_t *handle)
  {
  struct dma_buf *dma_buf;
  struct drm_gem_object *obj;
@@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device *dev,

  dma_buf_put(dma_buf);
  return ret;
  }
+EXPORT_SYMBOL(drm_gem_prime_fd_to_handle);
    int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
@@ -408,7 +409,7 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev,

  return dmabuf;
  }
-/*
+/**
   * drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers
   * @dev: dev to export the buffer from
   * @file_priv: drm file-private structure
@@ -421,10 +422,10 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev,
   * The actual exporting from GEM object to a dma-buf is done through the
   * &drm_gem_object_funcs.export callback.
   */
-static int drm_gem_prime_handle_to_fd(struct drm_device *dev,
-  struct drm_file *file_priv, uint32_t handle,
-  uint32_t flags,
-  int *prime_fd)
+int drm_gem_prime_handle_to_fd(struct drm_device *dev,
+   struct drm_file *file_priv, uint32_t handle,
+   uint32_t flags,
+   int *prime_fd)
  {
  struct drm_gem_object *obj;
  int ret = 0;
@@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device *dev,

    return ret;
  }
+EXPORT_SYMBOL(drm_gem_prime_handle_to_fd);
    int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
@@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size);
   * @obj: GEM object to export
   * @flags: flags like DRM_CLOEXEC and DRM_RDWR
   *
- * This is the implementation of the &drm_gem_object_funcs.export functions
- * for GEM drivers using the PRIME helpers. It is used as the default for
- * drivers that do not set their own.
+ * This is the implementation of the &drm_gem_object_funcs.export functions for GEM drivers
+ * using the PRIME helpers. It is used as the default in
+ * drm_gem_prime_handle_to_fd().
   */
  struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
   int flags)
@@ -962,9 +964,10 @@ EXPORT_SYMBOL(drm_gem_prime_import_dev);
   * @dev: drm_device to import into
   * @dma_buf: dma-buf object to import
   *
- * This is the implementation of the gem_prime_import functions for GEM
- * drivers using the PRIME helpers. It is the default for drivers that do
- * not set their own &drm_driver.gem_prime_import.
+ * This is the implementation of the gem_prime_import functions for GEM drivers
+ * using the PRIME helpers. Drivers can use this as their
+ * &drm_driver.gem_prime_import implementation. It is used as the default
+ * implementation in 

[PATCH] drm/amdgpu: Update EEPROM I2C address for smu v13_0_0

2023-11-23 Thread Candice Li
Check smu v13_0_0 SKU type to select EEPROM I2C address.

Signed-off-by: Candice Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 65aa218380be1b..2fde93b00cab37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -214,6 +214,12 @@ static bool __get_eeprom_i2c_addr(struct amdgpu_device *adev,
control->i2c_address = EEPROM_I2C_MADDR_0;
return true;
case IP_VERSION(13, 0, 0):
+   if (strnstr(atom_ctx->vbios_pn, "D707",
+   sizeof(atom_ctx->vbios_pn)))
+   control->i2c_address = EEPROM_I2C_MADDR_0;
+   else
+   control->i2c_address = EEPROM_I2C_MADDR_4;
+   return true;
case IP_VERSION(13, 0, 6):
case IP_VERSION(13, 0, 10):
control->i2c_address = EEPROM_I2C_MADDR_4;
-- 
2.25.1



Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread James Zhu



On 2023-11-23 18:01, Felix Kuehling wrote:

On 2023-11-23 17:41, Greathouse, Joseph wrote:

[Public]


-Original Message-
From: Zhu, James 
Sent: Thursday, November 23, 2023 1:49 PM

On 2023-11-23 14:02, Felix Kuehling wrote:

On 2023-11-23 11:25, James Zhu wrote:

On 2023-11-22 17:35, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running
processes to complete a CWSR trap.

Please add an explanation why this is needed.

[JZ] Even though the profiling-enabled bit is turned off, the CWSR
trap handlers for some kernels in this process may still be running.
This will force the waves in any running processes to complete a CWSR
trap and make sure PC sampling is completely stopped for this process.
I will add it later.

It may be confusing to talk specifically about "CWSR trap handler".
There is only one trap handler that is triggered by different events:
CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap
triggers, it serializes with any currently running trap handler in
that wavefront. So it seems that you're using CWSR as a way to ensure
that any host trap has completed: CWSR will wait for previous traps to
finish before trapping again for CWSR, the HWS firmware waits for CWSR
completion and the driver waits for HWS to finish CWSR with a fence on
a HIQ QUERY_STATUS packet. Is that correct?

[JZ] I think your explanation is more detailed. Need Joseph to confirm.
Felix, your summary is correct. The reason we are trying to perform a 
queue unmap/map cycle as part of the PC sampling stop is to prevent 
the following:


1. A PC sampling request arrives to Wave X, sending it to 1st-level 
trap handler
2. User thread asks KFD to stop sampling for this process, which 
leads to kfd_pc_sample_stop()
3. kfd_pc_sample_stop() decrements the sampling refcount. If this is 
the last process to stop sampling, it stops any further sampling 
traps from being generated
4. kfd_pc_sample_stop() sets this process's TMA flag to false so 
waves in the 1st-level trap handler know sampling is disabled
 4.1. Wave X may be in 1st-level handler and not yet checked the 
TMA flag. If so, it will exit the 1st-level handler when it sees flag 
is false
 4.2. Wave X may have already passed the 1st-level TMA flag check 
and entered the 2nd-level trap handler to do the PC sample
5. kfd_pc_sample_stop() returns, eventually causing ioctl to return, 
back to user-space
6. Because the stop ioctl has returned, user-land deallocates 
user-space buffer the 2nd level trap handler uses to output sample data
7. Wave X that was in the 2nd-level handler tries to finish its 
sample output and writes to the now-freed location, causing a 
use-after-free


Note that Step 3 does not always stop further traps from arriving -- 
if another process still wants to do sampling, the driver or HW might 
still send traps to every wave on the device after Step 3.
As such, to avoid going into the 2nd-level handler for non-sampled 
processes, all 1st-level handlers must check their TMA flag to see if 
they should allow the sample to flow to the 2nd-level handler.


By removing the queue from the HW after Step 4, we can be sure that 
any existing waves from this process that entered the PC sampling 
2nd-level handler before Step 4 are done.
Any waves that were still in the 1st-level handler at Step 4.1 will 
be filtered by the TMA flag being set to false. CWSR will wait until 
they exit.
Any waves that were already in the 2nd-level handler (4.2) must 
complete before the CWSR save will complete and allow this queue 
removal request to complete.
Any waves that enter the 1st-level trap handler after Step 4 won't go 
into the PC sampling logic in the 2nd-level handler because the TMA 
flag is set to false. CWSR will wait until they exit.


When we then put the queue back on the hardware, any further traps 
that might show up (e.g. because another process is sampling) will 
get filtered by the TMA flag.


So once the queue removal (and thus CWSR save cycle) has completed, 
we can be sure that no other traps to this process will try to use 
its PC sample data buffer, so it's safe to return to user-space and 
let them potentially free that buffer.


I don't know how to summarize this nicely in a comment, but hopefully 
y'all can figure that out. :)


My best summary: We need to ensure that any waves executing the PC 
sampling part of the trap handler are done before kfd_pc_sample_stop 
returns, and that no new waves enter that part of the trap handler 
afterwards. This avoids race conditions that could lead to 
use-after-free. Unmapping and remapping the queues either waits for 
the waves to drain, or preempts them with CWSR, which itself executes 
a trap and waits for previous traps to finish.



[JZ]  Thanks all!



Regards,
  Felix




Thanks,
-Joe


Regards,
   Felix



Signed-off-by: James Zhu 
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++

Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread Felix Kuehling

On 2023-11-23 17:41, Greathouse, Joseph wrote:

[Public]


-Original Message-
From: Zhu, James 
Sent: Thursday, November 23, 2023 1:49 PM

On 2023-11-23 14:02, Felix Kuehling wrote:

On 2023-11-23 11:25, James Zhu wrote:

On 2023-11-22 17:35, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running
processes to complete a CWSR trap.

Please add an explanation why this is needed.

[JZ] Even though the profiling-enabled bit is turned off, the CWSR
trap handlers for some kernels in this process may still be running.
This will force the waves in any running processes to complete a CWSR
trap and make sure PC sampling is completely stopped for this process.
I will add it later.

It may be confusing to talk specifically about "CWSR trap handler".
There is only one trap handler that is triggered by different events:
CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap
triggers, it serializes with any currently running trap handler in
that wavefront. So it seems that you're using CWSR as a way to ensure
that any host trap has completed: CWSR will wait for previous traps to
finish before trapping again for CWSR, the HWS firmware waits for CWSR
completion and the driver waits for HWS to finish CWSR with a fence on
a HIQ QUERY_STATUS packet. Is that correct?

[JZ] I think your explanation is more detailed. Need Joseph to confirm.

Felix, your summary is correct. The reason we are trying to perform a queue 
unmap/map cycle as part of the PC sampling stop is to prevent the following:

1. A PC sampling request arrives to Wave X, sending it to 1st-level trap handler
2. User thread asks KFD to stop sampling for this process, which leads to 
kfd_pc_sample_stop()
3. kfd_pc_sample_stop() decrements the sampling refcount. If this is the last 
process to stop sampling, it stops any further sampling traps from being 
generated
4. kfd_pc_sample_stop() sets this process's TMA flag to false so waves in the 
1st-level trap handler know sampling is disabled
 4.1. Wave X may be in 1st-level handler and not yet checked the TMA flag. 
If so, it will exit the 1st-level handler when it sees flag is false
 4.2. Wave X may have already passed the 1st-level TMA flag check and 
entered the 2nd-level trap handler to do the PC sample
5. kfd_pc_sample_stop() returns, eventually causing ioctl to return, back to 
user-space
6. Because the stop ioctl has returned, user-land deallocates user-space buffer 
the 2nd level trap handler uses to output sample data
7. Wave X that was in the 2nd-level handler tries to finish its sample output 
and writes to the now-freed location, causing a use-after-free

Note that Step 3 does not always stop further traps from arriving -- if another 
process still wants to do sampling, the driver or HW might still send traps to 
every wave on the device after Step 3.
As such, to avoid going into the 2nd-level handler for non-sampled processes, 
all 1st-level handlers must check their TMA flag to see if they should allow 
the sample to flow to the 2nd-level handler.

By removing the queue from the HW after Step 4, we can be sure that any 
existing waves from this process that entered the PC sampling 2nd-level handler 
before Step 4 are done.
Any waves that were still in the 1st-level handler at Step 4.1 will be filtered 
by the TMA flag being set to false. CWSR will wait until they exit.
Any waves that were already in the 2nd-level handler (4.2) must complete before 
the CWSR save will complete and allow this queue removal request to complete.
Any waves that enter the 1st-level trap handler after Step 4 won't go into the 
PC sampling logic in the 2nd-level handler because the TMA flag is set to 
false. CWSR will wait until they exit.

When we then put the queue back on the hardware, any further traps that might 
show up (e.g. because another process is sampling) will get filtered by the TMA 
flag.

So once the queue removal (and thus CWSR save cycle) has completed, we can be 
sure that no other traps to this process will try to use its PC sample data 
buffer, so it's safe to return to user-space and let them potentially free that 
buffer.

I don't know how to summarize this nicely in a comment, but hopefully y'all can 
figure that out. :)


My best summary: We need to ensure that any waves executing the PC 
sampling part of the trap handler are done before kfd_pc_sample_stop 
returns, and that no new waves enter that part of the trap handler 
afterwards. This avoids race conditions that could lead to 
use-after-free. Unmapping and remapping the queues either waits for the 
waves to drain, or preempts them with CWSR, which itself executes a trap 
and waits for previous traps to finish.


Regards,
  Felix




Thanks,
-Joe


Regards,
   Felix



Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
   

[PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-23 Thread Felix Kuehling
Make restore workers freezable so we don't have to explicitly flush them
in suspend and GPU reset code paths, and we don't accidentally try to
restore BOs while the GPU is suspended. Not having to flush restore_work
also helps avoid lock/fence dependencies in the GPU reset case where we're
not allowed to wait for fences.

A side effect of this is that we can now have multiple concurrent threads
trying to signal the same eviction fence. Rework eviction fence signaling
and replacement to account for that.

The GPU reset path can no longer rely on restore_process_worker to resume
queues because evict/restore workers can run independently of it. Instead
call a new restore_process_helper directly.

This is an RFC and request for testing.

v2:
- Reworked eviction fence signaling
- Introduced restore_process_helper

v3:
- Handle unsignaled eviction fences in restore_process_bos

Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 68 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 87 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  4 +-
 3 files changed, 104 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2e302956a279..bdec88713a09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1431,7 +1431,6 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
  amdgpu_amdkfd_restore_userptr_worker);

	*process_info = info;
-   *ef = dma_fence_get(&info->eviction_fence->base);
	}
 
vm->process_info = *process_info;
@@ -1462,6 +1461,8 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
	list_add_tail(&vm->vm_list_node,
		&(vm->process_info->vm_list_head));
	vm->process_info->n_vms++;
+
+   *ef = dma_fence_get(&vm->process_info->eviction_fence->base);
	mutex_unlock(&vm->process_info->lock);
 
return 0;
@@ -1473,10 +1474,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 reserve_pd_fail:
	vm->process_info = NULL;
	if (info) {
-   /* Two fence references: one in info and one in *ef */
		dma_fence_put(&info->eviction_fence->base);
-   dma_fence_put(*ef);
-   *ef = NULL;
*process_info = NULL;
put_pid(info->pid);
 create_evict_fence_fail:
@@ -1670,7 +1668,8 @@ int amdgpu_amdkfd_criu_resume(void *p)
goto out_unlock;
}
WRITE_ONCE(pinfo->block_mmu_notifications, false);
-   schedule_delayed_work(&pinfo->restore_userptr_work, 0);
+   queue_delayed_work(system_freezable_wq,
+  &pinfo->restore_userptr_work, 0);

 out_unlock:
	mutex_unlock(&pinfo->lock);
@@ -2475,7 +2474,8 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
   KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
	if (r)
		pr_err("Failed to quiesce KFD\n");
-   schedule_delayed_work(&process_info->restore_userptr_work,
+   queue_delayed_work(system_freezable_wq,
+   &process_info->restore_userptr_work,
		msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
	}
	mutex_unlock(&process_info->notifier_lock);
@@ -2810,7 +2810,8 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)

	/* If validation failed, reschedule another attempt */
	if (evicted_bos) {
-   schedule_delayed_work(&process_info->restore_userptr_work,
+   queue_delayed_work(system_freezable_wq,
+   &process_info->restore_userptr_work,
msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
 
kfd_smi_event_queue_restore_rescheduled(mm);
@@ -2819,6 +2820,23 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
put_task_struct(usertask);
 }
 
+static void replace_eviction_fence(struct dma_fence **ef,
+  struct dma_fence *new_ef)
+{
+   struct dma_fence *old_ef = rcu_replace_pointer(*ef, new_ef, true
+   /* protected by process_info->lock */);
+
+   /* If we're replacing an unsignaled eviction fence, that fence will
+* never be signaled, and if anyone is still waiting on that fence,
+* they will hang forever. This should never happen. We should only
+* replace the fence in restore_work that only gets scheduled after
+* eviction work signaled the fence.
+*/
+   WARN_ONCE(!dma_fence_is_signaled(old_ef),
+ "Replacing unsignaled eviction fence");
+   dma_fence_put(old_ef);
+}
+
 /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
  *   KFD process 

RE: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread Greathouse, Joseph
[Public]

> -Original Message-
> From: Zhu, James 
> Sent: Thursday, November 23, 2023 1:49 PM
>
> On 2023-11-23 14:02, Felix Kuehling wrote:
> > On 2023-11-23 11:25, James Zhu wrote:
> >>
> >> On 2023-11-22 17:35, Felix Kuehling wrote:
> >>>
> >>> On 2023-11-03 09:11, James Zhu wrote:
>  Add queue remapping to force the waves in any running
>  processes to complete a CWSR trap.
> >>>
> >>> Please add an explanation why this is needed.
> >>
> >> [JZ] Even though the profiling-enabled bit is turned off, the CWSR
> >> trap handlers for some kernels in this process may still be running.
> >> This will force the waves in any running processes to complete a CWSR
> >> trap and make sure PC sampling is completely stopped for this process.
> >> I will add it later.
> >
> > It may be confusing to talk specifically about "CWSR trap handler".
> > There is only one trap handler that is triggered by different events:
> > CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap
> > triggers, it serializes with any currently running trap handler in
> > that wavefront. So it seems that you're using CWSR as a way to ensure
> > that any host trap has completed: CWSR will wait for previous traps to
> > finish before trapping again for CWSR, the HWS firmware waits for CWSR
> > completion and the driver waits for HWS to finish CWSR with a fence on
> > a HIQ QUERY_STATUS packet. Is that correct?
> [JZ] I think your explanation is more detailed. Need Joseph to confirm.

Felix, your summary is correct. The reason we are trying to perform a queue 
unmap/map cycle as part of the PC sampling stop is to prevent the following:

1. A PC sampling request arrives to Wave X, sending it to 1st-level trap handler
2. User thread asks KFD to stop sampling for this process, which leads to 
kfd_pc_sample_stop()
3. kfd_pc_sample_stop() decrements the sampling refcount. If this is the last 
process to stop sampling, it stops any further sampling traps from being 
generated
4. kfd_pc_sample_stop() sets this process's TMA flag to false so waves in the 
1st-level trap handler know sampling is disabled
4.1. Wave X may be in 1st-level handler and not yet checked the TMA flag. 
If so, it will exit the 1st-level handler when it sees flag is false
4.2. Wave X may have already passed the 1st-level TMA flag check and 
entered the 2nd-level trap handler to do the PC sample
5. kfd_pc_sample_stop() returns, eventually causing ioctl to return, back to 
user-space
6. Because the stop ioctl has returned, user-land deallocates user-space buffer 
the 2nd level trap handler uses to output sample data
7. Wave X that was in the 2nd-level handler tries to finish its sample output 
and writes to the now-freed location, causing a use-after-free

Note that Step 3 does not always stop further traps from arriving -- if another 
process still wants to do sampling, the driver or HW might still send traps to 
every wave on the device after Step 3.
As such, to avoid going into the 2nd-level handler for non-sampled processes, 
all 1st-level handlers must check their TMA flag to see if they should allow 
the sample to flow to the 2nd-level handler.

By removing the queue from the HW after Step 4, we can be sure that any 
existing waves from this process that entered the PC sampling 2nd-level handler 
before Step 4 are done.
Any waves that were still in the 1st-level handler at Step 4.1 will be filtered 
by the TMA flag being set to false. CWSR will wait until they exit.
Any waves that were already in the 2nd-level handler (4.2) must complete before 
the CWSR save will complete and allow this queue removal request to complete.
Any waves that enter the 1st-level trap handler after Step 4 won't go into the 
PC sampling logic in the 2nd-level handler because the TMA flag is set to 
false. CWSR will wait until they exit.

When we then put the queue back on the hardware, any further traps that might 
show up (e.g. because another process is sampling) will get filtered by the TMA 
flag.

So once the queue removal (and thus CWSR save cycle) has completed, we can be 
sure that no other traps to this process will try to use its PC sample data 
buffer, so it's safe to return to user-space and let them potentially free that 
buffer.

I don't know how to summarize this nicely in a comment, but hopefully y'all can 
figure that out. :)

Thanks,
-Joe

> >
> > Regards,
> >   Felix
> >
> >
> >>
> >>>
> >>>
> 
>  Signed-off-by: James Zhu 
>  ---
>    drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
>    drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
>    drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
>    3 files changed, 19 insertions(+)
> 
>  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>  b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>  index c0e71543389a..a3f57be63f4f 100644
>  --- 

Re: [PATCH v3] drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit

2023-11-23 Thread Felix Kuehling



On 2023-11-23 01:20, ZhenGuo Yin wrote:

[Why]
Memory leaks of gang_ctx_bo and wptr_bo.

[How]
Free gang_ctx_bo and wptr_bo in pqm_uninit.

v2: add a common function pqm_clean_queue_resource to
free queue's resources.
v3: reset pdd->qpd.num_gws when destroying GWS queue.

Signed-off-by: ZhenGuo Yin 
---
  .../amd/amdkfd/kfd_process_queue_manager.c| 54 +++
  1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index ebaec476f49a..fb5840a5df06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -169,16 +169,43 @@ int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p)
return 0;
  }
  
+static void pqm_clean_queue_resource(struct process_queue_manager *pqm,
+				     struct process_queue_node *pqn)
+{
+   struct kfd_node *dev;
+   struct kfd_process_device *pdd;
+
+   dev = pqn->q->device;
+
+   pdd = kfd_get_process_device_data(dev, pqm->process);
+   if (!pdd) {
+   pr_err("Process device data doesn't exist\n");
+   return;
+   }
+
+   if (pqn->q->gws) {
+   if (KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3) &&
+   !dev->kfd->shared_resources.enable_mes)
+   amdgpu_amdkfd_remove_gws_from_process(
+   pqm->process->kgd_process_info, pqn->q->gws);
+   pdd->qpd.num_gws = 0;


Wrong indentation. I almost didn't see this at all. It should be 
indented at the same level as the if-statement.



With that fixed, the patch is

Reviewed-by: Felix Kuehling 



+   }
+
+   if (dev->kfd->shared_resources.enable_mes) {
+   amdgpu_amdkfd_free_gtt_mem(dev->adev, pqn->q->gang_ctx_bo);
+   if (pqn->q->wptr_bo)
+   amdgpu_amdkfd_free_gtt_mem(dev->adev, pqn->q->wptr_bo);
+   }
+}
+
  void pqm_uninit(struct process_queue_manager *pqm)
  {
struct process_queue_node *pqn, *next;
  
	list_for_each_entry_safe(pqn, next, &pqm->queues, process_queue_list) {

-   if (pqn->q && pqn->q->gws &&
-   KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3) &&
-   !pqn->q->device->kfd->shared_resources.enable_mes)
-   amdgpu_amdkfd_remove_gws_from_process(pqm->process->kgd_process_info,
-   pqn->q->gws);
+   if (pqn->q)
+   pqm_clean_queue_resource(pqm, pqn);
+
	kfd_procfs_del_queue(pqn->q);
	uninit_queue(pqn->q);
	list_del(&pqn->process_queue_list);
@@ -465,22 +492,7 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
goto err_destroy_queue;
}
  
-		if (pqn->q->gws) {
-   if (KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3) &&
-   !dev->kfd->shared_resources.enable_mes)
-   amdgpu_amdkfd_remove_gws_from_process(
-   pqm->process->kgd_process_info,
-   pqn->q->gws);
-   pdd->qpd.num_gws = 0;
-   }
-
-   if (dev->kfd->shared_resources.enable_mes) {
-   amdgpu_amdkfd_free_gtt_mem(dev->adev,
-  pqn->q->gang_ctx_bo);
-   if (pqn->q->wptr_bo)
-   amdgpu_amdkfd_free_gtt_mem(dev->adev, pqn->q->wptr_bo);
-
-   }
+   pqm_clean_queue_resource(pqm, pqn);
uninit_queue(pqn->q);
}
  


Re: [PATCH 06/24] drm/amdkfd: add trace_id return

2023-11-23 Thread Zhu, James
[AMD Official Use Only - General]



On 2023-11-22 17:21, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:
Add trace_id return for new pc sampling creation per device,
Use IDR to quickly locate pc_sampling_entry for reference.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  6 ++
  3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
  static void kfd_pc_sampling_init(struct kfd_node *dev)
  {
  mutex_init(>pcs_data.mutex);
+idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
  }
static void kfd_pc_sampling_exit(struct kfd_node *dev)
  {
+idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
  mutex_destroy(>pcs_data.mutex);
  }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index f0d910ee730c..4c9fc48e1a6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
  {
  struct kfd_pc_sample_info *supported_format = NULL;
  struct kfd_pc_sample_info user_info;
+struct pc_sampling_entry *pcs_entry;
  int ret;
  int i;
@@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
  return ret ? ret : -EEXIST;
  }
  -/* TODO: add trace_id return */
+pcs_entry = kvzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+if (!pcs_entry) {
+mutex_unlock(&pdd->dev->pcs_data.mutex);
+return -ENOMEM;
+}
+
+i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+pcs_entry, 1, 0, GFP_KERNEL);
+if (i < 0) {
+mutex_unlock(&pdd->dev->pcs_data.mutex);
+kvfree(pcs_entry);
+return i;
+}
 if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
   memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
@@ -149,6 +162,11 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
   pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
   mutex_unlock(&pdd->dev->pcs_data.mutex);
  +pcs_entry->pdd = pdd;

One more thought: You have a bunch of pcs_entries pointing to pdd. What 
mechanism guarantees, that the pcs_entries are destroyed before the pdd on 
process termination? I think kfd_pc_sampling_exit doesn't run during process 
termination, because it deals with per-device data structures, not per-process 
data structures.

Should the IDR be in the PDD rather than the device? In that case you wouldn't 
even need the pdd pointer in struct pcs_entry.
[JZ] The IDR here is mainly for generating a trace_id for this device. I am not 
sure whether ROCr/ROCprofiler are fine with this change, which would mean the same 
process has the same trace_id value on different nodes. @Yat Sin, 
David, would you mind giving your comments here?

I see you have a patch much later in the series "drm/amdkfd: add pc sampling 
release when process release". I'd prefer this squashed here and in the patches 
that add the start/stop functions.

Regards,
  Felix
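For reference, idr_alloc_cyclic() as used in the hunk above hands out IDs in a round-robin fashion, so a just-freed trace_id is not immediately recycled and stale IDs are less likely to alias a new session. Here is a minimal userspace sketch of that allocation behavior; it is a toy stand-in, not the kernel IDR, and the names and bounds are purely illustrative:

```c
#define MAX_IDS 8	/* toy bound; valid ids are 1..MAX_IDS-1 */

struct toy_idr {
	int used[MAX_IDS];	/* used[id] != 0 means id is allocated */
	int next;		/* cursor: where the next search starts */
};

/* Allocate the next free id at or after the cursor, wrapping around. */
static int toy_alloc_cyclic(struct toy_idr *idr)
{
	int i;

	if (idr->next < 1)
		idr->next = 1;
	for (i = 0; i < MAX_IDS - 1; i++) {
		int id = 1 + (idr->next - 1 + i) % (MAX_IDS - 1);

		if (!idr->used[id]) {
			idr->used[id] = 1;
			idr->next = id + 1;
			return id;
		}
	}
	return -1;	/* exhausted; the kernel would return -ENOSPC */
}

static void toy_free(struct toy_idr *idr, int id)
{
	idr->used[id] = 0;
}
```

The key property, mirrored from the kernel API, is that the cursor keeps advancing past freed slots, so destroy/create cycles don't hand the same trace_id straight back.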


+user_args->trace_id = (uint32_t)i;
+
+pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", pcs_entry, i, pdd->dev->id);
+
  return 0;
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 81c925fb2952..642558026d16 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,7 @@ struct kfd_dev;
struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };
  @@ -743,6 +744,11 @@ enum kfd_pdd_bound {
   */
  #define SDMA_ACTIVITY_DIVISOR  100
  +struct pc_sampling_entry {
+bool enabled;
+struct kfd_process_device *pdd;
+};
+
  /* Data that is per-process-per device. */
  struct kfd_process_device {
  /* The device that owns this data. */


Re: [PATCH 07/24] drm/amdkfd: check pcs_entry valid

2023-11-23 Thread James Zhu



On 2023-11-23 15:32, Felix Kuehling wrote:


On 2023-11-23 15:18, James Zhu wrote:


On 2023-11-22 17:15, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 
++--

  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 4c9fc48e1a6a..36366c8847de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_

  int kfd_pc_sample(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *args)
  {
+    struct pc_sampling_entry *pcs_entry;
+
+    if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+    args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    args->trace_id);
+    mutex_unlock(&pdd->dev->pcs_data.mutex);


You need to keep holding the lock while the pcs_entry is still used. 
That includes any of the kfd_pc_sample_ functions below. 
Otherwise someone could free it concurrently. It would also simplify 
the ..._ functions, if they didn't have to worry about the 
locking themselves.
[JZ] pcs_entry is only for this pc sampling process, which is protected by 
kfd_process->mutex here.


OK. That's not obvious. I'm also wary about depending too much on the 
big process lock. We will need to make that locking more granular 
soon, because it is causing performance issues with multi-threaded 
processes.

[Jz] Let me add some comments on pcs_entry.


Regards,
  Felix




Regards,
  Felix
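The concern raised above is the classic lookup-then-use race: once the mutex is dropped right after idr_find(), a concurrent destroy can free the entry while it is still in use. A hedged userspace sketch of the safe pattern, with a pthread mutex and a plain array standing in for the pcs_data mutex and the IDR (all names illustrative):

```c
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 16

struct entry {
	int enabled;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *table[TABLE_SIZE];	/* toy stand-in for the IDR */

/* Safe pattern: look up AND use the entry inside one critical section. */
static int op_on_entry(int id)
{
	struct entry *e;
	int ret = -1;	/* plays the role of -EINVAL */

	pthread_mutex_lock(&table_lock);
	e = (id >= 0 && id < TABLE_SIZE) ? table[id] : NULL;
	if (e) {
		e->enabled = 1;	/* every use of e happens before unlock */
		ret = 0;
	}
	pthread_mutex_unlock(&table_lock);
	return ret;
}
```

In the patch being reviewed the race is instead closed by kfd_process->mutex held across the whole ioctl, which is what James points out; the sketch just shows why dropping the per-device lock early is only safe when some outer lock serializes the free.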



+
+    if (!pcs_entry ||
+    pcs_entry->pdd != pdd)
+    return -EINVAL;
+    }
+
  switch (args->op) {
  case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
  return kfd_pc_sample_query_cap(pdd, args);
@@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device 
*pdd,

  return kfd_pc_sample_create(pdd, args);
    case KFD_IOCTL_PCS_OP_DESTROY:
-    return kfd_pc_sample_destroy(pdd, args->trace_id);
+    if (pcs_entry->enabled)
+    return -EBUSY;
+    else
+    return kfd_pc_sample_destroy(pdd, args->trace_id);
    case KFD_IOCTL_PCS_OP_START:
-    return kfd_pc_sample_start(pdd);
+    if (pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_start(pdd);
    case KFD_IOCTL_PCS_OP_STOP:
-    return kfd_pc_sample_stop(pdd);
+    if (!pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_stop(pdd);
  }
    return -EINVAL;


Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start

2023-11-23 Thread James Zhu



On 2023-11-23 15:21, Felix Kuehling wrote:


On 2023-11-23 15:01, James Zhu wrote:


On 2023-11-22 17:27, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable pc sampling start.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 
+---

  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
  2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 60b29b245db5..33d003ca0093 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -83,9 +83,29 @@ static int kfd_pc_sample_query_cap(struct 
kfd_process_device *pdd,

  return 0;
  }
  -static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+    struct pc_sampling_entry *pcs_entry)
  {
-    return -EINVAL;
+    bool pc_sampling_start = false;
+
+    pcs_entry->enabled = true;
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+    pc_sampling_start = true;
+    pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+    while (pc_sampling_start) {
+    if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {

+    usleep_range(1000, 2000);


I don't understand why you need this synchronization through 
stop_enable. Why can't you do both the start and stop while holding 
the mutex? It's just setting a flag in the TMA, so it's not a 
time-consuming operation, and I don't see any potential for deadlocks.
[JZ] for stop, it is not just setting the TMA; we need to wait for the 
current PC sampling to stop completely and reset some initial settings.


I think that's being obfuscated by how you split up this patch series. 
Maybe if you squash the queue remapping patch into this one, it would 
be more obvious what's really happening when you stop sampling and 
would make it easier to review the synchronization and locking strategy.

[JZ] Sure


Regards,
  Felix




Regards,
  Felix
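The start path in the hunk above follows a counted first-starter pattern: only the session that takes active_count from 0 to 1 actually arms sampling. A userspace sketch of that pattern under stated assumptions (toy names; the real stop path additionally has to wait for in-flight sampling to drain, which is what stop_enable is for):

```c
#include <pthread.h>

static pthread_mutex_t pcs_lock = PTHREAD_MUTEX_INITIALIZER;
static int active_count;	/* number of started sessions on this "device" */
static int hw_armed;		/* stands in for the TMA sampling flag */

static void session_start(void)
{
	int first;

	pthread_mutex_lock(&pcs_lock);
	first = (active_count++ == 0);
	pthread_mutex_unlock(&pcs_lock);

	if (first)
		hw_armed = 1;	/* only the first starter touches "hardware" */
}

static void session_stop(void)
{
	int last;

	pthread_mutex_lock(&pcs_lock);
	last = (--active_count == 0);
	pthread_mutex_unlock(&pcs_lock);

	if (last)
		hw_armed = 0;	/* only the last stopper disarms */
}
```

Felix's question is essentially whether the arming step can also live inside the critical section; this sketch does it outside only to mirror the patch's structure.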



+    } else {
+    kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+    pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+    break;
+    }
+    }
+
+    return 0;
  }
    static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
@@ -225,7 +245,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  if (pcs_entry->enabled)
  return -EALREADY;
  else
-    return kfd_pc_sample_start(pdd);
+    return kfd_pc_sample_start(pdd, pcs_entry);
    case KFD_IOCTL_PCS_OP_STOP:
  if (!pcs_entry->enabled)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 6670534f47b8..613910e0d440 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,8 @@ struct kfd_dev;
    struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+    uint32_t active_count;  /* Num of active sessions */
+    bool stop_enable;   /* pc sampling stop in process */
  struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };


Re: [PATCH] drm/amdgpu: Enable event log on MES 11

2023-11-23 Thread Felix Kuehling

On 2023-11-23 14:55, shaoyunl wrote:

Enable event log through the HW specific FW API

Signed-off-by: shaoyunl 


I'm assuming that enabling the log unconditionally has no noticeable 
performance impact. In that case, the patch is


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 2 ++
  drivers/gpu/drm/amd/include/mes_v11_api_def.h | 1 +
  2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 4dfec56e1b7f..26d71a22395d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -408,6 +408,8 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;
+   mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
+   mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr = mes->event_log_gpu_addr;
  
  	return mes_v11_0_submit_pkt_and_poll_completion(mes,

&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
index b1db2b190187..1fbfd1aa987e 100644
--- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
@@ -232,6 +232,7 @@ union MESAPI_SET_HW_RESOURCES {
};
uint32_toversubscription_timer;
uint64_tdoorbell_info;
+   uint64_tevent_intr_history_gpu_mc_ptr;
};
  
  	uint32_t	max_dwords_in_api[API_FRAME_SIZE_IN_DWORDS];


Re: [PATCH v5 00/20] remove I2C_CLASS_DDC support

2023-11-23 Thread Wolfram Sang
On Thu, Nov 23, 2023 at 10:40:20AM +0100, Heiner Kallweit wrote:
> After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
> olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
> Class-based device auto-detection is a legacy mechanism and shouldn't
> be used in new code. So we can remove this class completely now.
> 
> Preferably this series should be applied via the i2c tree.
> 
> v2:
> - change tag in commit subject of patch 03
> - add ack tags
> v3:
> - fix a compile error in patch 5
> v4:
> - more ack and review tags
> v5:
> - more acks
> 
> Signed-off-by: Heiner Kallweit 

I created an immutable branch for this which the buildbots will
hopefully check over night. I will reply with comments tomorrow when I
got the buildbot results.





Re: [PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread Felix Kuehling



On 2023-11-23 16:29, Felix Kuehling wrote:

On 2023-11-23 14:48, shaoyunl wrote:
This is the generic SW part, prepare the event log buffer and dump it 
through debugfs


Signed-off-by: shaoyunl 


Reviewed-by: Felix Kuehling 


Sorry, I just realized a potential problem, see inline.






---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 61 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  5 ++
  4 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index a53f436fa9f1..8b2cbeae99ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2140,6 +2140,8 @@ int amdgpu_debugfs_init(struct amdgpu_device 
*adev)

  amdgpu_debugfs_firmware_init(adev);
  amdgpu_ta_if_debugfs_init(adev);
  +    amdgpu_debugfs_mes_event_log_init(adev);


This always gets initialized, even if the GPU isn't using MES. But the 
log buffer only gets allocated on GPUs that have MES. I think reading 
the log in debugfs on a GPU without MES would cause a kernel oops. You 
either need to add a check for that in ..._event_log_show, or skip the 
debugfs file creation in ..._event_log_init if the GPU doesn't use MES.


Regards,
  Felix
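The guard Felix asks for amounts to bailing out of the show callback when the log buffer was never allocated (a GPU without MES). A userspace sketch of that check, with illustrative names and an error value standing in for -ENODEV, not the actual amdgpu code:

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_ENODEV 19	/* stands in for the kernel's ENODEV */

/* Dump `len` words of the event log; refuse if the buffer was never allocated. */
static int event_log_show(const uint32_t *log_buf, size_t len)
{
	size_t i;
	int words_dumped = 0;

	if (!log_buf)	/* GPU has no MES: the init path never allocated a buffer */
		return -TOY_ENODEV;

	for (i = 0; i < len; i++)
		words_dumped++;	/* the real code would seq_hex_dump() here */
	return words_dumped;
}
```

The alternative Felix mentions, skipping debugfs_create_file() entirely on non-MES GPUs, avoids even exposing the file; either way the NULL pointer must never reach the dump.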



+
  #if defined(CONFIG_DRM_AMD_DC)
  if (adev->dc_enabled)
  dtn_debugfs_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h

index 371a6f0deb29..0425432d8659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
@@ -32,3 +32,5 @@ void amdgpu_debugfs_fini(struct amdgpu_device *adev);
  void amdgpu_debugfs_fence_init(struct amdgpu_device *adev);
  void amdgpu_debugfs_firmware_init(struct amdgpu_device *adev);
  void amdgpu_debugfs_gem_init(struct amdgpu_device *adev);
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev);
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c

index 45280fb0e00c..b4ba556dc733 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -97,6 +97,26 @@ static int amdgpu_mes_doorbell_init(struct 
amdgpu_device *adev)

  return 0;
  }
  +static int amdgpu_mes_event_log_init(struct amdgpu_device *adev)
+{
+    int r;
+
+    r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
+    AMDGPU_GEM_DOMAIN_GTT,
+    &adev->mes.event_log_gpu_obj,
+    &adev->mes.event_log_gpu_addr,
+    &adev->mes.event_log_cpu_addr);
+    if (r) {
+    dev_warn(adev->dev, "failed to create MES event log buffer (%d)", r);

+    return r;
+    }
+
+    memset(adev->mes.event_log_cpu_addr, 0, PAGE_SIZE);
+
+    return  0;
+
+}
+
  static void amdgpu_mes_doorbell_free(struct amdgpu_device *adev)
  {
  bitmap_free(adev->mes.doorbell_bitmap);
@@ -181,8 +201,14 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
  if (r)
  goto error;
  +    r = amdgpu_mes_event_log_init(adev);
+    if (r)
+    goto error_doorbell;
+
  return 0;
  +error_doorbell:
+    amdgpu_mes_doorbell_free(adev);
  error:
  amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
  amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
@@ -198,6 +224,10 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
    void amdgpu_mes_fini(struct amdgpu_device *adev)
  {
+    amdgpu_bo_free_kernel(&adev->mes.event_log_gpu_obj,
+  &adev->mes.event_log_gpu_addr,
+  &adev->mes.event_log_cpu_addr);
+
  amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
  amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
  amdgpu_device_wb_free(adev, adev->mes.read_val_offs);
@@ -1483,3 +1513,34 @@ int amdgpu_mes_init_microcode(struct 
amdgpu_device *adev, int pipe)

  amdgpu_ucode_release(&adev->mes.fw[pipe]);
  return r;
  }
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)

+{
+    struct amdgpu_device *adev = m->private;
+    uint32_t *mem = (uint32_t *)(adev->mes.event_log_cpu_addr);
+
+    seq_hex_dump(m, "", DUMP_PREFIX_OFFSET, 32, 4,
+ mem, PAGE_SIZE, false);
+
+    return 0;
+}
+
+
+DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_mes_event_log);
+
+#endif
+
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev)
+{
+
+#if defined(CONFIG_DEBUG_FS)
+    struct drm_minor *minor = adev_to_drm(adev)->primary;
+    struct dentry *root = minor->debugfs_root;
+
+    debugfs_create_file("amdgpu_mes_event_log", 0444, root,
+    adev, &amdgpu_debugfs_mes_event_log_fops);
+
+#endif
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h

index a27b424ffe00..894b9b133000 100644
--- 

Re: [PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread Felix Kuehling

On 2023-11-23 14:48, shaoyunl wrote:

This is the generic SW part, prepare the event log buffer and dump it through 
debugfs

Signed-off-by: shaoyunl 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 61 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  5 ++
  4 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..8b2cbeae99ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2140,6 +2140,8 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
amdgpu_debugfs_firmware_init(adev);
amdgpu_ta_if_debugfs_init(adev);
  
+	amdgpu_debugfs_mes_event_log_init(adev);

+
  #if defined(CONFIG_DRM_AMD_DC)
if (adev->dc_enabled)
dtn_debugfs_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
index 371a6f0deb29..0425432d8659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
@@ -32,3 +32,5 @@ void amdgpu_debugfs_fini(struct amdgpu_device *adev);
  void amdgpu_debugfs_fence_init(struct amdgpu_device *adev);
  void amdgpu_debugfs_firmware_init(struct amdgpu_device *adev);
  void amdgpu_debugfs_gem_init(struct amdgpu_device *adev);
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev);
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 45280fb0e00c..b4ba556dc733 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -97,6 +97,26 @@ static int amdgpu_mes_doorbell_init(struct amdgpu_device 
*adev)
return 0;
  }
  
+static int amdgpu_mes_event_log_init(struct amdgpu_device *adev)

+{
+   int r;
+
+   r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_GTT,
+   &adev->mes.event_log_gpu_obj,
+   &adev->mes.event_log_gpu_addr,
+   &adev->mes.event_log_cpu_addr);
+   if (r) {
+   dev_warn(adev->dev, "failed to create MES event log buffer (%d)", r);
+   return r;
+   }
+
+   memset(adev->mes.event_log_cpu_addr, 0, PAGE_SIZE);
+
+   return  0;
+
+}
+
  static void amdgpu_mes_doorbell_free(struct amdgpu_device *adev)
  {
bitmap_free(adev->mes.doorbell_bitmap);
@@ -181,8 +201,14 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
if (r)
goto error;
  
+	r = amdgpu_mes_event_log_init(adev);

+   if (r)
+   goto error_doorbell;
+
return 0;
  
+error_doorbell:

+   amdgpu_mes_doorbell_free(adev);
  error:
amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
@@ -198,6 +224,10 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
  
  void amdgpu_mes_fini(struct amdgpu_device *adev)

  {
+   amdgpu_bo_free_kernel(&adev->mes.event_log_gpu_obj,
+ &adev->mes.event_log_gpu_addr,
+ &adev->mes.event_log_cpu_addr);
+
amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
amdgpu_device_wb_free(adev, adev->mes.read_val_offs);
@@ -1483,3 +1513,34 @@ int amdgpu_mes_init_microcode(struct amdgpu_device 
*adev, int pipe)
	amdgpu_ucode_release(&adev->mes.fw[pipe]);
return r;
  }
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)
+{
+   struct amdgpu_device *adev = m->private;
+   uint32_t *mem = (uint32_t *)(adev->mes.event_log_cpu_addr);
+
+   seq_hex_dump(m, "", DUMP_PREFIX_OFFSET, 32, 4,
+mem, PAGE_SIZE, false);
+
+   return 0;
+}
+
+
+DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_mes_event_log);
+
+#endif
+
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev)
+{
+
+#if defined(CONFIG_DEBUG_FS)
+   struct drm_minor *minor = adev_to_drm(adev)->primary;
+   struct dentry *root = minor->debugfs_root;
+
+   debugfs_create_file("amdgpu_mes_event_log", 0444, root,
+   adev, &amdgpu_debugfs_mes_event_log_fops);
+
+#endif
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index a27b424ffe00..894b9b133000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -133,6 +133,11 @@ struct amdgpu_mes {
uint32_tnum_mes_dbs;
unsigned long   *doorbell_bitmap;
  
+	/* MES event log buffer */

+   struct amdgpu_bo

Re: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-23 Thread Felix Kuehling



On 2023-11-10 07:07, Pan, Xinhui wrote:

[AMD Official Use Only - General]

Wait, I think we need a small fix below.

--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -2036,6 +2036,7 @@ int kfd_resume_all_processes(void)
 int ret = 0, idx = srcu_read_lock(&kfd_processes_srcu);

 hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+   cancel_delayed_work_sync(&p->restore_work);


I think this would also be problematic and not race-free. The restore work 
could be scheduled again between the cancel and restore_process_helper. A 
better solution would be to deal with the situation where we already have 
an unsignaled eviction fence. I'm looking into this right now. I hope to 
have an updated patch later today.


Regards,
  Felix



 if (restore_process_helper(p)) {
 pr_err("Restore process %d failed during resume\n",
p->pasid);

Felix,
   restore_process_helper is called both in resume and in the restore work, 
and it calls into amdgpu_amdkfd_gpuvm_restore_process_bos to create a new ef 
(eviction fence). So there is a race: resume creates a new ef, and soon 
afterwards the restore work, which was frozen during suspend, creates another 
new ef. Then there is a warning when replace_eviction_fence is called.

[   83.865870] Replacing unsignaled eviction fence
[   83.872452] WARNING: CPU: 5 PID: 9 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2838 
amdgpu_amdkfd_gpuvm_restore_pro
cess_bos+0xa9e/0xfe0 [amdgpu]
[snip]
[   83.896776] Workqueue: kfd_restore_wq restore_process_worker [amdgpu]
[   83.989171] e1000e :00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, 
Flow Control: Rx/Tx
[   84.004699]
[   84.004701] RIP: 0010:amdgpu_amdkfd_gpuvm_restore_process_bos+0xa9e/0xfe0 
[amdgpu]
[   84.046060] Code: 48 83 05 8c aa ea 00 01 44 89 8d 08 fe ff ff 48 89 95 18 
fe ff ff c6 05 3a 82 d9 00 01 e8 ba 80 d1 d0 48
83 05 72 aa ea 00 01 <0f> 0b 48 83 05 70 aa ea 00 01 44 8b 8d 08 fe ff ff 48 8b 
95 18 fe
[   84.046062] RSP: 0018:a1e2c00c7bf0 EFLAGS: 00010202
[   84.046064] RAX:  RBX: 8c58558d9c00 RCX: 
[   84.046066] RDX: 0002 RSI: 9370d98a RDI: 
[   84.046067] RBP: a1e2c00c7e00 R08: 0003 R09: 0001
[   84.046069] R10: 0001 R11: 0001 R12: 8c58555ad008
[   84.046070] R13: 0400 R14: 8c58542f9510 R15: 8c5854cbeea8
[   84.046071] FS:  () GS:8c676dc8() 
knlGS:
[   84.046073] CS:  0010 DS:  ES:  CR0: 80050033
[   84.046074] CR2: 7ffd279bb0c8 CR3: 000fde856003 CR4: 003706e0
[   84.046076] DR0:  DR1:  DR2: 
[   84.046077] DR3:  DR6: fffe0ff0 DR7: 0400
[   84.046078] Call Trace:
[   84.046079]  
[   84.046081]  ? show_regs+0x6a/0x80
[   84.046085]  ? amdgpu_amdkfd_gpuvm_restore_process_bos+0xa9e/0xfe0 [amdgpu]
[   84.156138]  ? __warn+0x8d/0x180
[   84.156142]  ? amdgpu_amdkfd_gpuvm_restore_process_bos+0xa9e/0xfe0 [amdgpu]
[   84.166431]  ? report_bug+0x1e8/0x240
[   84.166435]  ? __wake_up_klogd.part.0+0x64/0xa0
[   84.166440]  ? handle_bug+0x46/0x80
[   84.166444]  ? exc_invalid_op+0x19/0x70
[   84.166447]  ? asm_exc_invalid_op+0x1b/0x20
[   84.166457]  ? amdgpu_amdkfd_gpuvm_restore_process_bos+0xa9e/0xfe0 [amdgpu]
[   84.166917]  ? __lock_acquire+0x5f3/0x28c0
[   84.166921]  ? __this_cpu_preempt_check+0x13/0x20
[   84.166938]  restore_process_helper+0x33/0x110 [amdgpu]
[   84.167292]  restore_process_worker+0x40/0x130 [amdgpu]
[   84.167644]  process_one_work+0x26a/0x550
[   84.167654]  worker_thread+0x58/0x3c0
[   84.167659]  ? __pfx_worker_thread+0x10/0x10
[   84.167661]  kthread+0x105/0x130
[   84.167664]  ? __pfx_kthread+0x10/0x10
[   84.167669]  ret_from_fork+0x29/0x50
[   84.167681]  
[   84.167683] irq event stamp: 1343665
[   84.167684] hardirqs last  enabled at (1343671): [] 
vprintk_emit+0x37b/0x3a0
[   84.167687] hardirqs last disabled at (1343676): [] 
vprintk_emit+0x367/0x3a0
[   84.167689] softirqs last  enabled at (1342680): [] 
__irq_exit_rcu+0xd3/0x140
[   84.167691] softirqs last disabled at (1342671): [] 
__irq_exit_rcu+0xd3/0x140
[   84.167692] ---[ end trace  ]---
[   84.189957] PM: suspe

Thanks
xinhui

-Original Message-
From: Pan, Xinhui
Sent: Friday, November 10, 2023 12:51 PM
To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Koenig, Christian 

Subject: RE: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

I once replaced the queue with the freezable one, but got a hang in flush.
Looks like Felix has fixed it.

Acked-and-tested-by: xinhui pan 


-Original Message-
From: Kuehling, Felix 
Sent: Wednesday, November 8, 2023 6:06 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Pan, Xinhui ; Koenig, 

Re: [PATCH 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-11-23 Thread James Zhu



On 2023-11-22 16:14, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:

From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
  include/uapi/linux/kfd_ioctl.h | 57 +-
  1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h 
b/include/uapi/linux/kfd_ioctl.h

index f0ed68974c54..5202e29c9560 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
  };
  };
  +/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE:     Register this process with a per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:    Unregister from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:      Process begins taking samples from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:       Process stops taking samples from a previously registered PC sampler instance

+ */
+enum kfd_ioctl_pc_sample_op {
+    KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+    KFD_IOCTL_PCS_OP_CREATE,
+    KFD_IOCTL_PCS_OP_DESTROY,
+    KFD_IOCTL_PCS_OP_START,
+    KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+    KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+    KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+    KFD_IOCTL_PCS_TYPE_TIME_US,
+    KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+    KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+    __u64 value; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us
+  * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles
+  * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by
+  * graphics compute units


I'd call this "interval". That's still generic enough to be a sampling 
interval in a unit that depends on the PCS type. "value" is 
misleading, because it sounds like it may be an actual sample.

[JZ] I am fine with this interface name change.




+  */
+    __u64 value_min; /* [OUT] */
+    __u64 value_max; /* [OUT] */


interval_min/max.

Regards,
  Felix


+    __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */

+    __u32 method;    /* [IN/OUT] kfd_ioctl_pc_sample_method */
+    __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+    __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+    __u32 num_sample_info;
+    __u32 op;    /* kfd_ioctl_pc_sample_op */
+    __u32 gpu_id;
+    __u32 trace_id;
+};
+
  #define AMDKFD_IOCTL_BASE 'K'
  #define AMDKFD_IO(nr)    _IO(AMDKFD_IOCTL_BASE, nr)
  #define AMDKFD_IOR(nr, type)    _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
  #define AMDKFD_IOC_DBG_TRAP    \
  AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
  +#define AMDKFD_IOC_PC_SAMPLE    \
+    AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
  #define AMDKFD_COMMAND_START    0x01
-#define AMDKFD_COMMAND_END    0x27
+#define AMDKFD_COMMAND_END    0x28
    #endif


Re: [PATCH 07/24] drm/amdkfd: check pcs_entry valid

2023-11-23 Thread Felix Kuehling



On 2023-11-23 15:18, James Zhu wrote:


On 2023-11-22 17:15, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 
++--

  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 4c9fc48e1a6a..36366c8847de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_

  int kfd_pc_sample(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *args)
  {
+    struct pc_sampling_entry *pcs_entry;
+
+    if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+    args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    args->trace_id);
+    mutex_unlock(&pdd->dev->pcs_data.mutex);


You need to keep holding the lock while the pcs_entry is still used. 
That includes any of the kfd_pc_sample_ functions below. 
Otherwise someone could free it concurrently. It would also simplify 
the ..._ functions, if they didn't have to worry about the 
locking themselves.
[JZ] pcs_entry is only for this pc sampling process, which is protected by 
kfd_process->mutex here.


OK. That's not obvious. I'm also wary about depending too much on the 
big process lock. We will need to make that locking more granular soon, 
because it is causing performance issues with multi-threaded processes.


Regards,
  Felix




Regards,
  Felix



+
+    if (!pcs_entry ||
+    pcs_entry->pdd != pdd)
+    return -EINVAL;
+    }
+
  switch (args->op) {
  case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
  return kfd_pc_sample_query_cap(pdd, args);
@@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  return kfd_pc_sample_create(pdd, args);
    case KFD_IOCTL_PCS_OP_DESTROY:
-    return kfd_pc_sample_destroy(pdd, args->trace_id);
+    if (pcs_entry->enabled)
+    return -EBUSY;
+    else
+    return kfd_pc_sample_destroy(pdd, args->trace_id);
    case KFD_IOCTL_PCS_OP_START:
-    return kfd_pc_sample_start(pdd);
+    if (pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_start(pdd);
    case KFD_IOCTL_PCS_OP_STOP:
-    return kfd_pc_sample_stop(pdd);
+    if (!pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_stop(pdd);
  }
    return -EINVAL;


Re: [PATCH 05/24] drm/amdkfd: enable pc sampling create

2023-11-23 Thread James Zhu



On 2023-11-22 16:51, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 10 
  2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 49fecbc7013e..f0d910ee730c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -97,7 +97,59 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd)

  static int kfd_pc_sample_create(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *user_args)
  {
-    return -EINVAL;
+    struct kfd_pc_sample_info *supported_format = NULL;
+    struct kfd_pc_sample_info user_info;
+    int ret;
+    int i;
+
+    if (user_args->num_sample_info != 1)
+    return -EINVAL;
+
+    ret = copy_from_user(&user_info, (void __user *) user_args->sample_info_ptr,

+    sizeof(struct kfd_pc_sample_info));
+    if (ret) {
+    pr_debug("Failed to copy PC sampling info from user\n");
+    return -EFAULT;
+    }
+
+    for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+    if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+    && user_info.method == supported_formats[i].sample_info->method
+    && user_info.type == supported_formats[i].sample_info->type
+    && user_info.value <= supported_formats[i].sample_info->value_max
+    && user_info.value >= supported_formats[i].sample_info->value_min) {

+    supported_format =
+    (struct kfd_pc_sample_info *)supported_formats[i].sample_info;

+    break;
+    }
+    }
+
+    if (!supported_format) {
+    pr_debug("Sampling format is not supported!");
+    return -EOPNOTSUPP;
+    }
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+    memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+    &user_info, sizeof(user_info))) {


I think you can compare structures in C. This would be more readable:

if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info != user_info) {
    ...
}
[JZ] Sure


+    ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+ &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+    sizeof(struct kfd_pc_sample_info));
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+    return ret ? ret : -EEXIST;


When copy_to_user fails, it returns the number of bytes not copied. 
That's not a useful return value here. This should be


    return ret ? -EFAULT : -EEXIST;

Also -EBUSY may be more appropriate than -EEXIST.

[JZ] Sure
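The rule Felix points out can be captured in a tiny model: copy_to_user()/copy_from_user() return the number of bytes that could NOT be copied (0 on success), never an errno, so the caller must translate a nonzero result into -EFAULT and choose its own status for the success path. A hedged userspace sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the kernel convention: 'not_copied' is the return value of a
 * copy_to_user()-style call (bytes NOT copied, 0 on success).  The
 * caller maps nonzero to -EFAULT and otherwise returns its own status,
 * e.g. -EBUSY for "a session with different parameters already exists". */
static long map_copy_result(size_t not_copied, long status_on_success)
{
	return not_copied ? -EFAULT : status_on_success;
}
```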




+    }
+
+    /* TODO: add trace_id return */
+
+    if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+ memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+    &user_info, sizeof(user_info));


I think you can assign structures in C. Just do

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info;
[JZ] Sure
Regards,
  Felix



+
+    pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+    return 0;
  }
    static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, 
uint32_t trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 4a0b66189c67..81c925fb2952 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -256,9 +256,19 @@ struct kfd_vmid_info {
    struct kfd_dev;
  +struct kfd_dev_pc_sampling_data {
+    uint32_t use_count; /* Num of PC sampling sessions */
+    struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+    struct kfd_dev_pc_sampling_data base;
+};
+
  /* Per device PC Sampling data */
  struct kfd_dev_pc_sampling {
  struct mutex mutex;
+    struct kfd_dev_pcs_hosttrap hosttrap_entry;
  };
    struct kfd_node {


Re: [PATCH 06/24] drm/amdkfd: add trace_id return

2023-11-23 Thread James Zhu



On 2023-11-22 16:56, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Add trace_id return for new pc sampling creation per device,
Use IDR to quickly locate pc_sampling_entry for reference.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  6 ++
  3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
  static void kfd_pc_sampling_init(struct kfd_node *dev)
  {
   mutex_init(&dev->pcs_data.mutex);
+ idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
  }
    static void kfd_pc_sampling_exit(struct kfd_node *dev)
  {
+ idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
   mutex_destroy(&dev->pcs_data.mutex);
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index f0d910ee730c..4c9fc48e1a6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct 
kfd_process_device *pdd,

  {
  struct kfd_pc_sample_info *supported_format = NULL;
  struct kfd_pc_sample_info user_info;
+    struct pc_sampling_entry *pcs_entry;
  int ret;
  int i;
  @@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct 
kfd_process_device *pdd,

  return ret ? ret : -EEXIST;
  }
  -    /* TODO: add trace_id return */
+    pcs_entry = kvzalloc(sizeof(*pcs_entry), GFP_KERNEL);


I don't see a reason to use kvzalloc here. You know the size of the 
structure, so kzalloc should be perfectly fine.

[JZ] Sure, will change to kzalloc




+    if (!pcs_entry) {
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+    return -ENOMEM;
+    }
+
+    i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    pcs_entry, 1, 0, GFP_KERNEL);
+    if (i < 0) {
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+    kvfree(pcs_entry);


kfree



+    return i;
+    }
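For readers unfamiliar with the allocator used above: idr_alloc_cyclic(&idr, ptr, 1, 0, GFP_KERNEL) hands out IDs starting at 1 (so 0 stays free as an "invalid trace_id" value) and continues from the last allocation rather than immediately reusing a just-freed ID. A toy userspace model of just that ID policy (not the real IDR data structure):

```c
#include <assert.h>
#include <limits.h>

/* Toy model of the idr_alloc_cyclic() ID policy: IDs begin at 'start'
 * and keep increasing from the most recent allocation, wrapping back to
 * 'start' only at the end of the range.  This keeps stale trace IDs
 * from being handed out again right away. */
struct toy_idr {
	int start;
	int next;
};

static void toy_idr_init(struct toy_idr *idr, int start)
{
	idr->start = start;
	idr->next = start;
}

static int toy_idr_alloc_cyclic(struct toy_idr *idr)
{
	int id = idr->next;

	idr->next = (id == INT_MAX) ? idr->start : id + 1;
	return id;
}
```

The real IDR additionally tracks which IDs are free after wrap-around; this sketch only illustrates the cyclic numbering.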
    if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
@@ -149,6 +162,11 @@ static int kfd_pc_sample_create(struct 
kfd_process_device *pdd,

  pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
   mutex_unlock(&pdd->dev->pcs_data.mutex);
  +    pcs_entry->pdd = pdd;
+    user_args->trace_id = (uint32_t)i;


I suspect this should be done inside the lock. You don't want someone 
looking up the pcs_entry before it has been initialized.
[JZ] pcs_entry is for this pc sampling process, and it is protected by 
kfd_process->mutex.


Regards,
  Felix



+
+    pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);

+
  return 0;
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 81c925fb2952..642558026d16 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,7 @@ struct kfd_dev;
    struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+    struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };
  @@ -743,6 +744,11 @@ enum kfd_pdd_bound {
   */
  #define SDMA_ACTIVITY_DIVISOR  100
  +struct pc_sampling_entry {
+    bool enabled;
+    struct kfd_process_device *pdd;
+};
+
  /* Data that is per-process-per device. */
  struct kfd_process_device {
  /* The device that owns this data. */


Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start

2023-11-23 Thread Felix Kuehling



On 2023-11-23 15:01, James Zhu wrote:


On 2023-11-22 17:27, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable pc sampling start.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 
+---

  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
  2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 60b29b245db5..33d003ca0093 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -83,9 +83,29 @@ static int kfd_pc_sample_query_cap(struct 
kfd_process_device *pdd,

  return 0;
  }
  -static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+    struct pc_sampling_entry *pcs_entry)
  {
-    return -EINVAL;
+    bool pc_sampling_start = false;
+
+    pcs_entry->enabled = true;
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+    pc_sampling_start = true;
+ pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+    while (pc_sampling_start) {
+    if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {

+    usleep_range(1000, 2000);


I don't understand why you need this synchronization through 
stop_enable. Why can't you do both the start and stop while holding 
the mutex? It's just setting a flag in the TMA, so it's not a 
time-consuming operation, and I don't see any potential for deadlocks.
[JZ] For stop, it is not just setting the TMA; we need to wait for the 
current pc sampling to stop completely and reset some initial settings.


I think that's being obfuscated by how you split up this patch series. 
Maybe if you squash the queue remapping patch into this one, it would be 
more obvious what's really happening when you stop sampling and would 
make it easier to review the synchronization and locking strategy.


Regards,
  Felix




Regards,
  Felix



+    } else {
+ kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+ pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+    break;
+    }
+    }
+
+    return 0;
  }
    static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
@@ -225,7 +245,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  if (pcs_entry->enabled)
  return -EALREADY;
  else
-    return kfd_pc_sample_start(pdd);
+    return kfd_pc_sample_start(pdd, pcs_entry);
    case KFD_IOCTL_PCS_OP_STOP:
  if (!pcs_entry->enabled)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 6670534f47b8..613910e0d440 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,8 @@ struct kfd_dev;
    struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+    uint32_t active_count;  /* Num of active sessions */
+    bool stop_enable;   /* pc sampling stop in process */
  struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };


Re: [PATCH 07/24] drm/amdkfd: check pcs_entry valid

2023-11-23 Thread James Zhu



On 2023-11-22 17:15, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Check pcs_entry valid for pc sampling ioctl.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 ++--
  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 4c9fc48e1a6a..36366c8847de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct 
kfd_process_device *pdd, uint32_t trace_

  int kfd_pc_sample(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *args)
  {
+    struct pc_sampling_entry *pcs_entry;
+
+    if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+    args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    args->trace_id);
+    mutex_unlock(&pdd->dev->pcs_data.mutex);


You need to keep holding the lock while the pcs_entry is still used. 
That includes any of the kfd_pc_sample_ functions below. Otherwise 
someone could free it concurrently. It would also simplify the 
..._ functions, if they didn't have to worry about the locking 
themselves.
[JZ] pcs_entry is only for this pc sampling process, and it is protected 
by kfd_process->mutex here.
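The locking pattern Felix asks for, independent of which mutex ends up providing it, is that the lookup and every use of the found object sit inside one critical section, so a concurrent destroy cannot free the entry between the find and the use. A userspace sketch with a hypothetical one-slot registry standing in for the IDR:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-slot registry; 'slot' stands in for an idr_find()
 * result.  The point is the shape of with_entry_locked(): the lookup
 * and the use of the entry both happen under the same lock. */
struct registry {
	pthread_mutex_t lock;
	void *slot;
};

static int with_entry_locked(struct registry *r, int (*op)(void *))
{
	int ret = -EINVAL;

	pthread_mutex_lock(&r->lock);
	if (r->slot)			/* the idr_find() step */
		ret = op(r->slot);	/* used while still holding the lock */
	pthread_mutex_unlock(&r->lock);
	return ret;
}

static int touch(void *p)
{
	(void)p;
	return 0;
}
```

If the entry must outlive the lock (e.g. for a long-running operation), the usual alternative is a reference count taken under the lock instead of holding the mutex throughout.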


Regards,
  Felix



+
+    if (!pcs_entry ||
+    pcs_entry->pdd != pdd)
+    return -EINVAL;
+    }
+
  switch (args->op) {
  case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
  return kfd_pc_sample_query_cap(pdd, args);
@@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  return kfd_pc_sample_create(pdd, args);
    case KFD_IOCTL_PCS_OP_DESTROY:
-    return kfd_pc_sample_destroy(pdd, args->trace_id);
+    if (pcs_entry->enabled)
+    return -EBUSY;
+    else
+    return kfd_pc_sample_destroy(pdd, args->trace_id);
    case KFD_IOCTL_PCS_OP_START:
-    return kfd_pc_sample_start(pdd);
+    if (pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_start(pdd);
    case KFD_IOCTL_PCS_OP_STOP:
-    return kfd_pc_sample_stop(pdd);
+    if (!pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_stop(pdd);
  }
    return -EINVAL;



[PATCH] drm/amdgpu: Enable event log on MES 11

2023-11-23 Thread shaoyunl
Enable event log through the HW specific FW API

Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 2 ++
 drivers/gpu/drm/amd/include/mes_v11_api_def.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 4dfec56e1b7f..26d71a22395d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -408,6 +408,8 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;
+   mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
+   mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr = 
mes->event_log_gpu_addr;
 
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
index b1db2b190187..1fbfd1aa987e 100644
--- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
@@ -232,6 +232,7 @@ union MESAPI_SET_HW_RESOURCES {
};
uint32_toversubscription_timer;
uint64_tdoorbell_info;
+   uint64_tevent_intr_history_gpu_mc_ptr;
};
 
uint32_tmax_dwords_in_api[API_FRAME_SIZE_IN_DWORDS];
-- 
2.34.1



Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap

2023-11-23 Thread James Zhu



On 2023-11-23 14:08, Felix Kuehling wrote:

On 2023-11-23 13:27, James Zhu wrote:


On 2023-11-22 17:31, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable a delay work to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  3 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 39 


  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  1 +
  4 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index bcaeedac8fe0..fb21902e433a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -35,6 +35,7 @@
  #include "kfd_migrate.h"
  #include "amdgpu.h"
  #include "amdgpu_xcp.h"
+#include "kfd_pc_sampling.h"
    #define MQD_SIZE_ALIGNED 768
  @@ -537,6 +538,8 @@ static void kfd_pc_sampling_init(struct 
kfd_node *dev)

  {
   mutex_init(&dev->pcs_data.mutex);
 idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
+ INIT_WORK(&dev->pcs_data.hosttrap_entry.base.pc_sampling_work,
+    kfd_pc_sample_handler);
  }
    static void kfd_pc_sampling_exit(struct kfd_node *dev)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 2c4ac5b4cc4b..e8f0559b618e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -38,6 +38,43 @@ struct supported_pc_sample_info 
supported_formats[] = {

  { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
  };
  +void kfd_pc_sample_handler(struct work_struct *work)
+{
+    struct amdgpu_device *adev;
+    struct kfd_node *node;
+    uint32_t timeout = 0;
+
+    node = container_of(work, struct kfd_node,
+ pcs_data.hosttrap_entry.base.pc_sampling_work);
+
+    mutex_lock(&node->pcs_data.mutex);
+    if (node->pcs_data.hosttrap_entry.base.active_count &&
+ node->pcs_data.hosttrap_entry.base.pc_sample_info.value &&
+    node->kfd2kgd->trigger_pc_sample_trap) {
+    switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) {

+    case KFD_IOCTL_PCS_TYPE_TIME_US:
+    timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.value;

+    break;
+    default:
+    pr_debug("PC Sampling type %d not supported.",
+ node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+    }
+    }
+    mutex_unlock(&node->pcs_data.mutex);
+    if (!timeout)
+    return;
+
+    adev = node->adev;
+    while (!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {


This worker basically runs indefinitely (controlled by user mode).

+ node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+ &node->pcs_data.hosttrap_entry.base.target_simd,
+ &node->pcs_data.hosttrap_entry.base.target_wave_slot,
+ node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+    pr_debug_ratelimited("triggered a host trap.");
+    usleep_range(timeout, timeout + 10);


This will cause drift of the interval. Instead what you should do, 
is calculate the wait time at the end of every iteration based on 
the current time and the interval.
[JZ] I am wondering what degree of accuracy is requested on the 
interval; there is a HW time stamp with each pc sampling data packet,
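The drift Felix describes comes from sleeping a fixed interval after variable-length work, so each iteration's overhead accumulates. The usual fix is to advance an absolute deadline by the period and sleep until that deadline; missed deadlines are skipped rather than bunched up. A self-contained sketch of just the deadline arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Compute the next absolute deadline from a fixed period.  Because the
 * deadline is advanced in whole periods from its previous value (not
 * from "now"), per-iteration work time does not accumulate as drift,
 * and deadlines already in the past are skipped instead of fired
 * back-to-back. */
static uint64_t next_deadline(uint64_t deadline_ns, uint64_t period_ns,
			      uint64_t now_ns)
{
	while (deadline_ns <= now_ns)
		deadline_ns += period_ns;
	return deadline_ns;
}
```

In the kernel the sampling loop would then sleep until the returned absolute time (e.g. with a TIMER_ABSTIME-style wait) instead of calling usleep_range() with a relative interval.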




+    }
+}
+
  static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user 
*user_args)

  {
@@ -101,6 +138,7 @@ static int kfd_pc_sample_start(struct 
kfd_process_device *pdd,

  } else {
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+ schedule_work(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);


Scheduling a worker that runs indefinitely on the system workqueue 
is probably a bad idea. It could block other work items 
indefinitely. I think you are misusing the work queue API here. What 
you really want is probably to create a kernel thread.
[JZ] Yes, you are right. How about using alloc_workqueue to create a 
dedicated queue instead of the system queue? Is alloc_workqueue more 
efficient than kernel thread creation?


A work queue can create many kernel threads to handle the execution of 
work items. You really only need a single kernel thread per GPU for 
time-based PC sampling. IMO the work queue just adds a bunch of 
overhead. Using a work queue for something that runs indefinitely 
feels like an abuse of the API. I don't have much experience with 
creating kernel threads directly. See include/linux/kthread.h. If you 
want to look for an example, it seems drivers/gpu/drm/scheduler uses 
the kthread API.

[JZ] then let me switch to kthread
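The kthread pattern Felix suggests — kthread_run() to start a dedicated thread, a loop bounded by kthread_should_stop(), and kthread_stop() to tear it down — can be modeled in userspace with a thread and a stop flag. A sketch under that analogy (pthread and atomics stand in for the kernel APIs):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_bool should_stop;	/* analogous to kthread_should_stop() */
static atomic_long iterations;

/* Dedicated sampler thread: loops until asked to stop, like a kthread
 * created with kthread_run().  The loop body is where something like
 * trigger_pc_sample_trap() plus the interval sleep would go. */
static void *sampler_thread(void *arg)
{
	(void)arg;
	while (!atomic_load(&should_stop))
		atomic_fetch_add(&iterations, 1);
	return NULL;
}

/* Run the sampler until it has iterated at least min_iters times,
 * then stop it (the kthread_stop() analogue) and return the count. */
static long run_sampler_demo(long min_iters)
{
	pthread_t t;

	atomic_store(&should_stop, 0);
	atomic_store(&iterations, 0);
	if (pthread_create(&t, NULL, sampler_thread, NULL))
		return -1;
	while (atomic_load(&iterations) < min_iters)
		;	/* wait for the sampler to make progress */
	atomic_store(&should_stop, 1);
	pthread_join(t, NULL);
	return atomic_load(&iterations);
}
```

Unlike a work item on the system workqueue, stopping is an explicit, joinable operation, and the long-running loop cannot starve unrelated work.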


Regards,
  Felix




Regards,
  Felix



  break;
  }
  }
@@ -123,6 +161,7 @@ static int kfd_pc_sample_stop(struct 
kfd_process_device *pdd,

   mutex_unlock(&pdd->dev->pcs_data.mutex);
    if (pc_sampling_stop) 

Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread James Zhu



On 2023-11-23 14:02, Felix Kuehling wrote:

On 2023-11-23 11:25, James Zhu wrote:


On 2023-11-22 17:35, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running
processes to complete a CWSR trap.


Please add an explanation why this is needed.


[JZ] Even though the profiling-enabled bit is turned off, the CWSR trap 
handlers for some kernels in this process may still be running. This will 
force the waves in any running processes to complete a CWSR trap, and 
make sure pc sampling is completely stopped for this process. I will add 
it later.


It may be confusing to talk specifically about "CWSR trap handler". 
There is only one trap handler that is triggered by different events: 
CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap 
triggers, it serializes with any currently running trap handler in 
that wavefront. So it seems that you're using CWSR as a way to ensure 
that any host trap has completed: CWSR will wait for previous traps to 
finish before trapping again for CWSR, the HWS firmware waits for CWSR 
completion and the driver waits for HWS to finish CWSR with a fence on 
a HIQ QUERY_STATUS packet. Is that correct?

[JZ] I think your explanation is more detail. Need Joseph to confirm.


Regards,
  Felix









Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 
+++

  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
  3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct 
device_queue_manager *dqm)

  return debug_map_and_unlock(dqm);
  }
  +void remap_queue(struct device_queue_manager *dqm,
+    enum kfd_unmap_queues_filter filter,
+    uint32_t filter_param,
+    uint32_t grace_period)


Not sure if you need the filter and grace period parameters in this 
function. What's the point of exposing that to callers who just want 
to trigger a CWSR? You could also change the function name to 
reflect the purpose of the function, rather than the implementation.
[JZ] Just want to create a general function in case it is used by 
others. I am fine with removing the filter_param/grace_period parameters.


Regards,
  Felix



+{
+    dqm_lock(dqm);
+    if (!dqm->dev->kfd->shared_resources.enable_mes)
+    execute_queues_cpsch(dqm, filter, filter_param, 
grace_period);

+    dqm_unlock(dqm);
+}
+
  #if defined(CONFIG_DEBUG_FS)
    static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h

index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct 
device_queue_manager *dqm);

  int debug_map_and_unlock(struct device_queue_manager *dqm);
  int debug_refresh_runlist(struct device_queue_manager *dqm);
  +void remap_queue(struct device_queue_manager *dqm,
+    enum kfd_unmap_queues_filter filter,
+    uint32_t filter_param,
+    uint32_t grace_period);
+
  static inline unsigned int get_sh_mem_bases_32(struct 
kfd_process_device *pdd)

  {
  return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index e8f0559b618e..66670cdb813a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
  #include "kfd_priv.h"
  #include "amdgpu_amdkfd.h"
  #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
    struct supported_pc_sample_info {
  uint32_t ip_version;
@@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct 
kfd_process_device *pdd,
cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);


kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+    remap_queue(pdd->dev->dqm,
+    KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
USE_DEFAULT_GRACE_PERIOD);

   mutex_lock(&pdd->dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;


[PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread shaoyunl
This is the generic SW part, prepare the event log buffer and dump it through 
debugfs

Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 61 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  5 ++
 4 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..8b2cbeae99ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2140,6 +2140,8 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
amdgpu_debugfs_firmware_init(adev);
amdgpu_ta_if_debugfs_init(adev);
 
+   amdgpu_debugfs_mes_event_log_init(adev);
+
 #if defined(CONFIG_DRM_AMD_DC)
if (adev->dc_enabled)
dtn_debugfs_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
index 371a6f0deb29..0425432d8659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
@@ -32,3 +32,5 @@ void amdgpu_debugfs_fini(struct amdgpu_device *adev);
 void amdgpu_debugfs_fence_init(struct amdgpu_device *adev);
 void amdgpu_debugfs_firmware_init(struct amdgpu_device *adev);
 void amdgpu_debugfs_gem_init(struct amdgpu_device *adev);
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev);
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 45280fb0e00c..b4ba556dc733 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -97,6 +97,26 @@ static int amdgpu_mes_doorbell_init(struct amdgpu_device 
*adev)
return 0;
 }
 
+static int amdgpu_mes_event_log_init(struct amdgpu_device *adev)
+{
+   int r;
+
+   r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_GTT,
+   &adev->mes.event_log_gpu_obj,
+   &adev->mes.event_log_gpu_addr,
+   &adev->mes.event_log_cpu_addr);
+   if (r) {
+   dev_warn(adev->dev, "failed to create MES event log buffer 
(%d)", r);
+   return r;
+   }
+
+   memset(adev->mes.event_log_cpu_addr, 0, PAGE_SIZE);
+
+   return  0;
+
+}
+
 static void amdgpu_mes_doorbell_free(struct amdgpu_device *adev)
 {
bitmap_free(adev->mes.doorbell_bitmap);
@@ -181,8 +201,14 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
if (r)
goto error;
 
+   r = amdgpu_mes_event_log_init(adev);
+   if (r)
+   goto error_doorbell;
+
return 0;
 
+error_doorbell:
+   amdgpu_mes_doorbell_free(adev);
 error:
amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
@@ -198,6 +224,10 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 
 void amdgpu_mes_fini(struct amdgpu_device *adev)
 {
+   amdgpu_bo_free_kernel(&adev->mes.event_log_gpu_obj,
+ &adev->mes.event_log_gpu_addr,
+ &adev->mes.event_log_cpu_addr);
+
amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
amdgpu_device_wb_free(adev, adev->mes.read_val_offs);
@@ -1483,3 +1513,34 @@ int amdgpu_mes_init_microcode(struct amdgpu_device 
*adev, int pipe)
amdgpu_ucode_release(>mes.fw[pipe]);
return r;
 }
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)
+{
+   struct amdgpu_device *adev = m->private;
+   uint32_t *mem = (uint32_t *)(adev->mes.event_log_cpu_addr);
+
+   seq_hex_dump(m, "", DUMP_PREFIX_OFFSET, 32, 4,
+mem, PAGE_SIZE, false);
+
+   return 0;
+}
+
+
+DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_mes_event_log);
+
+#endif
+
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev)
+{
+
+#if defined(CONFIG_DEBUG_FS)
+   struct drm_minor *minor = adev_to_drm(adev)->primary;
+   struct dentry *root = minor->debugfs_root;
+
+   debugfs_create_file("amdgpu_mes_event_log", 0444, root,
+   adev, &amdgpu_debugfs_mes_event_log_fops);
+
+#endif
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index a27b424ffe00..894b9b133000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -133,6 +133,11 @@ struct amdgpu_mes {
uint32_tnum_mes_dbs;
unsigned long   *doorbell_bitmap;
 
+   /* MES event log buffer */
+   struct amdgpu_bo*event_log_gpu_obj;
+   uint64_tevent_log_gpu_addr;
+   void  

Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-23 Thread Felix Kuehling

[+Alex]

On 2023-11-17 16:44, Felix Kuehling wrote:


This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.

These helper functions are needed for KFD to export and import DMABufs
the right way without duplicating the tracking of DMABufs associated with
GEM objects while ensuring that move notifier callbacks are working as
intended.

CC: Christian König 
CC: Thomas Zimmermann 
Signed-off-by: Felix Kuehling 


Re: our discussion about v2 of this patch: If this version is 
acceptable, can I get an R-b or A-b?


I would like to get this patch into drm-next as a prerequisite for 
patches 2 and 3. I cannot submit it to the current amd-staging-drm-next 
because the patch I'm reverting doesn't exist there yet.


Patch 2 and 3 could go into drm-next as well, or go through Alex's 
amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do 
you prefer to coordinate this?


Regards,
  Felix



---
  drivers/gpu/drm/drm_prime.c | 33 ++---
  include/drm/drm_prime.h |  7 +++
  2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 63b709a67471..834a5e28abbe 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf)
  }
  EXPORT_SYMBOL(drm_gem_dmabuf_release);
  
-/*

+/**
   * drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers
   * @dev: drm_device to import into
   * @file_priv: drm file-private structure
@@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release);
   *
   * Returns 0 on success or a negative error code on failure.
   */
-static int drm_gem_prime_fd_to_handle(struct drm_device *dev,
- struct drm_file *file_priv, int prime_fd,
- uint32_t *handle)
+int drm_gem_prime_fd_to_handle(struct drm_device *dev,
+  struct drm_file *file_priv, int prime_fd,
+  uint32_t *handle)
  {
struct dma_buf *dma_buf;
struct drm_gem_object *obj;
@@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device 
*dev,
dma_buf_put(dma_buf);
return ret;
  }
+EXPORT_SYMBOL(drm_gem_prime_fd_to_handle);
  
  int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,

 struct drm_file *file_priv)
@@ -408,7 +409,7 @@ static struct dma_buf *export_and_register_object(struct 
drm_device *dev,
return dmabuf;
  }
  
-/*

+/**
   * drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers
   * @dev: dev to export the buffer from
   * @file_priv: drm file-private structure
@@ -421,10 +422,10 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev,
   * The actual exporting from GEM object to a dma-buf is done through the
   * _gem_object_funcs.export callback.
   */
-static int drm_gem_prime_handle_to_fd(struct drm_device *dev,
-				      struct drm_file *file_priv, uint32_t handle,
-				      uint32_t flags,
-				      int *prime_fd)
+int drm_gem_prime_handle_to_fd(struct drm_device *dev,
+			       struct drm_file *file_priv, uint32_t handle,
+			       uint32_t flags,
+			       int *prime_fd)
  {
struct drm_gem_object *obj;
int ret = 0;
@@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device *dev,
  
  	return ret;

  }
+EXPORT_SYMBOL(drm_gem_prime_handle_to_fd);
  
  int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
				   struct drm_file *file_priv)
@@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size);
   * @obj: GEM object to export
   * @flags: flags like DRM_CLOEXEC and DRM_RDWR
   *
- * This is the implementation of the &drm_gem_object_funcs.export functions
- * for GEM drivers using the PRIME helpers. It is used as the default for
- * drivers that do not set their own.
+ * This is the implementation of the &drm_gem_object_funcs.export functions
+ * for GEM drivers using the PRIME helpers. It is used as the default in
+ * drm_gem_prime_handle_to_fd().
   */
  struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
 int flags)
@@ -962,9 +964,10 @@ EXPORT_SYMBOL(drm_gem_prime_import_dev);
   * @dev: drm_device to import into
   * @dma_buf: dma-buf object to import
   *
- * This is the implementation of the gem_prime_import functions for GEM
- * drivers using the PRIME helpers. It is the default for drivers that do
- * not set their own &drm_driver.gem_prime_import.
+ * This is the implementation of the gem_prime_import functions for GEM drivers
+ * using the PRIME helpers. Drivers can use this as their
+ * &drm_driver.gem_prime_import implementation. It is used as the default
+ * implementation

Re: [PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread Felix Kuehling



On 2023-11-23 14:12, shaoyunl wrote:

This is the generic SW part, prepare the event log buffer and dump it through 
debugfs

Signed-off-by: shaoyunl 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 61 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  5 ++
  4 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..8b2cbeae99ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2140,6 +2140,8 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
amdgpu_debugfs_firmware_init(adev);
amdgpu_ta_if_debugfs_init(adev);
  
+	amdgpu_debugfs_mes_event_log_init(adev);
+
  #if defined(CONFIG_DRM_AMD_DC)
if (adev->dc_enabled)
dtn_debugfs_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
index 371a6f0deb29..0425432d8659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
@@ -32,3 +32,5 @@ void amdgpu_debugfs_fini(struct amdgpu_device *adev);
  void amdgpu_debugfs_fence_init(struct amdgpu_device *adev);
  void amdgpu_debugfs_firmware_init(struct amdgpu_device *adev);
  void amdgpu_debugfs_gem_init(struct amdgpu_device *adev);
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev);
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 45280fb0e00c..b7af24d7db0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -97,6 +97,26 @@ static int amdgpu_mes_doorbell_init(struct amdgpu_device 
*adev)
return 0;
  }
  
+static int amdgpu_mes_event_log_init(struct amdgpu_device *adev)
+{
+   int r;
+
+	r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_GTT,
+				    &adev->mes.event_log_gpu_obj,
+				    &adev->mes.event_log_gpu_addr,
+				    &adev->mes.event_log_cpu_addr);
+	if (r) {
+		dev_warn(adev->dev, "failed to create MES event log buffer (%d)", r);
+		return r;
+	}
+
+	memset(adev->mes.event_log_cpu_addr, 0, PAGE_SIZE);
+
+	return  0;
+
+}
+
  static void amdgpu_mes_doorbell_free(struct amdgpu_device *adev)
  {
bitmap_free(adev->mes.doorbell_bitmap);
@@ -181,6 +201,12 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
if (r)
goto error;
  
+	r = amdgpu_mes_event_log_init(adev);
+	if (r) {
+		amdgpu_mes_doorbell_free(adev);
+		goto error;


The usual preferred way of goto-error handling would be to add another 
error label and do all the cleanup in reverse. Then just jump to the 
correct error label depending on where the error happened. So here you 
would goto error_doorbell. See below.




+   }
+
return 0;
  


So you'd create another error label here to handle the doorbell cleanup:

error_doorbell:
amdgpu_mes_doorbell_free(adev);
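For reference, the error-label ladder described here can be sketched in plain userspace C. The init/cleanup names below (doorbell_init, event_log_init and friends) are illustrative stand-ins for the amdgpu functions, not the real driver API; they only record the call order so the unwinding can be checked:

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-ins for amdgpu_mes_doorbell_init()/_free() and
 * amdgpu_mes_event_log_init(); they only record the call order. */
static char call_log[64];

static void record(const char *s)
{
	strcat(call_log, s);
	strcat(call_log, ";");
}

static int doorbell_init(void) { record("db_init"); return 0; }
static void doorbell_free(void) { record("db_free"); }
static int event_log_init(int fail) { if (fail) return -1; record("ev_init"); return 0; }

/* Error-label ladder: each failure jumps to the label that unwinds
 * exactly what was initialized before it, in reverse order. */
static int mes_init(int fail_event_log)
{
	int r;

	r = doorbell_init();
	if (r)
		goto error;

	r = event_log_init(fail_event_log);
	if (r)
		goto error_doorbell;

	return 0;

error_doorbell:
	doorbell_free();
error:
	return r;
}
```

As more init steps are added, each gets its own label stacked above the previous one, so cleanup always runs in reverse initialization order.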

With that fixed, the patch is

Reviewed-by: Felix Kuehling 



  error:
@@ -198,6 +224,10 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
  
  void amdgpu_mes_fini(struct amdgpu_device *adev)

  {
+	amdgpu_bo_free_kernel(&adev->mes.event_log_gpu_obj,
+			      &adev->mes.event_log_gpu_addr,
+			      &adev->mes.event_log_cpu_addr);
+
amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
amdgpu_device_wb_free(adev, adev->mes.read_val_offs);
@@ -1483,3 +1513,34 @@ int amdgpu_mes_init_microcode(struct amdgpu_device 
*adev, int pipe)
	amdgpu_ucode_release(&adev->mes.fw[pipe]);
return r;
  }
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)
+{
+   struct amdgpu_device *adev = m->private;
+   uint32_t *mem = (uint32_t *)(adev->mes.event_log_cpu_addr);
+
+   seq_hex_dump(m, "", DUMP_PREFIX_OFFSET, 32, 4,
+mem, PAGE_SIZE, false);
+
+   return 0;
+}
+
+
+DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_mes_event_log);
+
+#endif
+
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev)
+{
+
+#if defined(CONFIG_DEBUG_FS)
+   struct drm_minor *minor = adev_to_drm(adev)->primary;
+   struct dentry *root = minor->debugfs_root;
+
+	debugfs_create_file("amdgpu_mes_event_log", 0444, root,
+			    adev, &amdgpu_debugfs_mes_event_log_fops);
+
+#endif
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index a27b424ffe00..894b9b133000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h

[PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread shaoyunl
This is the generic SW part, prepare the event log buffer and dump it through 
debugfs

Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 61 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  5 ++
 4 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..8b2cbeae99ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2140,6 +2140,8 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
amdgpu_debugfs_firmware_init(adev);
amdgpu_ta_if_debugfs_init(adev);
 
+   amdgpu_debugfs_mes_event_log_init(adev);
+
 #if defined(CONFIG_DRM_AMD_DC)
if (adev->dc_enabled)
dtn_debugfs_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
index 371a6f0deb29..0425432d8659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
@@ -32,3 +32,5 @@ void amdgpu_debugfs_fini(struct amdgpu_device *adev);
 void amdgpu_debugfs_fence_init(struct amdgpu_device *adev);
 void amdgpu_debugfs_firmware_init(struct amdgpu_device *adev);
 void amdgpu_debugfs_gem_init(struct amdgpu_device *adev);
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev);
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 45280fb0e00c..b7af24d7db0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -97,6 +97,26 @@ static int amdgpu_mes_doorbell_init(struct amdgpu_device 
*adev)
return 0;
 }
 
+static int amdgpu_mes_event_log_init(struct amdgpu_device *adev)
+{
+   int r;
+
+	r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_GTT,
+				    &adev->mes.event_log_gpu_obj,
+				    &adev->mes.event_log_gpu_addr,
+				    &adev->mes.event_log_cpu_addr);
+	if (r) {
+		dev_warn(adev->dev, "failed to create MES event log buffer (%d)", r);
+		return r;
+	}
+
+	memset(adev->mes.event_log_cpu_addr, 0, PAGE_SIZE);
+
+	return  0;
+
+}
+
 static void amdgpu_mes_doorbell_free(struct amdgpu_device *adev)
 {
bitmap_free(adev->mes.doorbell_bitmap);
@@ -181,6 +201,12 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
if (r)
goto error;
 
+   r = amdgpu_mes_event_log_init(adev);
+   if (r) {
+   amdgpu_mes_doorbell_free(adev);
+   goto error;
+   }
+
return 0;
 
 error:
@@ -198,6 +224,10 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 
 void amdgpu_mes_fini(struct amdgpu_device *adev)
 {
+	amdgpu_bo_free_kernel(&adev->mes.event_log_gpu_obj,
+			      &adev->mes.event_log_gpu_addr,
+			      &adev->mes.event_log_cpu_addr);
+
amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
amdgpu_device_wb_free(adev, adev->mes.read_val_offs);
@@ -1483,3 +1513,34 @@ int amdgpu_mes_init_microcode(struct amdgpu_device 
*adev, int pipe)
	amdgpu_ucode_release(&adev->mes.fw[pipe]);
return r;
 }
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)
+{
+   struct amdgpu_device *adev = m->private;
+   uint32_t *mem = (uint32_t *)(adev->mes.event_log_cpu_addr);
+
+   seq_hex_dump(m, "", DUMP_PREFIX_OFFSET, 32, 4,
+mem, PAGE_SIZE, false);
+
+   return 0;
+}
+
+
+DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_mes_event_log);
+
+#endif
+
+void amdgpu_debugfs_mes_event_log_init(struct amdgpu_device *adev)
+{
+
+#if defined(CONFIG_DEBUG_FS)
+   struct drm_minor *minor = adev_to_drm(adev)->primary;
+   struct dentry *root = minor->debugfs_root;
+
+	debugfs_create_file("amdgpu_mes_event_log", 0444, root,
+			    adev, &amdgpu_debugfs_mes_event_log_fops);
+
+#endif
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index a27b424ffe00..894b9b133000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -133,6 +133,11 @@ struct amdgpu_mes {
uint32_tnum_mes_dbs;
unsigned long   *doorbell_bitmap;
 
+   /* MES event log buffer */
+   struct amdgpu_bo*event_log_gpu_obj;
+   uint64_tevent_log_gpu_addr;
+   void*event_log_cpu_addr;
+
/* ip specific functions */
const struct amdgpu_mes_funcs   *funcs;
 };

Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap

2023-11-23 Thread Felix Kuehling

On 2023-11-23 13:27, James Zhu wrote:


On 2023-11-22 17:31, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable a delay work to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  3 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 39 


  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  1 +
  4 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index bcaeedac8fe0..fb21902e433a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -35,6 +35,7 @@
  #include "kfd_migrate.h"
  #include "amdgpu.h"
  #include "amdgpu_xcp.h"
+#include "kfd_pc_sampling.h"
    #define MQD_SIZE_ALIGNED 768
  @@ -537,6 +538,8 @@ static void kfd_pc_sampling_init(struct 
kfd_node *dev)

  {
  	mutex_init(&dev->pcs_data.mutex);
  	idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
+	INIT_WORK(&dev->pcs_data.hosttrap_entry.base.pc_sampling_work,
+		  kfd_pc_sample_handler);
  }
    static void kfd_pc_sampling_exit(struct kfd_node *dev)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 2c4ac5b4cc4b..e8f0559b618e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -38,6 +38,43 @@ struct supported_pc_sample_info 
supported_formats[] = {

  { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
  };
  +void kfd_pc_sample_handler(struct work_struct *work)
+{
+    struct amdgpu_device *adev;
+    struct kfd_node *node;
+    uint32_t timeout = 0;
+
+    node = container_of(work, struct kfd_node,
+ pcs_data.hosttrap_entry.base.pc_sampling_work);
+
+    mutex_lock(&node->pcs_data.mutex);
+    if (node->pcs_data.hosttrap_entry.base.active_count &&
+        node->pcs_data.hosttrap_entry.base.pc_sample_info.value &&
+        node->kfd2kgd->trigger_pc_sample_trap) {
+        switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) {
+        case KFD_IOCTL_PCS_TYPE_TIME_US:
+            timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.value;
+            break;
+        default:
+            pr_debug("PC Sampling type %d not supported.",
+                node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+        }
+    }
+    mutex_unlock(&node->pcs_data.mutex);
+    if (!timeout)
+    return;
+
+    adev = node->adev;
+    while (!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {


This worker basically runs indefinitely (controlled by user mode).

+        node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+            &node->pcs_data.hosttrap_entry.base.target_simd,
+            &node->pcs_data.hosttrap_entry.base.target_wave_slot,
+            node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+        pr_debug_ratelimited("triggered a host trap.");
+        usleep_range(timeout, timeout + 10);


This will cause drift of the interval. Instead, you should calculate 
the wait time at the end of every iteration based on the current time 
and the interval.
[JZ] I am wondering what degree of accuracy is required for the 
interval; there is a HW timestamp with each pc sampling data packet.



+    }
+}
+
  static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user 
*user_args)

  {
@@ -101,6 +138,7 @@ static int kfd_pc_sample_start(struct 
kfd_process_device *pdd,

  } else {
  	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
  		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+	schedule_work(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);


Scheduling a worker that runs indefinitely on the system workqueue is 
probably a bad idea. It could block other work items indefinitely. I 
think you are misusing the work queue API here. What you really want 
is probably to create a kernel thread.
[JZ] Yes, you are right. How about using alloc_workqueue to create a 
dedicated queue instead of the system queue? Is alloc_workqueue more 
efficient than kernel thread creation?


A work queue can create many kernel threads to handle the execution of 
work items. You really only need a single kernel thread per GPU for 
time-based PC sampling. IMO the work queue just adds a bunch of 
overhead. Using a work queue for something that runs indefinitely feels 
like an abuse of the API. I don't have much experience with creating 
kernel threads directly. See include/linux/kthread.h. If you want to 
look for an example, it seems drivers/gpu/drm/scheduler uses the kthread 
API.


Regards,
  Felix




Regards,
  Felix



  break;
  }
  }
@@ -123,6 +161,7 @@ static int kfd_pc_sample_stop(struct 
kfd_process_device *pdd,

  mutex_unlock(>dev->pcs_data.mutex);
    if (pc_sampling_stop) {
+		cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);


Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread Felix Kuehling

On 2023-11-23 11:25, James Zhu wrote:


On 2023-11-22 17:35, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running
processes to complete a CWSR trap.


Please add an explanation why this is needed.


[JZ] Even though the profiling-enabled bit is turned off, the CWSR trap 
handlers for some kernels in this process may still be running. This 
will force the waves in any running processes to complete a CWSR trap, 
and make sure pc sampling is completely stopped for this process. I 
will add it later.


It may be confusing to talk specifically about "CWSR trap handler". 
There is only one trap handler that is triggered by different events: 
CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap 
triggers, it serializes with any currently running trap handler in that 
wavefront. So it seems that you're using CWSR as a way to ensure that 
any host trap has completed: CWSR will wait for previous traps to finish 
before trapping again for CWSR, the HWS firmware waits for CWSR 
completion and the driver waits for HWS to finish CWSR with a fence on a 
HIQ QUERY_STATUS packet. Is that correct?


Regards,
  Felix









Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 
+++

  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
  3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct 
device_queue_manager *dqm)

  return debug_map_and_unlock(dqm);
  }
  +void remap_queue(struct device_queue_manager *dqm,
+    enum kfd_unmap_queues_filter filter,
+    uint32_t filter_param,
+    uint32_t grace_period)


Not sure if you need the filter and grace period parameters in this 
function. What's the point of exposing that to callers who just want 
to trigger a CWSR? You could also change the function name to reflect 
the purpose of the function, rather than the implementation.
[JZ] Just want to create a general function in case that used by 
others. I am fine to remove passing filter_param/grace_period


Regards,
  Felix



+{
+    dqm_lock(dqm);
+    if (!dqm->dev->kfd->shared_resources.enable_mes)
+    execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+    dqm_unlock(dqm);
+}
+
  #if defined(CONFIG_DEBUG_FS)
    static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h

index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct 
device_queue_manager *dqm);

  int debug_map_and_unlock(struct device_queue_manager *dqm);
  int debug_refresh_runlist(struct device_queue_manager *dqm);
  +void remap_queue(struct device_queue_manager *dqm,
+    enum kfd_unmap_queues_filter filter,
+    uint32_t filter_param,
+    uint32_t grace_period);
+
  static inline unsigned int get_sh_mem_bases_32(struct 
kfd_process_device *pdd)

  {
  return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index e8f0559b618e..66670cdb813a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
  #include "kfd_priv.h"
  #include "amdgpu_amdkfd.h"
  #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
    struct supported_pc_sample_info {
  uint32_t ip_version;
@@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
  	cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);
  	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
  		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+    remap_queue(pdd->dev->dqm,
+        KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
  
  	mutex_lock(&pdd->dev->pcs_data.mutex);
  	pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;


Remote UMR branch pushed out

2023-11-23 Thread Tom St Denis
Hello all,

I've pushed out the Remote UMR branch for people to take a look at before I
merge it into main in a couple of weeks.

https://gitlab.freedesktop.org/tomstdenis/umr/-/commit/712acea483cbbacb35cb1a431dea501f041065ff

This feature allows running the privileged side of umr elsewhere
(potentially on another host even) while preserving the common interface
people are used to.

Feel free to try it out.

Tom


RE: [PATCH] drm/amdkfd: Use common function for IP version check

2023-11-23 Thread Kasiviswanathan, Harish
[AMD Official Use Only - General]

Reviewed-by: Harish Kasiviswanathan 

-Original Message-
From: amd-gfx  On Behalf Of Mukul Joshi
Sent: Wednesday, November 22, 2023 3:29 PM
To: amd-gfx@lists.freedesktop.org
Cc: Joshi, Mukul ; Kuehling, Felix 
Subject: [PATCH] drm/amdkfd: Use common function for IP version check

KFD_GC_VERSION was recently updated to use a new function
for IP version checks. As a result, use KFD_GC_VERSION as
the common function for all IP version checks in KFD.

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index a40f8cfc6aa5..45366b4ca976 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1127,7 +1127,7 @@ static inline struct kfd_node *kfd_node_by_irq_ids(struct 
amdgpu_device *adev,
struct kfd_dev *dev = adev->kfd.dev;
uint32_t i;

-   if (adev->ip_versions[GC_HWIP][0] != IP_VERSION(9, 4, 3))
+   if (KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 3))
return dev->nodes[0];

for (i = 0; i < dev->num_nodes; i++)
--
2.35.1



Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap

2023-11-23 Thread James Zhu



On 2023-11-22 17:31, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable a delay work to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  3 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 39 
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  1 +
  4 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index bcaeedac8fe0..fb21902e433a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -35,6 +35,7 @@
  #include "kfd_migrate.h"
  #include "amdgpu.h"
  #include "amdgpu_xcp.h"
+#include "kfd_pc_sampling.h"
    #define MQD_SIZE_ALIGNED 768
  @@ -537,6 +538,8 @@ static void kfd_pc_sampling_init(struct 
kfd_node *dev)

  {
  	mutex_init(&dev->pcs_data.mutex);
  	idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
+	INIT_WORK(&dev->pcs_data.hosttrap_entry.base.pc_sampling_work,
+		  kfd_pc_sample_handler);
  }
    static void kfd_pc_sampling_exit(struct kfd_node *dev)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 2c4ac5b4cc4b..e8f0559b618e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -38,6 +38,43 @@ struct supported_pc_sample_info 
supported_formats[] = {

  { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
  };
  +void kfd_pc_sample_handler(struct work_struct *work)
+{
+    struct amdgpu_device *adev;
+    struct kfd_node *node;
+    uint32_t timeout = 0;
+
+    node = container_of(work, struct kfd_node,
+ pcs_data.hosttrap_entry.base.pc_sampling_work);
+
+    mutex_lock(&node->pcs_data.mutex);
+    if (node->pcs_data.hosttrap_entry.base.active_count &&
+        node->pcs_data.hosttrap_entry.base.pc_sample_info.value &&
+        node->kfd2kgd->trigger_pc_sample_trap) {
+        switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) {
+        case KFD_IOCTL_PCS_TYPE_TIME_US:
+            timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.value;
+            break;
+        default:
+            pr_debug("PC Sampling type %d not supported.",
+                node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+        }
+    }
+    mutex_unlock(&node->pcs_data.mutex);
+    if (!timeout)
+    return;
+
+    adev = node->adev;
+    while (!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {


This worker basically runs indefinitely (controlled by user mode).

+        node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+            &node->pcs_data.hosttrap_entry.base.target_simd,
+            &node->pcs_data.hosttrap_entry.base.target_wave_slot,
+            node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+        pr_debug_ratelimited("triggered a host trap.");
+        usleep_range(timeout, timeout + 10);


This will cause drift of the interval. Instead, you should calculate 
the wait time at the end of every iteration based on the current time 
and the interval.
[JZ] I am wondering what degree of accuracy is required for the 
interval; there is a HW timestamp with each pc sampling data packet.




+    }
+}
+
  static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *user_args)
  {
@@ -101,6 +138,7 @@ static int kfd_pc_sample_start(struct 
kfd_process_device *pdd,

  } else {
  	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
  		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+	schedule_work(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);


Scheduling a worker that runs indefinitely on the system workqueue is 
probably a bad idea. It could block other work items indefinitely. I 
think you are misusing the work queue API here. What you really want 
is probably to create a kernel thread.
[JZ] Yes, you are right. How about using alloc_workqueue to create a 
dedicated queue instead of the system queue? Is alloc_workqueue more 
efficient than kernel thread creation?


Regards,
  Felix



  break;
  }
  }
@@ -123,6 +161,7 @@ static int kfd_pc_sample_stop(struct 
kfd_process_device *pdd,

  mutex_unlock(>dev->pcs_data.mutex);
    if (pc_sampling_stop) {
+		cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);
  	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
  		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

index 4eeded4ea5b6..cb93909e6bd3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
    int kfd_pc_sample(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *args);
+void kfd_pc_sample_handler(struct work_struct *work);
    

Re: [PATCH] drm/amdgpu: Enable GFXOFF During Compute for GFX11

2023-11-23 Thread Hamza Mahfooz

On 11/23/23 12:58, Ori Messinger wrote:

GFXOFF was previously disabled as a temporary workaround for GFX11
due to issues in some compute applications.
This patch re-enables GFXOFF for GFX version 11.


Please describe what has changed since it was disabled, that allows us
to re-enable it without encountering the same issues.



Signed-off-by: Ori Messinger 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 2d22f7d45512..bfd54877b8c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,14 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
  void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
  {
enum amd_powergating_state state = idle ? AMD_PG_STATE_GATE : 
AMD_PG_STATE_UNGATE;
-   /* Temporary workaround to fix issues observed in some
-* compute applications when GFXOFF is enabled on GFX11.
-*/
-   if (IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 11) {
-   pr_debug("GFXOFF is %s\n", idle ? "enabled" : "disabled");
-   amdgpu_gfx_off_ctrl(adev, idle);
-   } else if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 9) &&
-   (adev->flags & AMD_IS_APU)) {
+   if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 9) &&
+   (adev->flags & AMD_IS_APU)) {
/* Disable GFXOFF and PG. Temporary workaround
 * to fix some compute applications issue on GFX9.
 */

--
Hamza



[PATCH] drm/amdgpu: Enable GFXOFF During Compute for GFX11

2023-11-23 Thread Ori Messinger
GFXOFF was previously disabled as a temporary workaround for GFX11
due to issues in some compute applications.
This patch re-enables GFXOFF for GFX version 11.

Signed-off-by: Ori Messinger 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 2d22f7d45512..bfd54877b8c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,14 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
 void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
 {
enum amd_powergating_state state = idle ? AMD_PG_STATE_GATE : 
AMD_PG_STATE_UNGATE;
-   /* Temporary workaround to fix issues observed in some
-* compute applications when GFXOFF is enabled on GFX11.
-*/
-   if (IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 11) {
-   pr_debug("GFXOFF is %s\n", idle ? "enabled" : "disabled");
-   amdgpu_gfx_off_ctrl(adev, idle);
-   } else if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 9) &&
-   (adev->flags & AMD_IS_APU)) {
+   if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 9) &&
+   (adev->flags & AMD_IS_APU)) {
/* Disable GFXOFF and PG. Temporary workaround
 * to fix some compute applications issue on GFX9.
 */
-- 
2.25.1



Re: [PATCH] drm/amd/display: fix ABM disablement

2023-11-23 Thread Harry Wentland



On 2023-11-22 17:24, Hamza Mahfooz wrote:
> On recent versions of DMUB firmware, if we want to completely disable
> ABM we have to pass ABM_LEVEL_IMMEDIATE_DISABLE as the requested ABM
> level to DMUB. Otherwise, LCD eDP displays are unable to reach their
> maximum brightness levels. So, to fix this whenever the user requests an
> ABM level of 0 pass ABM_LEVEL_IMMEDIATE_DISABLE to DMUB instead. Also,
> to keep the user's experience consistent map ABM_LEVEL_IMMEDIATE_DISABLE
> to 0 when a user tries to read the requested ABM level.
> 
> Signed-off-by: Hamza Mahfooz 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5d9496db0ecb..8cb92d941cd9 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -6230,7 +6230,7 @@ int amdgpu_dm_connector_atomic_set_property(struct 
> drm_connector *connector,
>   dm_new_state->underscan_enable = val;
>   ret = 0;
>   } else if (property == adev->mode_info.abm_level_property) {
> - dm_new_state->abm_level = val;
> + dm_new_state->abm_level = val ?: ABM_LEVEL_IMMEDIATE_DISABLE;
>   ret = 0;
>   }
>  
> @@ -6275,7 +6275,8 @@ int amdgpu_dm_connector_atomic_get_property(struct 
> drm_connector *connector,
>   *val = dm_state->underscan_enable;
>   ret = 0;
>   } else if (property == adev->mode_info.abm_level_property) {
> - *val = dm_state->abm_level;
> + *val = (dm_state->abm_level != ABM_LEVEL_IMMEDIATE_DISABLE) ?
> + dm_state->abm_level : 0;
>   ret = 0;
>   }
>  
> @@ -6348,7 +6349,8 @@ void amdgpu_dm_connector_funcs_reset(struct 
> drm_connector *connector)
>   state->pbn = 0;
>  
>   if (connector->connector_type == DRM_MODE_CONNECTOR_eDP)
> - state->abm_level = amdgpu_dm_abm_level;
> + state->abm_level = amdgpu_dm_abm_level ?:
> + ABM_LEVEL_IMMEDIATE_DISABLE;
>  
>   __drm_atomic_helper_connector_reset(connector, >base);
>   }



Re: [PATCH v9 0/4] drm: Add support for atomic async page-flip

2023-11-23 Thread André Almeida

Em 23/11/2023 13:24, Simon Ser escreveu:

Thanks! This iteration of the first 3 patches LGTM, I've pushed them to
drm-misc-next. I've made two adjustments to make checkpatch.pl happy:



Thank you!


- s/uint64_t/u64/
- Fix indentation for a drm_dbg_atomic()


Oops :)


Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread James Zhu



On 2023-11-22 17:35, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running
processes to complete a CWSR trap.


Please add an explanation why this is needed.


[JZ] Even though the profiling-enabled bit is turned off, the CWSR trap 
handlers for some kernels of this process may still be running. This will 
force the waves in any running processes to complete a CWSR trap and make 
sure PC sampling is completely stopped for this process. I will add this 
explanation later.







Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
  3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct 
device_queue_manager *dqm)

  return debug_map_and_unlock(dqm);
  }
  +void remap_queue(struct device_queue_manager *dqm,
+    enum kfd_unmap_queues_filter filter,
+    uint32_t filter_param,
+    uint32_t grace_period)


Not sure if you need the filter and grace period parameters in this 
function. What's the point of exposing that to callers who just want 
to trigger a CWSR? You could also change the function name to reflect 
the purpose of the function, rather than the implementation.
[JZ] I just wanted to create a general function in case it gets used by 
others. I am fine with removing the filter_param/grace_period parameters.


Regards,
  Felix



+{
+    dqm_lock(dqm);
+    if (!dqm->dev->kfd->shared_resources.enable_mes)
+    execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+    dqm_unlock(dqm);
+}
+
  #if defined(CONFIG_DEBUG_FS)
    static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h

index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct 
device_queue_manager *dqm);

  int debug_map_and_unlock(struct device_queue_manager *dqm);
  int debug_refresh_runlist(struct device_queue_manager *dqm);
  +void remap_queue(struct device_queue_manager *dqm,
+    enum kfd_unmap_queues_filter filter,
+    uint32_t filter_param,
+    uint32_t grace_period);
+
  static inline unsigned int get_sh_mem_bases_32(struct 
kfd_process_device *pdd)

  {
  return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index e8f0559b618e..66670cdb813a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
  #include "kfd_priv.h"
  #include "amdgpu_amdkfd.h"
  #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
    struct supported_pc_sample_info {
  uint32_t ip_version;
@@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct 
kfd_process_device *pdd,

cancel_work_sync(>dev->pcs_data.hosttrap_entry.base.pc_sampling_work);
kfd_process_set_trap_pc_sampling_flag(>qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+    remap_queue(pdd->dev->dqm,
+    KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
USE_DEFAULT_GRACE_PERIOD);

    mutex_lock(>dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;


Re: [PATCH v9 0/4] drm: Add support for atomic async page-flip

2023-11-23 Thread Simon Ser
Thanks! This iteration of the first 3 patches LGTM, I've pushed them to
drm-misc-next. I've made two adjustments to make checkpatch.pl happy:

- s/uint64_t/u64/
- Fix indentation for a drm_dbg_atomic()


Re: [PATCH 23/24] drm/amdkfd: add pc sampling capability check

2023-11-23 Thread James Zhu



On 2023-11-22 17:40, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:

From: David Yat Sin 

Add pc sampling capability check.


This should be squashed into patch 2. Or if you want to keep it 
separate, put this patch before patch 2 and define 
AMDKFD_IOC_PC_SAMPLE with KFD_IOC_FLAG_PERFMON from the beginning.

[JZ] Will do, thanks!


Regards,
  Felix




Signed-off-by: David Yat Sin 
Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 +-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 13 +
  2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index b00390e451bf..5e47e374d824 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -3259,7 +3259,7 @@ static const struct amdkfd_ioctl_desc 
amdkfd_ioctls[] = {

  kfd_ioctl_set_debug_trap, 0),
    AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE,
-    kfd_ioctl_pc_sample, 0),
+    kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON),
  };
    #define AMDKFD_CORE_IOCTL_COUNT    ARRAY_SIZE(amdkfd_ioctls)
@@ -3336,6 +3336,14 @@ static long kfd_ioctl(struct file *filep, 
unsigned int cmd, unsigned long arg)

  }
  }
  +    /* PC Sampling Monitor */
+    if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) {
+    if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) {
+    retcode = -EACCES;
+    goto err_i1;
+    }
+    }
+
  if (cmd & (IOC_IN | IOC_OUT)) {
  if (asize <= sizeof(stack_kdata)) {
  kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index b7062033fda4..236d3de85153 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -144,6 +144,19 @@ enum kfd_ioctl_flags {
   * we also allow ioctls with SYS_ADMIN capability.
   */
  KFD_IOC_FLAG_CHECKPOINT_RESTORE = BIT(0),
+
+    /*
+ * @KFD_IOC_FLAG_PERFMON:
+ * Performance monitoring feature, GPU performance monitoring 
can allow users
+ * to gather some information about other processes. PC sampling 
can allow
+ * users to infer information about wavefronts from other 
processes that are
+ * running on the same CUs, such as which execution units they 
are using. As
+ * such, this type of performance monitoring should be protected 
and only
+ * available to users with sufficient capabilities: either 
CAP_PERFMON, or,

+ * for backwards compatibility, CAP_SYS_ADMIN.
+ */
+
+    KFD_IOC_FLAG_PERFMON = BIT(1),
  };
  /*
   * Kernel module parameter to specify maximum number of supported 
queues per


Re: [PATCH] drm/amd/display: fix ABM disablement

2023-11-23 Thread Hamza Mahfooz

On 11/22/23 17:24, Hamza Mahfooz wrote:

On recent versions of DMUB firmware, if we want to completely disable
ABM we have to pass ABM_LEVEL_IMMEDIATE_DISABLE as the requested ABM
level to DMUB. Otherwise, LCD eDP displays are unable to reach their
maximum brightness levels. So, to fix this whenever the user requests an
ABM level of 0 pass ABM_LEVEL_IMMEDIATE_DISABLE to DMUB instead. Also,
to keep the user's experience consistent map ABM_LEVEL_IMMEDIATE_DISABLE
to 0 when a user tries to read the requested ABM level.

Signed-off-by: Hamza Mahfooz 


Cc: sta...@vger.kernel.org # 6.1+


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 5d9496db0ecb..8cb92d941cd9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6230,7 +6230,7 @@ int amdgpu_dm_connector_atomic_set_property(struct 
drm_connector *connector,
dm_new_state->underscan_enable = val;
ret = 0;
} else if (property == adev->mode_info.abm_level_property) {
-   dm_new_state->abm_level = val;
+   dm_new_state->abm_level = val ?: ABM_LEVEL_IMMEDIATE_DISABLE;
ret = 0;
}
  
@@ -6275,7 +6275,8 @@ int amdgpu_dm_connector_atomic_get_property(struct drm_connector *connector,

*val = dm_state->underscan_enable;
ret = 0;
} else if (property == adev->mode_info.abm_level_property) {
-   *val = dm_state->abm_level;
+   *val = (dm_state->abm_level != ABM_LEVEL_IMMEDIATE_DISABLE) ?
+   dm_state->abm_level : 0;
ret = 0;
}
  
@@ -6348,7 +6349,8 @@ void amdgpu_dm_connector_funcs_reset(struct drm_connector *connector)

state->pbn = 0;
  
  		if (connector->connector_type == DRM_MODE_CONNECTOR_eDP)

-   state->abm_level = amdgpu_dm_abm_level;
+   state->abm_level = amdgpu_dm_abm_level ?:
+   ABM_LEVEL_IMMEDIATE_DISABLE;
  
  		__drm_atomic_helper_connector_reset(connector, >base);

}

--
Hamza



Re: [PATCH] drm/amd/display: fix ABM disablement

2023-11-23 Thread Hamza Mahfooz

On 11/22/23 17:45, Pillai, Aurabindo wrote:

[AMD Official Use Only - General]


Does amdgpu_dm_connector_funcs_reset() get called on wakeup from suspend


According to my research/testing, drm_connector's reset() callback only
gets called when the connector gets created.

?  Users would want the system to have the same brightness level before 
suspending.



--

Regards,
Jay

*From:* Mahfooz, Hamza 
*Sent:* Wednesday, November 22, 2023 5:24 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Wentland, Harry ; Li, Sun peng (Leo) 
; Siqueira, Rodrigo ; 
Koenig, Christian ; Hung, Alex 
; Pillai, Aurabindo ; Wu, 
Hersen ; Lin, Wayne ; Mahfooz, 
Hamza 

*Subject:* [PATCH] drm/amd/display: fix ABM disablement
On recent versions of DMUB firmware, if we want to completely disable
ABM we have to pass ABM_LEVEL_IMMEDIATE_DISABLE as the requested ABM
level to DMUB. Otherwise, LCD eDP displays are unable to reach their
maximum brightness levels. So, to fix this whenever the user requests an
ABM level of 0 pass ABM_LEVEL_IMMEDIATE_DISABLE to DMUB instead. Also,
to keep the user's experience consistent map ABM_LEVEL_IMMEDIATE_DISABLE
to 0 when a user tries to read the requested ABM level.

Signed-off-by: Hamza Mahfooz 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c

index 5d9496db0ecb..8cb92d941cd9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6230,7 +6230,7 @@ int amdgpu_dm_connector_atomic_set_property(struct 
drm_connector *connector,

  dm_new_state->underscan_enable = val;
  ret = 0;
  } else if (property == adev->mode_info.abm_level_property) {
-   dm_new_state->abm_level = val;
+   dm_new_state->abm_level = val ?: 
ABM_LEVEL_IMMEDIATE_DISABLE;

  ret = 0;
  }

@@ -6275,7 +6275,8 @@ int amdgpu_dm_connector_atomic_get_property(struct 
drm_connector *connector,

  *val = dm_state->underscan_enable;
  ret = 0;
  } else if (property == adev->mode_info.abm_level_property) {
-   *val = dm_state->abm_level;
+   *val = (dm_state->abm_level != 
ABM_LEVEL_IMMEDIATE_DISABLE) ?

+   dm_state->abm_level : 0;
  ret = 0;
  }

@@ -6348,7 +6349,8 @@ void amdgpu_dm_connector_funcs_reset(struct 
drm_connector *connector)

  state->pbn = 0;

  if (connector->connector_type == DRM_MODE_CONNECTOR_eDP)
-   state->abm_level = amdgpu_dm_abm_level;
+   state->abm_level = amdgpu_dm_abm_level ?:
+   ABM_LEVEL_IMMEDIATE_DISABLE;

  __drm_atomic_helper_connector_reset(connector, 
>base);

  }
--
2.42.1


--
Hamza



RE: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table

2023-11-23 Thread Zhang, Yifan
[AMD Official Use Only - General]

Reviewed-by: Yifan Zhang 

-Original Message-
From: Ma, Li 
Sent: Thursday, November 23, 2023 6:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian 
; Zhang, Yifan ; Yu, Lang 
; Ma, Li 
Subject: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version and 
metrics table

Increment the driver interface version and add new members to the metrics table.

Signed-off-by: Li Ma 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h| 17 
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 10 +++
 .../inc/pmfw_if/smu14_driver_if_v14_0_0.h | 77 +++
 .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c  | 46 ++-
 4 files changed, 115 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 8ebba87f4289..eaea1c65e526 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -1086,6 +1086,10 @@ struct gpu_metrics_v3_0 {
uint16_taverage_dram_reads;
/* time filtered DRAM write bandwidth [MB/sec] */
uint16_taverage_dram_writes;
+   /* time filtered IPU read bandwidth [MB/sec] */
+   uint16_taverage_ipu_reads;
+   /* time filtered IPU write bandwidth [MB/sec] */
+   uint16_taverage_ipu_writes;

/* Driver attached timestamp (in ns) */
uint64_tsystem_clock_counter;
@@ -1105,6 +1109,8 @@ struct gpu_metrics_v3_0 {
uint32_taverage_all_core_power;
/* calculated core power [mW] */
uint16_taverage_core_power[16];
+   /* time filtered total system power [mW] */
+   uint16_taverage_sys_power;
/* maximum IRM defined STAPM power limit [mW] */
uint16_tstapm_power_limit;
/* time filtered STAPM power limit [mW] */ @@ -1117,6 +1123,8 @@ struct 
gpu_metrics_v3_0 {
uint16_taverage_ipuclk_frequency;
uint16_taverage_fclk_frequency;
uint16_taverage_vclk_frequency;
+   uint16_taverage_uclk_frequency;
+   uint16_taverage_mpipu_frequency;

/* Current clocks */
/* target core frequency [MHz] */
@@ -1126,6 +1134,15 @@ struct gpu_metrics_v3_0 {
/* GFXCLK frequency limit enforced on GFX [MHz] */
uint16_tcurrent_gfx_maxfreq;

+   /* Throttle Residency (ASIC dependent) */
+   uint32_t throttle_residency_prochot;
+   uint32_t throttle_residency_spl;
+   uint32_t throttle_residency_fppt;
+   uint32_t throttle_residency_sppt;
+   uint32_t throttle_residency_thm_core;
+   uint32_t throttle_residency_thm_gfx;
+   uint32_t throttle_residency_thm_soc;
+
/* Metrics table alpha filter time constant [us] */
uint32_ttime_filter_alphavalue;
 };
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index c125253df20b..c2265e027ca8 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1418,6 +1418,16 @@ typedef enum {
METRICS_PCIE_WIDTH,
METRICS_CURR_FANPWM,
METRICS_CURR_SOCKETPOWER,
+   METRICS_AVERAGE_VPECLK,
+   METRICS_AVERAGE_IPUCLK,
+   METRICS_AVERAGE_MPIPUCLK,
+   METRICS_THROTTLER_RESIDENCY_PROCHOT,
+   METRICS_THROTTLER_RESIDENCY_SPL,
+   METRICS_THROTTLER_RESIDENCY_FPPT,
+   METRICS_THROTTLER_RESIDENCY_SPPT,
+   METRICS_THROTTLER_RESIDENCY_THM_CORE,
+   METRICS_THROTTLER_RESIDENCY_THM_GFX,
+   METRICS_THROTTLER_RESIDENCY_THM_SOC,
 } MetricsMember_t;

 enum smu_cmn2asic_mapping_type {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
index 22f88842a7fd..8f42771e1f0a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
@@ -27,7 +27,7 @@
 // *** IMPORTANT ***
 // SMU TEAM: Always increment the interface version if
 // any structure is changed in this file
-#define PMFW_DRIVER_IF_VERSION 6
+#define PMFW_DRIVER_IF_VERSION 7

 typedef struct {
   int32_t value;
@@ -150,37 +150,50 @@ typedef struct {
 } DpmClocks_t;

 typedef struct {
-  uint16_t CoreFrequency[16];//Target core frequency [MHz]
-  uint16_t CorePower[16];//CAC calculated core power [mW]
-  uint16_t CoreTemperature[16];  //TSEN measured core temperature [centi-C]
-  uint16_t GfxTemperature;   //TSEN measured GFX temperature [centi-C]
-  uint16_t SocTemperature;  

[PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table

2023-11-23 Thread Li Ma
Increment the driver interface version and add new members to the metrics table.

Signed-off-by: Li Ma 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h| 17 
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 10 +++
 .../inc/pmfw_if/smu14_driver_if_v14_0_0.h | 77 +++
 .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c  | 46 ++-
 4 files changed, 115 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 8ebba87f4289..eaea1c65e526 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -1086,6 +1086,10 @@ struct gpu_metrics_v3_0 {
uint16_taverage_dram_reads;
/* time filtered DRAM write bandwidth [MB/sec] */
uint16_taverage_dram_writes;
+   /* time filtered IPU read bandwidth [MB/sec] */
+   uint16_taverage_ipu_reads;
+   /* time filtered IPU write bandwidth [MB/sec] */
+   uint16_taverage_ipu_writes;
 
/* Driver attached timestamp (in ns) */
uint64_tsystem_clock_counter;
@@ -1105,6 +1109,8 @@ struct gpu_metrics_v3_0 {
uint32_taverage_all_core_power;
/* calculated core power [mW] */
uint16_taverage_core_power[16];
+   /* time filtered total system power [mW] */
+   uint16_taverage_sys_power;
/* maximum IRM defined STAPM power limit [mW] */
uint16_tstapm_power_limit;
/* time filtered STAPM power limit [mW] */
@@ -1117,6 +1123,8 @@ struct gpu_metrics_v3_0 {
uint16_taverage_ipuclk_frequency;
uint16_taverage_fclk_frequency;
uint16_taverage_vclk_frequency;
+   uint16_taverage_uclk_frequency;
+   uint16_taverage_mpipu_frequency;
 
/* Current clocks */
/* target core frequency [MHz] */
@@ -1126,6 +1134,15 @@ struct gpu_metrics_v3_0 {
/* GFXCLK frequency limit enforced on GFX [MHz] */
uint16_tcurrent_gfx_maxfreq;
 
+   /* Throttle Residency (ASIC dependent) */
+   uint32_t throttle_residency_prochot;
+   uint32_t throttle_residency_spl;
+   uint32_t throttle_residency_fppt;
+   uint32_t throttle_residency_sppt;
+   uint32_t throttle_residency_thm_core;
+   uint32_t throttle_residency_thm_gfx;
+   uint32_t throttle_residency_thm_soc;
+
/* Metrics table alpha filter time constant [us] */
uint32_ttime_filter_alphavalue;
 };
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index c125253df20b..c2265e027ca8 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1418,6 +1418,16 @@ typedef enum {
METRICS_PCIE_WIDTH,
METRICS_CURR_FANPWM,
METRICS_CURR_SOCKETPOWER,
+   METRICS_AVERAGE_VPECLK,
+   METRICS_AVERAGE_IPUCLK,
+   METRICS_AVERAGE_MPIPUCLK,
+   METRICS_THROTTLER_RESIDENCY_PROCHOT,
+   METRICS_THROTTLER_RESIDENCY_SPL,
+   METRICS_THROTTLER_RESIDENCY_FPPT,
+   METRICS_THROTTLER_RESIDENCY_SPPT,
+   METRICS_THROTTLER_RESIDENCY_THM_CORE,
+   METRICS_THROTTLER_RESIDENCY_THM_GFX,
+   METRICS_THROTTLER_RESIDENCY_THM_SOC,
 } MetricsMember_t;
 
 enum smu_cmn2asic_mapping_type {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
index 22f88842a7fd..8f42771e1f0a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
@@ -27,7 +27,7 @@
 // *** IMPORTANT ***
 // SMU TEAM: Always increment the interface version if
 // any structure is changed in this file
-#define PMFW_DRIVER_IF_VERSION 6
+#define PMFW_DRIVER_IF_VERSION 7
 
 typedef struct {
   int32_t value;
@@ -150,37 +150,50 @@ typedef struct {
 } DpmClocks_t;
 
 typedef struct {
-  uint16_t CoreFrequency[16];//Target core frequency [MHz]
-  uint16_t CorePower[16];//CAC calculated core power [mW]
-  uint16_t CoreTemperature[16];  //TSEN measured core temperature [centi-C]
-  uint16_t GfxTemperature;   //TSEN measured GFX temperature [centi-C]
-  uint16_t SocTemperature;   //TSEN measured SOC temperature [centi-C]
-  uint16_t StapmOpnLimit;//Maximum IRM defined STAPM power limit 
[mW]
-  uint16_t StapmCurrentLimit;//Time filtered STAPM power limit [mW]
-  uint16_t InfrastructureCpuMaxFreq; //CCLK frequency limit enforced on 
classic cores [MHz]
-  uint16_t InfrastructureGfxMaxFreq; 

[PATCH] drm/amdgpu: Skip access gfx11 golden registers under SRIOV

2023-11-23 Thread ZhenGuo Yin
[Why]
Golden registers are PF-only registers on gfx11.
RLCG interface will return "out-of-range" under SRIOV VF.

[How]
Skip access gfx11 golden registers under SRIOV.

Signed-off-by: ZhenGuo Yin 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 8ed4a6fb147a..288e926e5cd4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -293,6 +293,9 @@ static void gfx_v11_0_set_kiq_pm4_funcs(struct 
amdgpu_device *adev)
 
 static void gfx_v11_0_init_golden_registers(struct amdgpu_device *adev)
 {
+   if (amdgpu_sriov_vf(adev))
+   return;
+
switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
-- 
2.35.1



Re: [PATCH v4 00/20] remove I2C_CLASS_DDC support

2023-11-23 Thread Thomas Zimmermann

Hi

Am 23.11.23 um 09:34 schrieb Heiner Kallweit:

On 23.11.2023 09:19, Thomas Zimmermann wrote:

Hi

Am 23.11.23 um 08:16 schrieb Heiner Kallweit:

On 23.11.2023 07:56, Thomas Zimmermann wrote:

Hi

Am 20.11.23 um 22:46 schrieb Heiner Kallweit:

After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.

Preferably this series should be applied via the i2c tree.

v2:
- change tag in commit subject of patch 03
- add ack tags
v3:
- fix a compile error in patch 5
v4:
- more ack and review tags

Signed-off-by: Heiner Kallweit 


Acked-by: Thomas Zimmermann 

for the patches that don't already have my r-b.


This refers to which patches of the series?
Patches 8, 16, 18 are the remaining ones w/o A-b or R-b.


I've looked through the patchset. Feel free to add my a-b to patches 1 to 19; 
except for 2 and 17, which already have my r-b.

BTW I only received 19 patches. Is there a patch 20/20?


Yes, see here:
https://patchwork.ozlabs.org/project/linux-i2c/patch/20231120214624.9378-21-hkallwe...@gmail.com/
If you're subscribed to linux-i2c or linux-kernel list you should have received 
it.


I see, I'm not on these lists. I don't have the authority to ack that 
final patch, but let me know if you want to merge anything through the 
DRM trees.


Best regards
Thomas




Best regards
Thomas




Best regards
Thomas


Thanks, Heiner



---

    drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   |    1 -
    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |    1 -
    drivers/gpu/drm/ast/ast_i2c.c |    1 -
    drivers/gpu/drm/bridge/synopsys/dw-hdmi.c |    1 -
    drivers/gpu/drm/display/drm_dp_helper.c   |    1 -
    drivers/gpu/drm/display/drm_dp_mst_topology.c |    1 -
    drivers/gpu/drm/gma500/cdv_intel_dp.c |    1 -
    drivers/gpu/drm/gma500/intel_gmbus.c  |    1 -
    drivers/gpu/drm/gma500/oaktrail_hdmi_i2c.c    |    1 -
    drivers/gpu/drm/gma500/psb_intel_sdvo.c   |    1 -
    drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c   |    1 -
    drivers/gpu/drm/i915/display/intel_gmbus.c    |    1 -
    drivers/gpu/drm/i915/display/intel_sdvo.c |    1 -
    drivers/gpu/drm/loongson/lsdc_i2c.c   |    1 -
    drivers/gpu/drm/mediatek/mtk_hdmi_ddc.c   |    1 -
    drivers/gpu/drm/mgag200/mgag200_i2c.c |    1 -
    drivers/gpu/drm/msm/hdmi/hdmi_i2c.c   |    1 -
    drivers/gpu/drm/radeon/radeon_i2c.c   |    1 -
    drivers/gpu/drm/rockchip/inno_hdmi.c  |    1 -
    drivers/gpu/drm/rockchip/rk3066_hdmi.c    |    1 -
    drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c    |    1 -
    drivers/video/fbdev/core/fb_ddc.c |    1 -
    drivers/video/fbdev/cyber2000fb.c |    1 -
    drivers/video/fbdev/i740fb.c  |    1 -
    drivers/video/fbdev/intelfb/intelfb_i2c.c |   15 +--
    drivers/video/fbdev/matrox/i2c-matroxfb.c |   12 
    drivers/video/fbdev/s3fb.c    |    1 -
    drivers/video/fbdev/tdfxfb.c  |    1 -
    drivers/video/fbdev/tridentfb.c   |    1 -
    drivers/video/fbdev/via/via_i2c.c |    1 -
    include/linux/i2c.h   |    1 -
    31 files changed, 9 insertions(+), 47 deletions(-)










--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)




Re: [PATCH v4 00/20] remove I2C_CLASS_DDC support

2023-11-23 Thread Heiner Kallweit
On 23.11.2023 09:19, Thomas Zimmermann wrote:
> Hi
> 
> Am 23.11.23 um 08:16 schrieb Heiner Kallweit:
>> On 23.11.2023 07:56, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> Am 20.11.23 um 22:46 schrieb Heiner Kallweit:
 After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
 olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
 Class-based device auto-detection is a legacy mechanism and shouldn't
 be used in new code. So we can remove this class completely now.

 Preferably this series should be applied via the i2c tree.

 v2:
 - change tag in commit subject of patch 03
 - add ack tags
 v3:
 - fix a compile error in patch 5
 v4:
 - more ack and review tags

 Signed-off-by: Heiner Kallweit 
>>>
>>> Acked-by: Thomas Zimmermann 
>>>
>>> for the patches that don't already have my r-b.
>>>
>> This refers to which patches of the series?
>> Patches 8, 16, 18 are the remaining ones w/o A-b or R-b.
> 
> I've looked through the patchset. Feel free to add my a-b to patches 1 to 19; 
> except for 2 and 17, which already have my r-b.
> 
> BTW I only received 19 patches. Is there a patch 20/20?
> 
Yes, see here:
https://patchwork.ozlabs.org/project/linux-i2c/patch/20231120214624.9378-21-hkallwe...@gmail.com/
If you're subscribed to linux-i2c or linux-kernel list you should have received 
it.

> Best regards
> Thomas
> 
>>
>>> Best regards
>>> Thomas
>>>
>> Thanks, Heiner
>>

 ---

    drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   |    1 -
    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |    1 -
    drivers/gpu/drm/ast/ast_i2c.c |    1 -
    drivers/gpu/drm/bridge/synopsys/dw-hdmi.c |    1 -
    drivers/gpu/drm/display/drm_dp_helper.c   |    1 -
    drivers/gpu/drm/display/drm_dp_mst_topology.c |    1 -
    drivers/gpu/drm/gma500/cdv_intel_dp.c |    1 -
    drivers/gpu/drm/gma500/intel_gmbus.c  |    1 -
    drivers/gpu/drm/gma500/oaktrail_hdmi_i2c.c    |    1 -
    drivers/gpu/drm/gma500/psb_intel_sdvo.c   |    1 -
    drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c   |    1 -
    drivers/gpu/drm/i915/display/intel_gmbus.c    |    1 -
    drivers/gpu/drm/i915/display/intel_sdvo.c |    1 -
    drivers/gpu/drm/loongson/lsdc_i2c.c   |    1 -
    drivers/gpu/drm/mediatek/mtk_hdmi_ddc.c   |    1 -
    drivers/gpu/drm/mgag200/mgag200_i2c.c |    1 -
    drivers/gpu/drm/msm/hdmi/hdmi_i2c.c   |    1 -
    drivers/gpu/drm/radeon/radeon_i2c.c   |    1 -
    drivers/gpu/drm/rockchip/inno_hdmi.c  |    1 -
    drivers/gpu/drm/rockchip/rk3066_hdmi.c    |    1 -
    drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c    |    1 -
    drivers/video/fbdev/core/fb_ddc.c |    1 -
    drivers/video/fbdev/cyber2000fb.c |    1 -
    drivers/video/fbdev/i740fb.c  |    1 -
    drivers/video/fbdev/intelfb/intelfb_i2c.c |   15 +--
    drivers/video/fbdev/matrox/i2c-matroxfb.c |   12 
    drivers/video/fbdev/s3fb.c    |    1 -
    drivers/video/fbdev/tdfxfb.c  |    1 -
    drivers/video/fbdev/tridentfb.c   |    1 -
    drivers/video/fbdev/via/via_i2c.c |    1 -
    include/linux/i2c.h   |    1 -
    31 files changed, 9 insertions(+), 47 deletions(-)
>>>
>>
> 



Re: [PATCH v4 00/20] remove I2C_CLASS_DDC support

2023-11-23 Thread Thomas Zimmermann

Hi

Am 23.11.23 um 08:16 schrieb Heiner Kallweit:

On 23.11.2023 07:56, Thomas Zimmermann wrote:

Hi

Am 20.11.23 um 22:46 schrieb Heiner Kallweit:

After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.

Preferably this series should be applied via the i2c tree.

v2:
- change tag in commit subject of patch 03
- add ack tags
v3:
- fix a compile error in patch 5
v4:
- more ack and review tags

Signed-off-by: Heiner Kallweit 


Acked-by: Thomas Zimmermann 

for the patches that don't already have my r-b.


This refers to which patches of the series?
Patches 8, 16, 18 are the remaining ones w/o A-b or R-b.


I've looked through the patchset. Feel free to add my a-b to patches 1 
to 19; except for 2 and 17, which already have my r-b.


BTW I only received 19 patches. Is there a patch 20/20?

Best regards
Thomas




Best regards
Thomas


Thanks, Heiner



---

   drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   |    1 -
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |    1 -
   drivers/gpu/drm/ast/ast_i2c.c |    1 -
   drivers/gpu/drm/bridge/synopsys/dw-hdmi.c |    1 -
   drivers/gpu/drm/display/drm_dp_helper.c   |    1 -
   drivers/gpu/drm/display/drm_dp_mst_topology.c |    1 -
   drivers/gpu/drm/gma500/cdv_intel_dp.c |    1 -
   drivers/gpu/drm/gma500/intel_gmbus.c  |    1 -
   drivers/gpu/drm/gma500/oaktrail_hdmi_i2c.c    |    1 -
   drivers/gpu/drm/gma500/psb_intel_sdvo.c   |    1 -
   drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c   |    1 -
   drivers/gpu/drm/i915/display/intel_gmbus.c    |    1 -
   drivers/gpu/drm/i915/display/intel_sdvo.c |    1 -
   drivers/gpu/drm/loongson/lsdc_i2c.c   |    1 -
   drivers/gpu/drm/mediatek/mtk_hdmi_ddc.c   |    1 -
   drivers/gpu/drm/mgag200/mgag200_i2c.c |    1 -
   drivers/gpu/drm/msm/hdmi/hdmi_i2c.c   |    1 -
   drivers/gpu/drm/radeon/radeon_i2c.c   |    1 -
   drivers/gpu/drm/rockchip/inno_hdmi.c  |    1 -
   drivers/gpu/drm/rockchip/rk3066_hdmi.c    |    1 -
   drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c    |    1 -
   drivers/video/fbdev/core/fb_ddc.c |    1 -
   drivers/video/fbdev/cyber2000fb.c |    1 -
   drivers/video/fbdev/i740fb.c  |    1 -
   drivers/video/fbdev/intelfb/intelfb_i2c.c |   15 +--
   drivers/video/fbdev/matrox/i2c-matroxfb.c |   12 
   drivers/video/fbdev/s3fb.c    |    1 -
   drivers/video/fbdev/tdfxfb.c  |    1 -
   drivers/video/fbdev/tridentfb.c   |    1 -
   drivers/video/fbdev/via/via_i2c.c |    1 -
   include/linux/i2c.h   |    1 -
   31 files changed, 9 insertions(+), 47 deletions(-)






--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)




Re: [PATCH v2] drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer

2023-11-23 Thread Christian König

Am 23.11.23 um 02:22 schrieb Lu Yao:

For 'AMDGPU_FAMILY_SI' family cards, 'si_common_early_init' initializes
'didt_rreg' and 'didt_wreg' to 'NULL'. But
'amdgpu_debugfs_regs_didt_read/write' use 'RREG32_DIDT' and 'WREG32_DIDT'
without any NULL check, and no other 'amdgpu_ip_block_version' that sets
these callbacks is added for 'AMDGPU_FAMILY_SI'.

So, add a NULL pointer check before calling.

Signed-off-by: Lu Yao 


Reviewed-by: Christian König 


---
Changes in v2:
   1. Drop dev_err message.
   2. Change error code from 'EPERM' to 'EOPNOTSUPP'
Link to v1: https://lore.kernel.org/all/20231122093509.34302-1-ya...@kylinos.cn/
Thanks Christian for his comments.
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..e098cd66fa2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -638,6 +638,9 @@ static ssize_t amdgpu_debugfs_regs_didt_read(struct file *f, char __user *buf,
if (size & 0x3 || *pos & 0x3)
return -EINVAL;
  
+	if (adev->didt_rreg == NULL)
+		return -EOPNOTSUPP;
+
r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
if (r < 0) {
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
@@ -694,6 +697,9 @@ static ssize_t amdgpu_debugfs_regs_didt_write(struct file *f, const char __user
if (size & 0x3 || *pos & 0x3)
return -EINVAL;
  
+	if (adev->didt_wreg == NULL)
+		return -EOPNOTSUPP;
+
r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
if (r < 0) {
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);




[PATCH v2] drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer

2023-11-23 Thread Lu Yao
For 'AMDGPU_FAMILY_SI' family cards, 'si_common_early_init' initializes
'didt_rreg' and 'didt_wreg' to 'NULL'. But
'amdgpu_debugfs_regs_didt_read/write' use 'RREG32_DIDT' and 'WREG32_DIDT'
without any NULL check, and no other 'amdgpu_ip_block_version' that sets
these callbacks is added for 'AMDGPU_FAMILY_SI'.

So, add a NULL pointer check before calling.

Signed-off-by: Lu Yao 
---
Changes in v2:
  1. Drop dev_err message.
  2. Change error code from 'EPERM' to 'EOPNOTSUPP'
Link to v1: https://lore.kernel.org/all/20231122093509.34302-1-ya...@kylinos.cn/
Thanks Christian for his comments.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..e098cd66fa2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -638,6 +638,9 @@ static ssize_t amdgpu_debugfs_regs_didt_read(struct file *f, char __user *buf,
if (size & 0x3 || *pos & 0x3)
return -EINVAL;
 
+   if (adev->didt_rreg == NULL)
+   return -EOPNOTSUPP;
+
r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
if (r < 0) {
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
@@ -694,6 +697,9 @@ static ssize_t amdgpu_debugfs_regs_didt_write(struct file *f, const char __user
if (size & 0x3 || *pos & 0x3)
return -EINVAL;
 
+   if (adev->didt_wreg == NULL)
+   return -EOPNOTSUPP;
+
r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
if (r < 0) {
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
-- 
2.25.1



[PATCH] drm/radeon: Prevent multiple error lines on suspend

2023-11-23 Thread Woody Suwalski
Fix to avoid multiple error lines printed on every suspend by the Radeon
driver's debugfs.

radeon_debugfs_init() calls debugfs_create_file() for every ring.

This results in printing multiple error lines to the screen and dmesg,
similar to this:
debugfs: File 'radeon_ring_vce2' in directory ':00:01.0' already present!

The fix is to run a lookup for the file before trying to (re)create that
debug file.

Signed-off-by: Woody Suwalski 

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index e6534fa9f1fb..72b1d2d31295 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -549,10 +549,15 @@ static void radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_r

 #if defined(CONFIG_DEBUG_FS)
     const char *ring_name = radeon_debugfs_ring_idx_to_name(ring->idx);
     struct dentry *root = rdev->ddev->primary->debugfs_root;
-
-    if (ring_name)
-        debugfs_create_file(ring_name, 0444, root, ring,
-                &radeon_debugfs_ring_info_fops);
+    struct dentry *lookup;
+
+    if (ring_name) {
+        if ((lookup = debugfs_lookup(ring_name, root)) == NULL)
+            debugfs_create_file(ring_name, 0444, root, ring,
+                    &radeon_debugfs_ring_info_fops);
+        else
+            dput(lookup);
+    }

 #endif
 }

[Attached dmesg (Linux version 6.7-pingu, built with Debian gcc 12.2.0) truncated in the archive]

[PATCH] drm/amd/pm: fix a memleak in aldebaran_tables_init

2023-11-23 Thread Dinghao Liu
When kzalloc() for smu_table->ecc_table fails, we should free
the previously allocated resources to prevent a memory leak.

Fixes: edd794208555 ("drm/amd/pm: add message smu to get ecc_table v2")
Signed-off-by: Dinghao Liu 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 1a6675d70a4b..f1440869d1ce 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -257,8 +257,11 @@ static int aldebaran_tables_init(struct smu_context *smu)
}
 
	smu_table->ecc_table = kzalloc(tables[SMU_TABLE_ECCINFO].size, GFP_KERNEL);
-   if (!smu_table->ecc_table)
+   if (!smu_table->ecc_table) {
+   kfree(smu_table->metrics_table);
+   kfree(smu_table->gpu_metrics_table);
return -ENOMEM;
+   }
 
return 0;
 }
-- 
2.17.1



Re: [PATCH v4 00/20] remove I2C_CLASS_DDC support

2023-11-23 Thread Heiner Kallweit
On 23.11.2023 07:56, Thomas Zimmermann wrote:
> Hi
> 
> Am 20.11.23 um 22:46 schrieb Heiner Kallweit:
>> After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
>> olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
>> Class-based device auto-detection is a legacy mechanism and shouldn't
>> be used in new code. So we can remove this class completely now.
>>
>> Preferably this series should be applied via the i2c tree.
>>
>> v2:
>> - change tag in commit subject of patch 03
>> - add ack tags
>> v3:
>> - fix a compile error in patch 5
>> v4:
>> - more ack and review tags
>>
>> Signed-off-by: Heiner Kallweit 
> 
> Acked-by: Thomas Zimmermann 
> 
> for the patches that don't already have my r-b.
> 
Which patches of the series does this refer to?
Patches 8, 16, 18 are the remaining ones w/o A-b or R-b.

> Best regards
> Thomas
> 
Thanks, Heiner

>>
>> ---
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   |    1 -
>>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |    1 -
>>   drivers/gpu/drm/ast/ast_i2c.c |    1 -
>>   drivers/gpu/drm/bridge/synopsys/dw-hdmi.c |    1 -
>>   drivers/gpu/drm/display/drm_dp_helper.c   |    1 -
>>   drivers/gpu/drm/display/drm_dp_mst_topology.c |    1 -
>>   drivers/gpu/drm/gma500/cdv_intel_dp.c |    1 -
>>   drivers/gpu/drm/gma500/intel_gmbus.c  |    1 -
>>   drivers/gpu/drm/gma500/oaktrail_hdmi_i2c.c    |    1 -
>>   drivers/gpu/drm/gma500/psb_intel_sdvo.c   |    1 -
>>   drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c   |    1 -
>>   drivers/gpu/drm/i915/display/intel_gmbus.c    |    1 -
>>   drivers/gpu/drm/i915/display/intel_sdvo.c |    1 -
>>   drivers/gpu/drm/loongson/lsdc_i2c.c   |    1 -
>>   drivers/gpu/drm/mediatek/mtk_hdmi_ddc.c   |    1 -
>>   drivers/gpu/drm/mgag200/mgag200_i2c.c |    1 -
>>   drivers/gpu/drm/msm/hdmi/hdmi_i2c.c   |    1 -
>>   drivers/gpu/drm/radeon/radeon_i2c.c   |    1 -
>>   drivers/gpu/drm/rockchip/inno_hdmi.c  |    1 -
>>   drivers/gpu/drm/rockchip/rk3066_hdmi.c    |    1 -
>>   drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c    |    1 -
>>   drivers/video/fbdev/core/fb_ddc.c |    1 -
>>   drivers/video/fbdev/cyber2000fb.c |    1 -
>>   drivers/video/fbdev/i740fb.c  |    1 -
>>   drivers/video/fbdev/intelfb/intelfb_i2c.c |   15 +--
>>   drivers/video/fbdev/matrox/i2c-matroxfb.c |   12 
>>   drivers/video/fbdev/s3fb.c    |    1 -
>>   drivers/video/fbdev/tdfxfb.c  |    1 -
>>   drivers/video/fbdev/tridentfb.c   |    1 -
>>   drivers/video/fbdev/via/via_i2c.c |    1 -
>>   include/linux/i2c.h   |    1 -
>>   31 files changed, 9 insertions(+), 47 deletions(-)
>