Re: [PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable

2023-03-23 Thread Felix Kuehling
Sorry, I think that was just a stray comment that I messed up while 
editing my response. You can ignore it.


Regards,
  Felix


Am 2023-03-23 um 15:12 schrieb Kim, Jonathan:

index c06ada0844ba..a2ac98d06e71 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -979,6 +979,14 @@ static int evict_process_queues_cpsch(struct

device_queue_manager *dqm,

 goto out;

 pdd = qpd_to_pdd(qpd);
+
+   /* The debugger creates processes that temporarily have not

acquired

+* all VMs for all devices and has no VMs itself.
+* Skip queue eviction on process eviction.
+*/
+   if (!pdd->drm_priv)
+   goto out;
+

This should be before qpd->

Sorry I didn't quite catch what you were saying here (did your comment get 
cutoff?).
Did you mean the pdd->drm_priv check needs to go before the if (qpd->evicted++ 
> 0) /* already evicted, do nothing */ check?

Thanks,

Jon



RE: [PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable

2023-03-23 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Kuehling, Felix 
> Sent: Thursday, February 16, 2023 6:44 PM
> To: Kim, Jonathan ; amd-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Subject: Re: [PATCH 03/32] drm/amdkfd: prepare per-process debug enable
> and disable
>
>
> On 2023-01-25 14:53, Jonathan Kim wrote:
> > The ROCm debugger will attach to a process to debug by PTRACE and will
> > expect the KFD to prepare a process for the target PID, whether the
> > target PID has opened the KFD device or not.
> >
> > This patch is to explicity handle this requirement.  Further HW mode
> > setting and runtime coordination requirements will be handled in
> > following patches.
> >
> > In the case where the target process has not opened the KFD device,
> > a new KFD process must be created for the target PID.
> > The debugger as well as the target process for this case will have not
> > acquired any VMs so handle process restoration to correctly account for
> > this.
> >
> > To coordinate with HSA runtime, the debugger must be aware of the target
> > process' runtime enablement status and will copy the runtime status
> > information into the debugged KFD process for later query.
> >
> > On enablement, the debugger will subscribe to a set of exceptions where
> > each exception events will notify the debugger through a pollable FIFO
> > file descriptor that the debugger provides to the KFD to manage.
> > Some events will be synchronously raised while other are scheduled,
> > which is why a debug_event_workarea worker is initialized.
> >
> > Finally on process termination of either the debugger or the target,
> > debugging must be disabled if it has not been done so.
> >
> > v3: fix typo on debug trap disable and PTRACE ATTACH relax check.
> > remove unnecessary queue eviction counter reset when there's nothing
> > to evict.
> > change err code to EALREADY if attaching to an already attached process.
> > move debug disable to release worker to avoid race with disable from
> > ioctl call.
> >
> > v2: relax debug trap disable and PTRACE ATTACH requirement.
> >
> > Signed-off-by: Jonathan Kim
> > ---
> >   drivers/gpu/drm/amd/amdkfd/Makefile   |  3 +-
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 88 -
> >   drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 94
> +++
> >   drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 33 +++
> >   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 -
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 34 ++-
> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 63 +
> >   7 files changed, 308 insertions(+), 29 deletions(-)
> >   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> >   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile
> b/drivers/gpu/drm/amd/amdkfd/Makefile
> > index e758c2a24cd0..747754428073 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/Makefile
> > +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
> > @@ -55,7 +55,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
> > $(AMDKFD_PATH)/kfd_int_process_v9.o \
> > $(AMDKFD_PATH)/kfd_int_process_v11.o \
> > $(AMDKFD_PATH)/kfd_smi_events.o \
> > -   $(AMDKFD_PATH)/kfd_crat.o
> > +   $(AMDKFD_PATH)/kfd_crat.o \
> > +   $(AMDKFD_PATH)/kfd_debug.o
> >
> >   ifneq ($(CONFIG_AMD_IOMMU_V2),)
> >   AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > index d3b019e64093..ee05c2e54ef6 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > @@ -44,6 +44,7 @@
> >   #include "amdgpu_amdkfd.h"
> >   #include "kfd_smi_events.h"
> >   #include "amdgpu_dma_buf.h"
> > +#include "kfd_debug.h"
> >
> >   static long kfd_ioctl(struct file *, unsigned int, unsigned long);
> >   static int kfd_open(struct inode *, struct file *);
> > @@ -142,10 +143,15 @@ static int kfd_open(struct inode *inode, struct
> file *filep)
> > return -EPERM;
> > }
> >
> > -   process = kfd_create_process(filep);
> > +   process = kfd_create_process(current);
> > if (IS_ERR(process))
> > return PTR_ERR(process);
> >
> > +   if (kfd_process_init_cwsr_apu(process, filep)) {
> > 

Re: [PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable

2023-02-16 Thread Felix Kuehling



On 2023-01-25 14:53, Jonathan Kim wrote:

The ROCm debugger will attach to a process to debug by PTRACE and will
expect the KFD to prepare a process for the target PID, whether the
target PID has opened the KFD device or not.

This patch is to explicity handle this requirement.  Further HW mode
setting and runtime coordination requirements will be handled in
following patches.

In the case where the target process has not opened the KFD device,
a new KFD process must be created for the target PID.
The debugger as well as the target process for this case will have not
acquired any VMs so handle process restoration to correctly account for
this.

To coordinate with HSA runtime, the debugger must be aware of the target
process' runtime enablement status and will copy the runtime status
information into the debugged KFD process for later query.

On enablement, the debugger will subscribe to a set of exceptions where
each exception events will notify the debugger through a pollable FIFO
file descriptor that the debugger provides to the KFD to manage.
Some events will be synchronously raised while other are scheduled,
which is why a debug_event_workarea worker is initialized.

Finally on process termination of either the debugger or the target,
debugging must be disabled if it has not been done so.

v3: fix typo on debug trap disable and PTRACE ATTACH relax check.
remove unnecessary queue eviction counter reset when there's nothing
to evict.
change err code to EALREADY if attaching to an already attached process.
move debug disable to release worker to avoid race with disable from
ioctl call.

v2: relax debug trap disable and PTRACE ATTACH requirement.

Signed-off-by: Jonathan Kim
---
  drivers/gpu/drm/amd/amdkfd/Makefile   |  3 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 88 -
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 94 +++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 33 +++
  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 -
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 34 ++-
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 63 +
  7 files changed, 308 insertions(+), 29 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.c
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index e758c2a24cd0..747754428073 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -55,7 +55,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v9.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
-   $(AMDKFD_PATH)/kfd_crat.o
+   $(AMDKFD_PATH)/kfd_crat.o \
+   $(AMDKFD_PATH)/kfd_debug.o
  
  ifneq ($(CONFIG_AMD_IOMMU_V2),)

  AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d3b019e64093..ee05c2e54ef6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -44,6 +44,7 @@
  #include "amdgpu_amdkfd.h"
  #include "kfd_smi_events.h"
  #include "amdgpu_dma_buf.h"
+#include "kfd_debug.h"
  
  static long kfd_ioctl(struct file *, unsigned int, unsigned long);

  static int kfd_open(struct inode *, struct file *);
@@ -142,10 +143,15 @@ static int kfd_open(struct inode *inode, struct file 
*filep)
return -EPERM;
}
  
-	process = kfd_create_process(filep);

+   process = kfd_create_process(current);
if (IS_ERR(process))
return PTR_ERR(process);
  
+	if (kfd_process_init_cwsr_apu(process, filep)) {

+   kfd_unref_process(process);
+   return -EFAULT;
+   }
+
if (kfd_is_locked()) {
dev_dbg(kfd_device, "kfd is locked!\n"
"process %d unreferenced", process->pasid);
@@ -2653,6 +2659,9 @@ static int kfd_ioctl_runtime_enable(struct file *filep, 
struct kfd_process *p, v
  static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process 
*p, void *data)
  {
struct kfd_ioctl_dbg_trap_args *args = data;
+   struct task_struct *thread = NULL;
+   struct pid *pid = NULL;
+   struct kfd_process *target = NULL;
int r = 0;
  
  	if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {

@@ -2660,9 +2669,71 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
return -EINVAL;
}
  
+	pid = find_get_pid(args->pid);

+   if (!pid) {
+   pr_debug("Cannot find pid info for %i\n", args->pid);
+   r = -ESRCH;
+   goto out;
+   }
+
+   thread = get_pid_task(pid, PIDTYPE_PID);
+
+   if (args->op == KFD_IOC_DBG_TRAP_ENABLE) {
+   bool create_process;
+
+